Published: 20/04/2025
7 minute read

Level up your Ansible Code - Creating Golden Images

TL;DR: The combination of Ansible and Packer can be a terrific solution to improve provisioning time, increase reliability and reduce stress on other infrastructure. Run Packer once with your existing Ansible configuration and reuse the resulting image 1000 times.

So the story usually looks like this: You became tired of doing the same web-server base configuration over and over again. Sometimes it didn’t work because you misspelled a parameter or copy-pasted the wrong command. But then someone mentioned Ansible: a tool that can run the same actions over and over again with the same desired output (when done right, which I have not seen that often). It can be launched against 1000s of machines. It sounded great, since you wanted more free time and are not afraid to automate yourself out of your job (don’t worry, that will not happen, you will only level up).
So you started to implement your work as Ansible roles and tasks. It took a while to get right, but now you have made a name for yourself by always delivering setups quickly, with rock-solid and consistent quality. Great, enjoy your life, right?

Yeah, maybe. But perhaps you seek even better results, even more “free” time, got a new requirement, or just want a good reason for your next raise. I don’t really care which, but this is where this article comes in handy!

My Ansible setup

My story actually looked quite similar. I use Ansible to set up vanilla k8s clusters. I have a common role for all nodes and a cp-bootstrap/worker-join role. Setting up a Debian-based cluster, with a lot of extra configuration for additional container runtimes and newer kernels, takes about 15 minutes to fully bootstrap on the Hetzner cloud. That is so slow.

The annoying stuff in Ansible

Mainly because Ansible is slow, even with connection pipelining, multiple forks and other tweaks. It gets even worse if the connection goes over a VPS, jump hosts or proxies. But Ansible is not the only slow part. Installing packages is slow, and so is setting kernel parameters and downloading stuff. And sometimes, not often, your playbooks FAIL, because a connection timed out, you got rate limited or a package version changed.
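
For reference, the tweaks mentioned above can be set through environment variables before a run. This is only a minimal sketch: the concrete values (20 forks, 300 seconds of connection reuse) and the inventory and playbook names are illustrative assumptions, not settings taken from my setup.

```shell
#!/bin/bash
# Sketch: common Ansible speed-ups set through environment variables.
# The values are illustrative assumptions, tune them for your environment.

# Send modules over the existing SSH session instead of copying files first.
export ANSIBLE_PIPELINING=True
# Talk to more hosts in parallel (Ansible's default is 5 forks).
export ANSIBLE_FORKS=20
# Keep multiplexed SSH connections open so later tasks can reuse them.
export ANSIBLE_SSH_ARGS="-C -o ControlMaster=auto -o ControlPersist=300s"

# Hypothetical invocation, inventory and playbook names are placeholders:
# ansible-playbook -i inventory.yml site.yml
```

Even with all of these set, every task still pays for at least one round trip per host, which is why the rest of this article moves the work into an image instead.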

When this happens, your playbooks are not idempotent, let alone reproducible. It mostly works out, but you cannot guarantee that the state will always be exactly the same when you run your playbook over and over again. I get it, reproducibility is hard, but we can make our lives a little easier when we don’t have to run Ansible that often. And we save even more time, because again, Ansible is slow.

Introducing Packer & Golden Images

Packer is an image-building tool from HashiCorp. It uses plugins to integrate with various providers, such as AWS, Azure and GCE, but also on-prem solutions like VMware, Proxmox and good old QEMU. It is configured with the HashiCorp Configuration Language (HCL), just like Terraform.
In Packer you usually have one or more source resources with one or more build steps. Packer’s plugin ecosystem makes it really versatile for your build configuration. You can create AMIs on AWS, but also Docker images, ISOs and raw disk images. HCL provides reusability through variables, inputs and basic functions (JSON encoding, reverse, map, etc.) built into the language.
With Packer you can achieve real reproducibility. You create a working image once and reuse it 1000 times. All machines created from this image will have the same, known-good base configuration with the same packages installed. Perfect for providing ready-to-go hardened servers. The resulting images are called golden images.
And now we will use our existing Ansible configuration code together with Packer’s plugin system to cut down provisioning time for our k8s clusters.

Merging Packer & Ansible

Let’s work through reusing my existing Ansible playbooks to create golden images, using a concrete example. I want to cut down provisioning time for k8s clusters from 15 minutes to 5 minutes.
For this I will use golden images created by Packer with the Hetzner hcloud Packer plugin. I need a source and some basic inputs to make it configurable:

# hcloud.pkr.hcl
packer {
  required_plugins {
    hcloud = {
      source  = "github.com/hetznercloud/hcloud"
      version = ">= 1.6.0"
    }
  }
}

variable "base_image" {
  type    = string
  default = "debian-12"
}
variable "k8s_version" {
  type    = string
  default = "1.32.3"
}
variable "user_data_path" {
  type    = string
  default = "cloud-init.yml"
}

locals {
  output_name = "debian-12-k8s-v${var.k8s_version}"
}

source "hcloud" "k8s-amd64" {
  image         = var.base_image
  location      = "nbg1"
  server_type   = "cx22"
  ssh_keys      = []
  user_data     = file(var.user_data_path)
  ssh_username  = "root"
  snapshot_name = "${local.output_name}-amd64"
  snapshot_labels = {
    type    = "infra",
    base    = var.base_image,
    version = "${var.k8s_version}",
    name    = "${local.output_name}-amd64"
    arch    = "amd64"
  }
}
source "hcloud" "k8s-arm64" {
  image         = var.base_image
  location      = "nbg1"
  server_type   = "cax11"
  ssh_keys      = []
  user_data     = file(var.user_data_path)
  ssh_username  = "root"
  snapshot_name = "${local.output_name}-arm64"
  snapshot_labels = {
    type    = "infra",
    base    = var.base_image,
    version = "${var.k8s_version}",
    name    = "${local.output_name}-arm64"
    arch    = "arm64"
  }
}

This allows me to create images (or snapshots, as Hetzner calls them) based on Debian for amd64 and arm64. The image name is derived from the input parameters.
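
To make the naming scheme concrete, here is a small shell sketch that mirrors the `output_name` local and the snapshot labels from the config above. The helper names `snapshot_name` and `label_selector` are hypothetical, and the commented-out hcloud CLI call is an assumption about how you might look the snapshot up later, not something the Packer build itself needs.

```shell
#!/bin/bash
# Sketch: mirror the naming scheme of the Packer config in shell.
# Both helper names are hypothetical and only used for illustration.

# Build the snapshot name the same way the "output_name" local does.
snapshot_name() {
  printf 'debian-12-k8s-v%s-%s\n' "$1" "$2"
}

# Build a label selector matching the snapshot_labels of one image.
label_selector() {
  printf 'type=infra,version=%s,arch=%s\n' "$1" "$2"
}

snapshot_name 1.32.3 arm64    # prints debian-12-k8s-v1.32.3-arm64
label_selector 1.32.3 amd64   # prints type=infra,version=1.32.3,arch=amd64

# Assumed usage with the hcloud CLI (needs HCLOUD_TOKEN, not run here):
# hcloud image list --type snapshot --selector "$(label_selector 1.32.3 amd64)"
```

Selecting by labels rather than by name keeps the lookup stable should the naming scheme ever change.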

Now we need a build step. While Packer has a dedicated Ansible plugin, I chose not to use it, since it did not meet my requirements for handling restarts and other parameters. Instead I chose the built-in shell provisioner, which just runs a predefined shell script:

build {
  sources = ["source.hcloud.k8s-amd64", "source.hcloud.k8s-arm64"]

  provisioner "shell" {
    expect_disconnect = true
    env = {
      k8s_version = "${var.k8s_version}"
    }
    scripts = [
      "ansible-setup.sh",
    ]
  }
  provisioner "shell" {
    pause_before = "30s"
    max_retries  = 1
    env = {
      k8s_version = "${var.k8s_version}"
    }
    scripts = [
      "ansible-setup.sh",
    ]
  }
}

This runs the same shell script twice, which is exactly what I need because my IaC code includes a reboot to switch to the new kernel version. All my shell script really does is wait for cloud-init to finish, clone my Ansible git repo and run it with a bunch of predefined vars. Finally it performs some cleanups and resets cloud-init (not shown in the excerpt below):

#!/bin/bash
set -e -o pipefail

echo "Waiting for cloud-init to finish..."
cloud-init status --wait

# setup requirements
echo "Installing packages..."
apt-get update -qq
apt-get install -qq --yes --no-install-recommends git python3-pip
pip3 install --user --break-system-packages --no-warn-script-location --no-cache-dir ansible jmespath
PATH=~/.local/bin:$PATH

# clone only once, the script runs a second time after the reboot
if [ ! -d "playbooks" ]; then
  git clone --depth 1 https://github.com/hegerdes/ansible-playbooks.git playbooks
fi

# setup ansible play
echo "Running playbook..."
cd playbooks
echo "Vars:"
cat <<EOF >hostvars.yaml
k8s_cri: crun
k8s_containerd_variant: github
k8s_ensure_min_kernel_version: 6.12.*
EOF

# append every k8s* environment variable (set by Packer) as "key: value"
printenv | grep '^k8s' | sed 's/=/: /' >>hostvars.yaml
cat hostvars.yaml

cat <<EOF >pb_k8s_local.yml
- name: K8s-ClusterPrep
  hosts: localhost
  become: true
  gather_facts: true
  roles:
    - k8s/common
EOF

# Run playbook
ansible-playbook pb_k8s_local.yml --extra-vars "@hostvars.yaml" -v && cd
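
The step that passes Packer variables on to Ansible is easy to miss in the script, so here it is in isolation. This is a sketch, not a copy of my script: the function name `env_to_yaml` is hypothetical, and it deliberately matches only variable names starting with `k8s` and replaces only the first `=` so that values containing `=` stay intact.

```shell
#!/bin/bash
# Sketch: turn every environment variable whose name starts with "k8s"
# into a "key: value" YAML line, like the entries appended to hostvars.yaml.
# The function name env_to_yaml is hypothetical.
env_to_yaml() {
  # Only the first "=" separates name and value, so replace just that one.
  printenv | grep '^k8s' | sed 's/=/: /'
}

# Example with the variable that the Packer "env" block provides:
export k8s_version="1.32.3"
env_to_yaml
```

Because the Packer `env` block sets `k8s_version`, the playbook receives the same version that ends up in the snapshot name and labels.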

I just run my existing playbooks locally on the target server to achieve the desired configuration. While the reproducibility is provided by the resulting image, we should still try to make our playbooks idempotent, so we can run them over and over again. A run should not perform any changes when the desired state is already reached.
With this and some additional cleanup code to minimize image size, I get ready-to-use golden k8s images, which I can use to quickly spin up a new cluster or as autoscaling nodes. Every dependency I need is already in that image. Every kernel parameter is already tuned to my needs. I can just use these images and be sure that they will always behave the same, since they are truly reproducible.
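
The idempotency goal mentioned above can be checked mechanically: run the playbook a second time and look at the PLAY RECAP. The sketch below assumes the usual recap format with a `changed=N` counter per host; the function name `recap_is_idempotent` is hypothetical.

```shell
#!/bin/bash
# Sketch: decide from ansible-playbook output whether a run changed anything.
# A second run of an idempotent playbook should report changed=0 everywhere.
# The function name recap_is_idempotent is hypothetical.
recap_is_idempotent() {
  # Keep only the lines that carry a "changed=N" counter (read from stdin).
  recap="$(grep -E 'changed=[0-9]+' || true)"
  # No recap line at all means we cannot claim success.
  [ -n "$recap" ] || return 1
  # Any counter starting with 1-9 is a non-zero change count.
  if printf '%s\n' "$recap" | grep -Eq 'changed=[1-9]'; then
    return 1
  fi
  return 0
}

# Assumed usage (not run here): run twice and check the second run.
# ansible-playbook pb_k8s_local.yml --extra-vars "@hostvars.yaml"
# ansible-playbook pb_k8s_local.yml --extra-vars "@hostvars.yaml" | recap_is_idempotent
```

A check like this fits well into CI for the playbook repository, so regressions show up before the next image build.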

Screenshot of the Hetzner image snapshot page, showing two Debian images for amd64 and arm64, each with a size of less than 500MB

The creation of these two images took about 12 minutes. That’s 12 minutes I have to spend once (for every k8s version) and that I now save on every deployment. When I want to create a new k8s cluster, I can just use these images and be sure they are already configured the way I want. Bootstrapping a new cluster with these images actually takes less than 5 minutes. It saves time, and I am much more confident that my setups always behave the same, because they are reproducible.
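
Building the images for a new k8s version then comes down to three Packer commands. The wrapper below is a sketch under a few assumptions: the function name `build_images` and the `PACKER` override are hypothetical, the template lives in the current directory, and the Hetzner API token is exported as `HCLOUD_TOKEN`, since the sources above set no token themselves.

```shell
#!/bin/bash
# Sketch: wrap the Packer calls needed to build the images for one
# Kubernetes version. PACKER can be overridden, e.g. for a dry run.
# The function name build_images is hypothetical.
build_images() {
  k8s_version="$1"
  template_dir="${2:-.}"
  packer_bin="${PACKER:-packer}"

  # Download the plugins listed under required_plugins.
  "$packer_bin" init "$template_dir" || return 1
  # Catch syntax and variable errors before any server is created.
  "$packer_bin" validate -var "k8s_version=$k8s_version" "$template_dir" || return 1
  # Build all sources of the build block (amd64 and arm64).
  "$packer_bin" build -var "k8s_version=$k8s_version" "$template_dir"
}

# Dry run: print the arguments instead of calling Packer.
PACKER=echo build_images 1.32.3

# Real run (not executed here), assuming HCLOUD_TOKEN is exported:
# build_images 1.32.3
```

Since the version is just a variable, the same call with a different argument produces the images for the next k8s release.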

ℹ️ You can find the complete code on my GitHub in my GitOps repo.

Conclusion

Ansible, Packer and even Terraform are great tools that work well together and make a real dream team. That’s one of the reasons IBM, the owner of Red Hat, bought HashiCorp and now works on bringing these tools even closer together. These tools are not competitors!
When you want to save even more time, you can easily glue your Ansible code together with tools like Packer to create truly reproducible deployments.

Henrik Gerdes
