Considerations:

  • This is a very basic Terraform example managing only a single VM, so it is likely not the way you’re going to use Terraform. But really, that’s only a matter of what you code with Terraform; nothing fundamentally changes from the principles shown below
  • The environment is disconnected, meaning you’re not loading providers and such from the Internet
  • Credentials must be secured, so they must not be stored in the project’s repository
  • The project must be generic, and not restricted to a single VM dataset
  • Integration with Gitlab webhooks, and managing multiple VMs in a similar (non-optimal) Terraform workflow for a whole project, may be published at a later time.
  • Some familiarity with AWX/AAP and Terraform is assumed in this article; for the latter, this book really fulfills its promise of taking you from zero to Terraform hero, and I hadn’t even looked at a single line of Terraform code until a couple of weeks ago
  • and finally, on a much more personal note, I would prefer to have a Free Software virtualization provider available, but alas that’s out of my hands

The Terraform Project

Our project starts with several separate Terraform files:

  • backend.tf specifying the backend we used, in this case, a PostgreSQL database
  • providers.tf specifying the providers we’re using, in this case, only VMWare
  • variables.tf specifying the three main variables for our example project
  • datasources.tf specifying the datasources (unmanaged resources) available to us

And finally…

  • main.tf specifying how to resource (manage) a VM

Rundown through the project

Many things are not explicitly defined in the Terraform files because quite a bit of the magic comes from Ansible, so do try to keep that in mind.

backend.tf

By default, Terraform uses the local backend, meaning local state files. But if you’re using AWX/AAP, your Ansible playbooks run inside containers, so local state files are something you do not want to use.

Storing state files in a git repository is also something you should not do.

A simple way to externalize state is to use a PostgreSQL database, and all you need to do is tell Terraform you’re using PostgreSQL:

terraform {
  backend "pg" { }
}

So simple it almost seems a mistake, no? Where’s the DB? What are the credentials? Keep that in mind, but don’t worry about that yet.

providers.tf

Equally simple is the providers section, requiring far less data than you’d expect at first. We’ll get back to it later.

provider "vsphere" {
}

variables.tf

In this file we have some magic: on one hand a rather “undefined” vm map-of-strings variable, and then two oddly specific variables for the network interfaces and the data disks.

variable "vm" {
  description = ""
  type        = map(string)
}
 
variable "networks" {
  description = ""
  type        = list(map(string))
  default     = []
}
 
variable "data_disks" {
  description = ""
  type        = list(object({ label = string, unit_number = number, size = number, thin_provisioned = bool }))
  default     = []
}

By default we may not have any networks (weird, but hey) or data disks (we’ll always have an OS boot disk as you’ll be able to check out in main.tf, later on).

Networks is a list of maps of strings, holding properties like the VMWare network label, the IP address, and the CIDR netmask for later use in customization (not covered in this article).
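
Just to make those shapes concrete: if you were driving this project straight from the Terraform CLI instead of AWX/AAP, the same structures could be supplied through a terraform.tfvars file. This is only a sketch with made-up values; the playbook further down feeds these exact same shapes from YAML instead:

# terraform.tfvars -- hypothetical values, only to illustrate the variable shapes
vm = {
  name              = "serverq2"
  cluster           = "OUR_CLUSTER"
  datastore_cluster = "A_DATA_STORE_CLUSTER"
}

networks = [
  { label = "VLAN1_ID_NAME", ip = "1.2.3.4", nm = "24" },
]

data_disks = [
  { label = "data1", unit_number = 1, size = 15, thin_provisioned = false },
]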

datasources.tf

Since we’re working with a pre-existing, non-Terraform-managed VMWare setup, many of our available resources are actually data sources:

data "vsphere_datacenter" "dc" {
  name = try(var.vm["datacenter"], "OUR DataCenter")
}
 
data "vsphere_compute_cluster" "cluster" {
  name          = var.vm["cluster"]
  datacenter_id = data.vsphere_datacenter.dc.id
}
 
data "vsphere_datastore_cluster" "datastore_cluster" {
  name          = var.vm["datastore_cluster"]
  datacenter_id = data.vsphere_datacenter.dc.id
}
 
data "vsphere_virtual_machine" "template" {
  name          = try(var.vm["template"], "A_VM_TEMPLATE")
  datacenter_id = data.vsphere_datacenter.dc.id
}
 
data "vsphere_network" "network" {
  count = length(var.networks)
  name          = var.networks[count.index].label
  datacenter_id = data.vsphere_datacenter.dc.id
}

Take a careful look at this example, especially at those try blocks (try to use a variable, and fall back to the given default value if it’s unavailable) and the networks block…

The networks block expands to all the networks your VM might need, as illustrated in the sketch below. We’ll come back to this later in the Ansible section of this article.
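
To make that behaviour concrete, here is a tiny sketch (not part of the project files): looking up a key that is missing from the vm map is an error, and try() swallows that error and returns the default instead, while count turns the network data source into a list.

# Sketch only: try() returns the first expression that evaluates without an error.
locals {
  # if "datacenter" is missing from var.vm, this falls back to "OUR DataCenter"
  dc_name = try(var.vm["datacenter"], "OUR DataCenter")
}

# And because of count, the network data source becomes a list:
#   data.vsphere_network.network[0], data.vsphere_network.network[1], ...
# one element per entry in var.networks, consumed by the dynamic block in main.tf.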

main.tf

In our main.tf file we’ll simply define a virtual machine taking into account all our variables, with some usage of try in order to provide defaults for omittable values, and so on.

Note that we iterate over the network interfaces and data disks variables in order to define which should be present.

We’re not doing any particular customization in this example; it may be expanded in the future. Please note that since we’re using RHEL9, you need to use EFI and Secure Boot for the VMs.

resource "vsphere_virtual_machine" "vm" {
  name              = var.vm["name"]
  num_cpus          = try(var.vm["num_cpus"], 2)
  memory            = 1024 * ( try(var.vm["memory"], 4) )
  guest_id          = data.vsphere_virtual_machine.template.guest_id
  folder            = try(var.vm["folder"], "Testes/")
 
  resource_pool_id     = data.vsphere_compute_cluster.cluster.resource_pool_id
  datastore_cluster_id = data.vsphere_datastore_cluster.datastore_cluster.id
 
  dynamic "network_interface" {
    for_each = data.vsphere_network.network
    content {
      network_id = network_interface.value.id
    }
  }
 
  wait_for_guest_net_timeout = -1
  wait_for_guest_ip_timeout = -1
 
  firmware = try(var.vm["firmware"], "efi")
  efi_secure_boot_enabled = tobool(try(var.vm["secure_boot"], "true"))
 
  disk {
    label = try(var.vm["os_disk_label"], "os")
    thin_provisioned = tobool(try(var.vm["thin_provisioned"], "false"))
    size = try(var.vm["os_disk_size"], data.vsphere_virtual_machine.template.disks.0.size)
  }
 
  dynamic "disk" {
        for_each = var.data_disks
        content {
                label = disk.value.label
                unit_number = disk.value.unit_number
                size = disk.value.size
                thin_provisioned = disk.value.thin_provisioned
        }
  }
 
  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }
}

Wait, that’s it?…

No, not really, but that’s all we’re coding with Terraform for this project; we’re now moving into Ansible territory, where the magic happens.

AWX/Ansible Automation Platform

A few good reasons to integrate Terraform with AWX/AAP:

  • security
    • your credentials may come from some vault integration
    • you may be working in a disconnected or very controlled environment and as such you’ll need to have all the providers you need already setup
  • you want to take advantage of the best of both worlds, rather than the good parts of one world along with the less good parts of the other

So here’s some magic… There’s a Terraform collection for Ansible (cloud.terraform) with many very useful properties; in our example the playbook actually has only one task, invoking this collection.

The Execution Environment already has Terraform available, as well as the VMWare (actually vSphere) provider pre-downloaded into /home/runner/.terraform. That’s because we live in a disconnected environment.
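
For reference, outside of AWX/AAP a plain Terraform CLI can be pointed at such pre-downloaded providers through a filesystem mirror in its CLI configuration. This is just a sketch (the path is assumed to match the execution environment above); the playbook below relies on the collection’s plugin_paths option instead:

# ~/.terraformrc -- sketch only, not used by the playbook in this article
provider_installation {
  filesystem_mirror {
    path    = "/home/runner/.terraform/providers"
    include = ["registry.terraform.io/*/*"]
  }
  direct {
    exclude = ["registry.terraform.io/*/*"]
  }
}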

So what’s the playbook composed of? Lots of magic…

  • credentials can be protected and dynamically loaded from a vault (like HashiCorp’s or CyberArk’s, or others)
  • remember the backend that was so empty of definition? The PostgreSQL backend supports some environment variables where you can specify your connection string, as in the example below. We’ve created our own credential type in AWX/AAP to support PostgreSQL connectivity
  • remember the provider that was so empty of definition? Same as for the backend, we take advantage of the credential variables
  • with plugin_paths we specify the path to the providers directory in the execution environment; when that variable is defined, cloud.terraform will not try to download the required plugins
  • we prevent accidental destruction of the environment with check_destroy
  • we pass on variables from a yaml dictionary that’s fed into the AWX/AAP job template (with webhooks we could loop over changed variable files). Please note the complex_vars property… this is key for achieving success. It means that if you represent, in an Ansible variable, a structure conforming to the Terraform definition (see above in variables.tf), then the collection will magically transform your variables from one format to the other.
  • as we’re using a very tightly controlled environment, we’re using a unique per-VM workspace; this is probably not what you’d use in a normal setting, but this was a test.
  • you should also define no_log if you’re in production, as otherwise some quite important variables containing password values might show up in the clear.
- hosts: localhost
  connection: local
  gather_facts: no

  tasks:
    - name: "Run terraform"
      environment:
        PG_CONN_STR: "postgres://{{ PGUSER }}:{{ PGPASSWORD }}@{{ PGHOST }}/{{ pgdb }}"
        VSPHERE_SERVER: "{{ lookup('env', 'VMWARE_HOST') }}"
        VSPHERE_USER: "{{ lookup('env', 'VMWARE_USER') }}"
        VSPHERE_PASSWORD: "{{ lookup('env', 'VMWARE_PASSWORD') }}"
      cloud.terraform.terraform:
        force_init: true
        plugin_paths:
          - "{{ terraform_dir | default('/home/runner/.terraform/providers') }}"
        check_destroy: true
        provider_upgrade: false
        binary_path: "{{ terraform_path|default('/bin/terraform') }}"
        project_path: "{{ project_dir|default('./') }}"
        complex_vars: true
        variables:
          vm: "{{ vm }}"
          networks: "{{ networks }}"
          data_disks: "{{ data_disks }}"
          dns_servers: "{{ dns_servers }}"
        workspace: "{{ workspace|default(vm.name) }}"
        state: present
      no_log: "{{ env|default('lab') == 'prd' }}"
      register: terraform
      tags:
        - terraform

So now all that’s left is to launch the job template with some extra vars:

pgserver: "terraform-db"
pgdb: "terraform-qly"
 
vm:
  name: "serverq2"
  domain: "our.own.domain"
  datacenter: "OUR DataCenter"
  cluster: "OUR_CLUSTER"
  folder: "A_VMWARE/FOLDER"
  datastore_cluster: "A_DATA_STORE_CLUSTER"
  template: "A_VM_TEMPLATE"
  memory: 8
  num_cpus: 4
  defgw: "AN_IP_ADDRESS"
 
networks:
  - { label: "VLAN1_ID_NAME", ip: "1.2.3.4", nm: "24" }
  - { label: "VLAN2_ID_NAME", ip: "4.3.2.1", nm: "24" }
 
data_disks:
  - { label: "data1", unit_number: 1, size: 15, thin_provisioned: false }
  - { label: "data2", unit_number: 2, size: 15, thin_provisioned: false }
 
dns_servers:
  - "AN.IP.ADD.RESS"
  - "ANOTHER.IP.ADD.RESS"

Yes, there are some unused variable values; those will come in a later revision of this article, or in another one.

This was already interesting, and useful enough, as an example.

This strategy can also be turned to your advantage if you’re using Terraform to define a cloud DNS at the same time as local DNS providers: you can keep one single source of truth defined in yaml, feed it to cloud.terraform, and also walk through it with an Ansible jinja2 template for ISC bind, for instance, as sketched below.
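
As a rough sketch of that idea, and assuming the same extra vars shown above, a task running against the bind host could render a zone fragment from the very same YAML. The file names and zone layout here are hypothetical:

    - name: "Render bind zone entries from the same source of truth"
      ansible.builtin.template:
        # zone.j2 would loop over 'networks' ({% for net in networks %} ... {% endfor %})
        # emitting A records like:  {{ vm.name }}.{{ vm.domain }}.  IN  A  {{ net.ip }}
        src: zone.j2
        dest: "/etc/named/zones/{{ vm.domain }}.zone"
      tags:
        - dns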