# Setup on Ubuntu

```
apt install ansible ansible-mitogen
```

# Required collections

```
ansible-galaxy install -r roles/requirements.yml
```

# Privileged data

Privileged data is stored in Bitwarden. To use roles that fetch privileged data,
the following utilities must be available:

* [bw](https://bitwarden.com/help/cli/)

Once installed, log in and unlock the vault:

```
bw login   # or: bw unlock
export BW_SESSION=xxxx
bw sync -f
```
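
To confirm the session is usable before running any playbooks, a quick check like the following can help; the item name is only a placeholder for a real vault entry:

```
# Should report "unlocked" once BW_SESSION is exported
bw status
# Fetch a secret by item name (placeholder shown here)
bw get password "example-item"
```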

# Running playbooks

```
ansible-playbook -i hosts [-l SUBSET] site.yml
```

## Skip slow tasks

`ansible-playbook --skip-tags slow`
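
For example, to limit the run to a single host while skipping the slow tasks (the host name is hypothetical):

```
ansible-playbook -i hosts -l ci-host01.internal.efficios.com --skip-tags slow site.yml
```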

# Bootstrapping hosts

## CI host

### Debian

1. Boot host with PXE
2. Select the option `Debian Bookworm amd64 (CI-host)` or equivalent
3. Post-preseed verifications:
   * Check that start-stop-daemon is available in `$PATH`. If not: `touch /sbin/start-stop-daemon; chmod +x /sbin/start-stop-daemon; apt-get install --reinstall dpkg`
   * Verify that the ZFS pool `tank` exists on the target host. If not, create it, e.g. `zpool create -f tank mirror dev1 dev2`
4. Add the host to the ansible inventory in the hosts group and in the appropriate cluster group (see the inventory sketch after this list)
5. For LXD hosts, add the host to the `lxd` group
6. Follow the appropriate LXD or Incus cluster steps
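
As a rough sketch, the inventory addition might look like the following, assuming an INI-style `hosts` inventory; the host name and the cluster group are examples to adapt:

```
# hosts (inventory) -- names below are examples only
[hosts]
ci-host-example.internal.efficios.com

[lxd]
ci-host-example.internal.efficios.com
```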

### Windows

1. Configure either an SSH or WinRM connection: see https://docs.ansible.com/ansible/latest/os_guide/windows_setup.html (a minimal host_vars sketch follows this list)
2. For arm64 hosts:
   * Install the necessary optional features (e.g. OpenSSH, Hyper-V), since Windows RSAT isn't available on Arm64 yet
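
A minimal host_vars sketch for an SSH-managed Windows host, assuming the OpenSSH server feature is installed and PowerShell is set as the default shell; the file name and values are assumptions:

```
# host_vars/windows-host-example.yml
ansible_connection: ssh
ansible_shell_type: powershell
ansible_user: Administrator
```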

## CI 'rootnode'

1. Add the new ansible node to the `node_standalone` group in the inventory
2. Add an entry to the `vms` variable in the host vars for the libvirt host
   * See the defaults and details in `roles/libvirt/vars/main.yml` and `roles/libvirt/tasks/main.yml`
   * Make sure to set the `cdrom` key to the path of the ISO for the installer (see the sketch after this list)
3. Run the playbook, e.g. `ansible-playbook -i hosts -l cloud07.internal.efficios.com site.yml`
   * The VM should be created and started
4. Once the VM is installed, take a snapshot so that Jenkins may revert to the original state
   * `ansible-playbook playbooks/snapshot-rootnode.yml -e '{"revert_before": false}' -l new-rootnode`
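
As a rough sketch only: the exact schema of `vms` should be checked against `roles/libvirt/vars/main.yml`; apart from `cdrom`, the keys and values below are assumptions:

```
# host_vars entry for the libvirt host (hypothetical)
vms:
  - name: ci-rootnode-example
    cdrom: /path/to/installer.iso
```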

### Ubuntu auto-installer

1. Note your IP address
2. Switch to the directory with the user-data files: `cd roles/libvirt/files`
3. Write out the instance-specific metadata, e.g.

```
cat > meta-data <<EOF
instance-id: iid-XXX
hostname: XXX.internal.efficios.com
EOF
```
* The `instance-id` is used to determine whether re-installation is necessary.
4. Start a python web server: `python3 -m http.server 3003`
5. Connect to the VM using a remote viewer on the address given by `virsh --connect qemu+ssh://root@host/system domdisplay`
6. Edit the grub boot options for the installer and append the following as arguments for the kernel: `autoinstall 'ds=nocloud-net;s=http://IPADDRESS:3003/'` and boot the installer
   * Note that the trailing `/` and quoting are important
   * This will load the `user-data`, `meta-data`, and `vendor-data` files from the directory served by the python web server
7. After the installation is complete, the system will reboot and run cloud-init for the final portion of the initial setup. Once completed, ansible can be run against it using the ubuntu user and becoming root, e.g. `ansible-playbook -i hosts -u ubuntu -b ...`
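
For example, a complete first run against a freshly installed rootnode might look like this (the host name is hypothetical):

```
ansible-playbook -i hosts -u ubuntu -b -l ci-rootnode-example.internal.efficios.com site.yml
```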

# LXD Cluster

## Start a new cluster

1. For the initial member of the cluster, set the `lxd_cluster` variable in the host variables to something similar to:

```
lxd_cluster:
  server_name: cluster-member-name
  enabled: true
  member_config:
    - entity: storage-pool
      name: default
      key: source
      value: tank/lxd
```

2. Run the `site.yml` playbook on the node
3. Verify that the storage pool is configured:

```
$ lxc storage list
| name | driver | state |
| default | zfs | created |
```

* If not present, create it on necessary targets:

```
$ lxc storage create default zfs source=tank/lxd --target=cluster-member-name
# Repeat for any other members
# Then, on the member itself
$ lxc storage create default zfs
# The storage listed should not be in the 'pending' state
```

4. Create a metrics certificate pair for the cluster, or use an existing one

```
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:secp384r1 -sha384 -keyout metrics.key -nodes -out metrics.crt -days 3650 -subj "/CN=metrics.local"
lxc config trust add metrics.crt --type=metrics
```
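
Once trusted, the key pair can be used to scrape the LXD metrics endpoint, e.g. from Prometheus. A quick manual check might look like the following, assuming the members expose the default HTTPS listener on port 8443 (`--insecure` only skips verification of the server certificate):

```
curl --silent --insecure --cert metrics.crt --key metrics.key \
    https://cluster-member-name:8443/1.0/metrics | head
```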

## Adding a new host

1. On the existing host or cluster, generate a token for the new member: `lxc cluster add member-host-name`
2. In the member's host_vars file, set the following keys:
   * `lxd_cluster_ip`: The IP address on which the server will listen
   * `lxd_cluster`: In a fashion similar to the following entry
```
lxd_cluster:
  enabled: true
  # Same as the name from the token created above
  server_name: 'member-host-name'
  # This should match `lxd_cluster_ip`
  server_address: 172.18.0.192
  cluster_token: 'xxx'
  member_config:
    - entity: storage-pool
      name: default
      key: source
      value: tank/lxd
```
* The `cluster_token` does not need to be kept in git after the playbook's first run
3. Assuming the member is in the hosts group of the inventory, run the `site.yml` playbook.
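
To confirm that the new member joined, list the cluster members from any existing node; the new entry should eventually report an ONLINE state:

```
lxc cluster list
```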

## Managing instances

Local requirements:

* python3, python3-dnspython, python3-jenkins, samba-tool, kinit

To automatically provision instances, perform certain operations, and update DNS entries:

1. Update `vars/ci-instances.yml`
2. Open a kerberos ticket with `kinit`
3. Run the playbook, e.g. `ansible-playbook playbooks/ci-instances.yml`
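
A typical session might look like the following; the Kerberos principal is a placeholder to adapt to your realm:

```
kinit username@EXAMPLE.REALM
ansible-playbook playbooks/ci-instances.yml
```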

# Incus cluster

## Migration from LXD

1. Run the `site.yml` playbook on the hosts to install `incus` and `incus-tools`
2. On one cluster member, start the `lxd-to-incus` script, and follow the prompts
3. On each other cluster member, start `lxd-to-incus --cluster-member`
4. When prompted on each cluster member, uninstall `lxd`.
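
After the migration, the cluster state can be checked with the Incus client, for example:

```
# All members should be reported as ONLINE after the migration
incus cluster list
```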