converge, base_system, and machine_manager
This describes the software I wrote for configuration-managing Debian machines before I had the sense to use NixOS, which includes configuration management as part of the operating system.
converge
converge is the library that implements the units used to configure machines.
- Defines a protocol for configuration units to implement:
  - met? checks if the system state is already valid
  - meet modifies system state
- Implements a core converge function that executes configuration units with met? and meet methods; met? -> meet -> met? detects any converge failures
- Has useful log output that handles arbitrarily nested units
- Implements units for
  - Ensuring files, directories, symlinks are present or missing, with specific user/group ownership, mode, immutability
  - apt
    - Generating and installing a metapackage that depends on all packages that should be installed
    - Marking almost all packages as auto-installed (for apt autoremove)
    - Autoremoving packages including unnecessary "recommends"
    - Purging a package (for ensuring that something unwanted is not installed as a dependency)
    - Ensuring the system does not have packages installed that are unavailable in any apt source
    - Ensuring the system does not have packages installed that are newer than in any apt source
    - Configuring apt's GPG keyring to ensure that leftover keys do not stick around
  - Configuring /etc/default/grub
  - Configuring /etc/fstab
  - Configuring sysctl parameters
  - Configuring sysfs variables
  - Ensuring that changes to /etc are committed with etckeeper
  - systemd
    - Installing additional systemd units in /etc
    - Ensuring a service is started or stopped
    - Ensuring a service is enabled or disabled
  - Wrapping other units
    - Converging a list of units and ensuring there are no conflicts between them
    - Running a function after a unit's meet runs (e.g. for restarting a service)
    - Running a function before a unit's meet runs
    - Running a fallback unit if a unit fails to converge
    - Turning a unit into an assert unit that raises an error instead of running meet
  - Ensuring a user is present or missing, with specific uid, gid, comment, password, shell, home directory, lock state
  - Defining a list of regular (non-system) users on the system. Users that should be missing are disabled instead of deleted to prevent UID recycling.
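The met?/meet protocol and the met? -> meet -> met? sequence described above can be sketched roughly as follows. This is an illustrative Python rendering, not the library's actual code: the names Unit, ConvergeError, AfterMeet, and FileContent are hypothetical, and met? is spelled met because Python identifiers cannot contain a question mark.

```python
from typing import Callable, Protocol


class Unit(Protocol):
    """Hypothetical Python spelling of the met?/meet protocol."""

    def met(self) -> bool: ...   # is the system state already valid?
    def meet(self) -> None: ...  # modify system state


class ConvergeError(Exception):
    """Raised when a unit is still unmet after its meet() has run."""


def converge(unit: Unit) -> None:
    """Run met? -> meet -> met? for one unit."""
    if unit.met():
        return  # nothing to do
    unit.meet()
    if not unit.met():
        raise ConvergeError(f"{unit!r} still unmet after meet()")


class AfterMeet:
    """Wrapper unit: calls `callback` after the wrapped unit's meet() runs."""

    def __init__(self, unit: Unit, callback: Callable[[], None]) -> None:
        self.unit = unit
        self.callback = callback

    def met(self) -> bool:
        return self.unit.met()

    def meet(self) -> None:
        self.unit.meet()
        self.callback()  # e.g. restart a service


class FileContent:
    """Example unit: ensure a file exists with exactly `content`."""

    def __init__(self, path: str, content: str) -> None:
        self.path = path
        self.content = content

    def met(self) -> bool:
        try:
            with open(self.path) as f:
                return f.read() == self.content
        except FileNotFoundError:
            return False

    def meet(self) -> None:
        with open(self.path, "w") as f:
            f.write(self.content)
```

Because converge checks met? first, converging the same unit twice is harmless: the second run finds the state already valid, so neither meet nor the wrapper's callback runs again.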
base_system
base_system is an opinionated base setup for Linux machines. It runs its own units along with units from every role that is tagged for the machine (see the machine_manager section). base_system does:
- Package management
  - Configures apt sources based on the Debian/Ubuntu release
  - Configures apt GPG keys
  - Configures apt pins (e.g. for installing backports)
  - [role_custom_packages] Configures machine to get additional packages from custom package source
  - Installs the metapackage that depends on all desired packages
  - Autoremoves all unnecessary packages including "recommends"
  - Purges all undesired packages
  - Ensures there aren't any packages installed that are unavailable in or newer than in any apt source
- Boot and system settings
  - Installs and configures the correct grub for either MBR or UEFI boot
  - Configures sysfs
  - Configures sysctl to improve security and reduce IO stalls
  - Configures desired kernel modules to load on boot
  - Configures udev rules
  - Configures timezone, environment, locale
  - Configures chrony for reliable time sync and disables systemd-timesyncd
  - Configures /etc/security/limits.conf to let processes keep more than 1024 files open
  - Sets up daily TRIM for SSD drives
  - Sets up a useful zsh configuration for root
- Networking
  - Installs a firewall configuration using ferm that controls inbound and outbound access (including to localhost, to prevent unintended cross-user connections)
  - [role_custom_packages] Configures WireGuard for encrypted, authenticated communication between machines that need to reach each other
  - Configures /etc/hosts with hostnames for machines that need to reach each other (both public and WireGuard IPs)
  - Configures unbound for DNS lookups without relying on ISP or public resolvers
  - Configures sshd
  - Configures SSH authorized_keys
  - Enables prometheus-node-exporter for server monitoring by prometheus
- Security
  - Configures system to hide other users' processes from process list
  - Blacklists kernel modules with a history of vulnerabilities
  - Prevents polkit from letting non-root users shut down the machine
  - Disables systemd Ctrl-Alt-Delete reboot key combination
  - Disables security bugs in sudo
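As a rough illustration of how base_system might combine its own units with those of every tagged role, here is a small Python sketch. It assumes that a tag such as role:znc selects a role named role_znc (inferred from the tag and role names in this document, not confirmed), and the function names and the role_units mapping are hypothetical.

```python
def roles_for_tags(tags: list[str]) -> list[str]:
    """Map tags like 'role:znc' to role names like 'role_znc' (assumed convention)."""
    prefix = "role:"
    return [f"role_{tag[len(prefix):]}" for tag in tags if tag.startswith(prefix)]


def units_for_machine(
    tags: list[str],
    base_units: list[object],
    role_units: dict[str, list[object]],
) -> list[object]:
    """Return base_system's own units followed by the units of each tagged role."""
    units = list(base_units)
    for role in roles_for_tags(tags):
        # A KeyError here means the machine is tagged with an unknown role.
        units.extend(role_units[role])
    return units
```

Non-role tags such as boot:mbr or release:stretch are ignored by this selection; in the real system they presumably feed other decisions, for example which grub or apt sources to configure.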
machine_manager
machine_manager puts everything together: it is the end-user tool for managing many machines in parallel. machine_manager:
- Maintains an inventory of machines, using PostgreSQL as the backend
- Lists machines including tags and probed information
- Adds or removes tags from machines
- Configures machines
  - Behind the scenes, this compiles base_system and all roles tagged for the machine into an escript, rsyncs over the escript and a small portable Erlang installation, then runs the escript over ssh
- Probes machines for pending upgrades, running kernel version, clock drift, CPU and memory info, boot time
- Upgrades packages on machines
  - Behind the scenes, this upgrades, configures, then probes the machines
- Reboots machines
- Shuts down machines
- Executes a command and returns the output on machines
- Implements some additional commands
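The parallel execution with "# Waiting on:" progress lines, as seen in the workflows below, can be approximated like this. It is a sketch in Python using a thread pool, which is only a stand-in for however machine_manager actually schedules work; run_on_machines and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_on_machines(
    hostnames: list[str],
    action: Callable[[str], object],
    verb: str,
    log: Callable[[str], None] = print,
) -> dict[str, object]:
    """Run action(hostname) on all machines in parallel, logging progress."""
    results: dict[str, object] = {}
    waiting = sorted(hostnames)
    if not waiting:
        return results
    with ThreadPoolExecutor(max_workers=len(waiting)) as pool:
        futures = {pool.submit(action, host): host for host in waiting}
        log("# Waiting on: " + " ".join(waiting))
        for future in as_completed(futures):
            host = futures[future]
            # result() re-raises any exception from action(host).
            results[host] = future.result()
            waiting.remove(host)
            log(f"{host} {verb}")
            if waiting:
                log("# Waiting on: " + " ".join(waiting))
    return results
```

For example, run_on_machines(["do8", "ksca2"], probe, "probed") would call a caller-supplied probe function once per host and print one "probed" line per machine as each finishes, in completion order.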
Workflow for adding a role to a machine
# mm ls bhsvps1
HOSTNAME PUBLIC IP WIREGUARD SSH TAGS RAM CPU CO TH PROBE TIME BOOT TIME TIME OFFSET KERNEL PENDING UPGRADES
bhsvps1 129.5.144.68 10.10.0.9 22 boot:mbr country:ca release:stretch role:custom_packages role:ext4 role:ovh_vps role:znc 1952 Mystery Haswell 1 1 2018-01-12T23:35:20Z 2018-01-05T17:42:47Z +0.000000295 4.14.0-27-amd64 #1
# mm tag bhsvps1 role:ebook_converter
# mm configure bhsvps1
# Waiting on: bhsvps1
# Waiting on: bhsvps1
bhsvps1 configured
Workflow for upgrading packages on many machines
# mm probe '.*'
# Waiting on: bhsvps1 do8 ksca2 paris2 sbuild-stretch
sbuild-stretch probed
# Waiting on: bhsvps1 do8 ksca2 paris2
ksca2 probed
# Waiting on: bhsvps1 do8 paris2
paris2 probed
# Waiting on: bhsvps1 do8
do8 probed
# Waiting on: bhsvps1
bhsvps1 probed
# mm ls -c hostname -c last_probe_time -c time_offset -c kernel -c pending_upgrades
HOSTNAME        PROBE TIME            TIME OFFSET   KERNEL              PENDING UPGRADES
bhsvps1         2018-01-11T17:14:46Z  +0.000000312  4.14.0-27-amd64 #1
do8             2018-01-11T17:14:31Z  -0.000000034  4.14.0-29-amd64 #1  nodejs=8.9.4-1nodesource1
ksca2           2018-01-11T17:14:06Z  -0.000000208  4.14.0-29-amd64 #1  nodejs=8.9.4-1nodesource1
paris2          2018-01-11T17:14:18Z  +0.000000122  4.14.0-27-amd64 #1  nodejs=8.9.4-1nodesource1
sbuild-stretch  2018-01-11T17:13:54Z  +0.000000024  4.14.0-29-amd64 #1
# mm upgrade '.*'
bhsvps1 had no pending upgrades in database; probe again if needed
sbuild-stretch had no pending upgrades in database; probe again if needed
# Waiting on: do8 ksca2 paris2
paris2 upgraded
# Waiting on: do8 ksca2
ksca2 upgraded
# Waiting on: do8
do8 upgraded
# mm ls -c hostname -c last_probe_time -c time_offset -c kernel -c pending_upgrades
HOSTNAME        PROBE TIME            TIME OFFSET   KERNEL              PENDING UPGRADES
bhsvps1         2018-01-12T23:13:45Z  +0.000000299  4.14.0-27-amd64 #1
do8             2018-01-12T23:16:50Z  -0.000000059  4.14.0-29-amd64 #1
ksca2           2018-01-12T23:16:37Z  -0.000000196  4.14.0-29-amd64 #1
paris2          2018-01-12T23:16:01Z  +0.000000093  4.14.0-27-amd64 #1
sbuild-stretch  2018-01-12T23:12:51Z  +0.000000054  4.14.0-29-amd64 #1
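The selection step visible in the upgrade output above, where machines without recorded pending upgrades are skipped, might look roughly like the following Python sketch. It assumes the mm argument is a regular expression that must match the whole hostname (the transcript only shows '.*', so the exact matching rule is a guess), and select_for_upgrade is a hypothetical name.

```python
import re


def select_for_upgrade(
    pattern: str, pending: dict[str, list[str]]
) -> tuple[list[str], list[str]]:
    """Split matching hostnames into (to_upgrade, skipped).

    `pending` maps each hostname to the pending upgrades recorded by the
    last probe; hosts with an empty list are skipped.
    """
    regex = re.compile(pattern)
    to_upgrade: list[str] = []
    skipped: list[str] = []
    for hostname in sorted(pending):
        if regex.fullmatch(hostname) is None:
            continue  # hostname not selected by the pattern
        if pending[hostname]:
            to_upgrade.append(hostname)
        else:
            skipped.append(hostname)
    return to_upgrade, skipped


# Data taken from the listing before the upgrade.
pending = {
    "bhsvps1": [],
    "do8": ["nodejs=8.9.4-1nodesource1"],
    "ksca2": ["nodejs=8.9.4-1nodesource1"],
    "paris2": ["nodejs=8.9.4-1nodesource1"],
    "sbuild-stretch": [],
}
to_upgrade, skipped = select_for_upgrade(".*", pending)
for hostname in skipped:
    print(f"{hostname} had no pending upgrades in database; probe again if needed")
```

Since the decision relies on the database rather than on a live check, a stale probe can hide available upgrades, which is why the message suggests probing again.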
Sample roles
| Role | Description |
| --- | --- |
| role_custom_packages | additional apt source for installing custom packages |
| role_custom_packages_server | hosts custom packages using self-contained nginx and spiped |
| role_sbuild | sbuild host for building Debian packages |
| role_desktop | working xfce4 desktop environment |
| role_autologin | automatically log into xfce4 on boot |
| role_lxc_host | LXC host machine |
| role_nvidia | working NVIDIA setup on either Debian or Ubuntu |
| role_apc_ups | APC UPS battery setup |