There's really no reason why *install-packages.service* needs to
complete before users can log in. Indeed, being able to log in while it
is running may be necessary in order to troubleshoot issues.
The `flash.zsh` script now takes an optional `--image-url` argument,
which can be used to specify a different FCOS base image, for example to
use a custom image or simply to avoid downloading the same image from the
Internet repeatedly.
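For example (the URL and any remaining arguments here are purely
illustrative, not the script's actual interface):

```
# Flash using a locally mirrored FCOS image instead of the default download.
./flash.zsh --image-url https://mirror.example.com/fedora-coreos-aarch64.raw.xz ...
```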
I think I have finally decided that I want *collectd* to run in a
container on FCOS machines. It's much easier and quicker to deploy and
configure that way. The only drawback is how filesystems are monitored,
but I think I am okay with `ReportByDevice` now. In fact, I might even
like it better, since container hosts have *tons* of redundant mounts
that add noise to the disk usage charts.
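For reference, the relevant *df* plugin option looks like this; the
surrounding configuration is only a sketch:

```
# collectd.conf fragment (sketch)
<Plugin df>
  # Name filesystems by block device instead of by mount point.
  ReportByDevice true
</Plugin>
```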
When Fedora CoreOS first boots, Ignition modifies the partition table,
either to add partitions as requested in the config, or just to resize
the root filesystem. In any case, this has the side effect of erasing
the hybrid MBR partition table. If the hybrid MBR table is missing or
incorrect, Raspberry Pi 2 and 3 devices will not be able to boot. We
must therefore rebuild the missing table on first boot after Ignition
has run.
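A minimal sketch of that first-boot repair, assuming `sgdisk` is available
and that the partitions the Pi firmware needs to see are the first few
entries (the partition numbers and device path are assumptions):

```
# Recreate the hybrid MBR that Ignition's partition changes wiped out.
sgdisk --hybrid=1:2:3 /dev/mmcblk0
```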
The *apply-config-policy* service does what it says on the tin. It
fetches the *cfg.git* repository and applies the configuration policy
therein for the current host. This is a privileged container with
practically all isolation disabled, to allow the configuration tools to
manage the system.
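Roughly, the container is launched with most of Podman's isolation switched
off, along the lines of this sketch (the image name and the `/:/host` mount
are assumptions):

```
podman run --rm --privileged \
    --network=host --pid=host --ipc=host \
    --security-opt label=disable \
    --volume /:/host \
    registry.example.com/apply-config-policy:latest
```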
Installing packages on the host system via `rpm-ostree` is *insanely*
slow, especially on Raspberry Pi devices. The main reason I chose to go
that route for managing the SSH host certificates was to avoid having to
maintain the systemd units in multiple places. I think the trade-off is
worth it, though; bringing up a new Raspberry Pi is significantly
faster, by 15+ minutes, if we do not have to wait for `rpm-ostree` at
all.
Fedora CoreOS can be provisioned on a QEMU virtual machine by providing
the Ignition configuration via a `fw_cfg` entry. Unfortunately, the
`string` method does not work with JSON values, so we have to use
`file`. The configuration file has to be uploaded via SFTP, rather than
`virsh vol-import`, since the latter would create the file with the
wrong permissions, and QEMU does not automatically adjust the
permissions of files used this way (like it does for disks).
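For reference, this is what the `fw_cfg` entry looks like when passed
straight to QEMU; libvirt needs the equivalent via its QEMU command-line
passthrough (the file path is illustrative):

```
# Added to the usual QEMU invocation; the guest reads the config from fw_cfg.
-fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/ignition/host.ign
```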
Bind-mount subdirectories of `/etc/nginx` individually so the
non-configuration files (e.g. MIME type database) distributed with the
container image are available.
Fix permissions of `/var/cache/nginx` and put PID file there.
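A sketch of the resulting arrangement; the subdirectory names and image are
assumptions, and the nginx configuration sets `pid /var/cache/nginx/nginx.pid;`
so the PID file lands on the writable cache volume:

```
# Mount only the configuration subdirectories, leaving the image's own
# /etc/nginx files (mime.types, fastcgi_params, ...) in place.
podman run --rm \
    --volume /etc/nginx/conf.d:/etc/nginx/conf.d:ro \
    --volume /etc/nginx/certs:/etc/nginx/certs:ro \
    --volume /var/cache/nginx:/var/cache/nginx \
    docker.io/library/nginx:stable
```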
The packages for the Kubelet are now installed by the
*install-packages* service, so they can be processed in the same
transaction as other packages (e.g. collectd).
Units that get installed via `rpm-ostree` on first boot cannot be
enabled by ignition, because they do not exist when it runs `systemctl
preset`. Thus, anything we want to start after it has been installed needs
to be explicitly started. To allow this in an extensible fashion, I've
added an `after-install.target` unit and modified the
`install-packages.sh` script to activate this unit once the installation
is complete. The script also re-runs `systemctl preset`, so services
will start automatically on subsequent boots.
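Units that should start right after installation hook onto the target (e.g.
via `WantedBy=after-install.target`); the tail of the script then looks
roughly like the following sketch, with the unit names as placeholders:

```
# End of install-packages.sh (sketch): pick up the freshly installed units,
# apply their presets so they start on later boots, then kick off anything
# hooked onto after-install.target for this boot.
systemctl daemon-reload
systemctl preset kubelet.service collectd.service
systemctl start --no-block after-install.target
```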
The *install-packages.service* unit has to be enabled, and the condition
checking for `/etc/ignition/packages.installed` was inverted.
Sending standard output to the console as well as the journal allows
watching progress.
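In unit-file terms the fix amounts to something like this (only the relevant
directives are shown; the install target is an assumption):

```
[Unit]
# Run only until the stamp file exists; the leading '!' negates the check.
ConditionPathExists=!/etc/ignition/packages.installed

[Service]
# Mirror progress to the console in addition to the journal.
StandardOutput=journal+console

[Install]
WantedBy=multi-user.target
```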
The default SELinux policy for *collectd* does not allow it all the
necessary access for the way we use it. Notably, it cannot bind to the
HTTP port to export Prometheus metrics, and it is not allowed to use
netlink to read interface statistics. The latter is not a huge deal, as
it can fall back to the legacy procfs interface, but the former is a
nonstarter.
Eventually, I should write an SELinux module with the correct
permissions (and submit the changes upstream), but for now, we'll just
make the `collectd_t` domain permissive.
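Making the domain permissive is a one-liner, assuming `semanage` (from
*policycoreutils-python-utils*) is available on the host:

```
# Log AVC denials for collectd_t but do not enforce them.
semanage permissive -a collectd_t
```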
Unfortunately, running *collectd* in a container is not going to work.
Although containers can be configured to share some of the host's
namespaces, one notable exception is the mount namespace. Naturally,
containers must have their own mount namespace, which prevents them from
seeing filesystems that are actually mounted on the host. For
*collectd*, this effectively makes the `df` plugin useless, which
ultimately prevents us from monitoring disk space.
This reverts commit 4048e5cc0a.
For some reason, the *zincati.service* unit has an `After=` dependency
on *multi-user.target*. This creates a dependency loop between
*local_exporter.service* and *zincati.service* if the former has an
`After=` dependency on the latter and an (implicit) `Before=` dependency
on *multi-user.target*. systemd will resolve this loop by removing one
or the other unit from the bootup sequence, so either Zincati or the
local exporter will not start at boot.
We can avoid this dependency loop by removing the `After=` dependency
from *local_exporter.service*. This may cause requests for Zincati
metrics to fail if one happens to come in after the local exporter starts
but before Zincati does, but that is unlikely to be an issue in practice.
The *collectd.service* unit may fail for various reasons. Notably, if
the container image is not present, it may fail to start if it is
activated before the network is fully available. Using systemd's
automatic restart mechanism will help ensure *collectd* is running
whenever possible.
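Concretely, this is just systemd's standard restart knobs; the interval
below is an arbitrary choice:

```
[Service]
# Retry if collectd exits abnormally, e.g. because the image could not be
# pulled before the network was fully up.
Restart=on-failure
RestartSec=30
```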
Although the official Fedora CoreOS documentation only provides
instructions for running CoreOS on a Raspberry Pi 4, it does actually
work on older boards as well. `coreos-installer` creates a GPT disk
label, which the older devices do not support, but this can be worked
around using a hybrid MBR label.
Unfortunately, after I put all the effort into refactoring this script
and adding support for the older devices, I realized that it was rather
pointless as those boards simply do not have enough memory to be useful
Kubernetes nodes. I was hoping to move the Zigbee and ZWave controllers
to a Raspberry Pi 3, but these processes take way too much memory for
that.
The `common.yaml` Butane configuration file merges in all the other
various Butane configuration files that we want to share among all
CoreOS machines. These include the authorized SSH keys list, collectd
deployment, SSH host certificate configuration, etc.
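Structurally it is just a list of merges. A sketch, where the file names and
spec version are illustrative and the merged files are the compiled Ignition
outputs of the individual fragments:

```
# common.yaml (sketch)
variant: fcos
version: 1.4.0
ignition:
  config:
    merge:
      - local: ssh-keys.ign
      - local: collectd.ign
      - local: ssh-host-certs.ign
```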
Now that we have an internal SSH certificate authority, instead of
explicitly listing all M×N keys for each user and client machine, we can
list only the CA certificate in the SSH authorized keys file for the
*core* user. This will allow any user who presents a valid, signed SSH
certificate for the *core* principal to log in.
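The authorized keys file then reduces to a single `cert-authority` line (the
key material and comment are placeholders):

```
# ~core/.ssh/authorized_keys (sketch)
cert-authority ssh-ed25519 AAAA... sshca@example.com
```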
The `ssh-bootstrap` script, which is run by the *ssh-bootstrap.service*
systemd unit, requests SSH host certificates for each of the existing
SSH host keys. The certificates are issued by the *POST /sshkeys/sign*
operation of the *dch-webhooks* web service.
The *step-ssh-renew* timer/service runs `step ssh renew`, in a
container, on a weekly basis to renew the SSH host certificate. A host
certificate must already exist, and its private key is used to
authenticate to the CA server.
Since `step ssh renew` can only operate on one certificate/key file at a
time, the `step-ssh-renew@.container` file defines a template unit. The
template instance specifies the key type (i.e. `rsa`, `ecdsa`, or
`ed25519`), which in turn defines which certificate and private key file
to use. The timer unit activates a target unit, which depends on the
concrete service units. Note that the target unit must have
`StopWhenUnneeded=yes` so that it can be started again the next time
the timer fires.
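A sketch of the target and timer glue; the service instances come from the
`step-ssh-renew@.container` template, and everything beyond the unit names
and weekly schedule described above is an assumption:

```
# step-ssh-renew.target
[Unit]
Description=Renew all SSH host certificates
# Pull in one instance per key type.
Wants=step-ssh-renew@rsa.service step-ssh-renew@ecdsa.service step-ssh-renew@ed25519.service
# Deactivate once the instances finish so the timer can start it again.
StopWhenUnneeded=yes

# step-ssh-renew.timer
[Timer]
OnCalendar=weekly
Unit=step-ssh-renew.target
```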
Installing packages with `rpm-ostree` is somewhat problematic. Notably,
if a new package needs an update of an already-installed package (e.g.
shared library), the new package cannot be installed until a new version
of CoreOS is published with the updated dependency.
In order for collectd to be effective, the container it runs in has to
have most isolation features disabled. Most importantly, the PID, UTS,
and network namespaces need to be shared with the host, so that
*collectd* can "see" the actual values. Additionally, the default
SELinux policy for containerized processes denies practically all of the
instrumentation syscalls *collectd* needs, so it needs to run in the
unconfined `spc_t` domain. Finally, the `/run` directory needs to be
shared with the host, so *collectd* can communicate with various daemons
via UNIX sockets.
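In Podman terms this boils down to flags along these lines (the image name
is a placeholder):

```
podman run --rm \
    --pid=host --uts=host --network=host \
    --security-opt label=type:spc_t \
    --volume /run:/run \
    registry.example.com/collectd:latest
```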
Zincati provides Prometheus metrics via a Unix socket. In order for
these to be scraped by `vmagent`, they need to be exposed over HTTP.
The `local_exporter` is designed to do exactly that.
Unfortunately, the Zincati metrics socket is only accessible by the
*zincati* user, so the `local_exporter` also needs to run as that user.
Hopefully, the user ID will remain consistent in future versions of
CoreOS.
Using nginx, we can expose the Frigate web server via HTTPS. Since
Frigate has no built-in authentication, we need to use Authelia via the
nginx proxy auth feature.
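A minimal sketch of the nginx side, assuming Frigate listens on its default
port 5000 and that Authelia's verification endpoint is reachable at the
address shown (both are assumptions):

```
location / {
    # Every request is checked against Authelia first.
    auth_request /internal/authelia/verify;
    proxy_pass http://127.0.0.1:5000;
}

location = /internal/authelia/verify {
    internal;
    proxy_pass http://authelia.example.com:9091/api/verify;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}
```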
Since Fedora CoreOS machines are not managed by Ansible, we need another
way to keep the HTTPS certificate up-to-date. To that end, I've added
the `fetchcert.sh` script, along with a corresponding systemd service
and timer unit, that will fetch the latest certificate from the Secret
resource managed by the Kubernetes API. The script authenticates with
a long-lived bearer token associated with a particular Kubernetes
service account and downloads the current Secret to a local file. If
the certificate in the Secret is different from the one already in
place, the certificate and key files are updated and nginx is reloaded.
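The core of the script is little more than an authenticated GET against the
Secrets API. A sketch, with the API server, namespace, Secret name, file
paths, and reload mechanism all as assumptions:

```
#!/bin/sh
set -eu

APISERVER=https://kubernetes.example.com:6443
TOKEN=$(cat /etc/fetchcert/token)
mkdir -p /run/fetchcert

# Fetch the Secret that holds the current certificate.
curl --silent --fail \
    --cacert /etc/fetchcert/ca.crt \
    --header "Authorization: Bearer ${TOKEN}" \
    "${APISERVER}/api/v1/namespaces/ingress/secrets/frigate-tls" \
    > /run/fetchcert/secret.json

# Only rewrite the files and reload nginx if the certificate changed.
jq -r '.data."tls.crt"' /run/fetchcert/secret.json | base64 -d > /run/fetchcert/tls.crt
if ! cmp -s /run/fetchcert/tls.crt /etc/nginx/certs/tls.crt; then
    jq -r '.data."tls.key"' /run/fetchcert/secret.json | base64 -d > /run/fetchcert/tls.key
    install -m 0644 /run/fetchcert/tls.crt /etc/nginx/certs/tls.crt
    install -m 0600 /run/fetchcert/tls.key /etc/nginx/certs/tls.key
    systemctl reload nginx.service
fi
```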
The `collectd.yaml` Butane configuration fragment configures the machine
to install *collectd* and its various plugin packages directly on the
host using `rpm-ostree` (via *install-packages.service*).
Some machines may need to install multiple packages for separate use
cases. Requiring each use case to define a systemd unit that runs
`rpm-ostree install` directly would be cumbersome and also quite slow,
as each one would have to run in turn. Instead, now there is a single
*install-packages.service* which installs all of the packages listed in
files in `/etc/ignition/packages.d`. On first boot, all files in that
directory are read and all the packages they list will be installed in a
single `rpm-ostree install` invocation.
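The core of `install-packages.sh` is then roughly the following; the exact
`rpm-ostree` flags and stamp-file handling are assumptions beyond what is
described above:

```
# Install every package listed under packages.d in one transaction, then
# record that this has been done so the unit's condition skips it later.
cat /etc/ignition/packages.d/* | xargs rpm-ostree install --apply-live
touch /etc/ignition/packages.installed
```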
When `ProtectSystem` is enabled, systemd sets up a separate mount
namespace for the service. Unfortunately, this appears to interfere
with Podman and prevents it from cleaning up containers on shutdown.
To keep the API key a secret, we're encrypting the environment file in
the repository with GnuPG. The decrypted copy only lives in the work
tree and is never committed. Changes have to be re-encrypted and
committed.
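Workflow-wise this is plain GnuPG; the file names and recipient are
illustrative:

```
# Decrypt into the work tree (never committed):
gpg --decrypt --output frigate.env frigate.env.gpg

# After editing, re-encrypt and commit the ciphertext:
gpg --encrypt --recipient admin@example.com --output frigate.env.gpg frigate.env
```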
Enabling hardware acceleration using VA-API dramatically reduces
`ffmpeg` CPU usage. For this to work, the Frigate container needs
access to the DRI device node.
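That means passing the render node through to the container, e.g. (the
device path is the usual render node; adjust if the host numbers it
differently):

```
podman run ... --device /dev/dri/renderD128 ...
```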
Since *frigate.service* runs as root, the directories created by
`StateDirectory` are owned by root. The processes inside the container,
therefore, cannot access them. Thus, we have to use `systemd-tmpfiles`
to create the state directories with the appropriate permissions.
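A tmpfiles.d sketch; the paths and the UID/GID that container root maps to
on the host are assumptions:

```
# /etc/tmpfiles.d/frigate.conf
d /var/lib/frigate        0750 200000 200000 -
d /var/lib/frigate/media  0750 200000 200000 -
```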
When developing Butane/Ignition files, I frequently forget to update the
parent files after making a change to an included file. This causes a
lot of wasted time re-provisioning, only to discover that my change
did not take effect. To alleviate this, we'll use `make` with some
macro magic to scan the Butane files for their dependencies, and let it
generate whatever Ignition files need updating any time a dependency
changes.
I've also added a "publish" step to the Makefile, since I also
frequently forget to upload the regenerated Ignition files to the
server, causing the same headaches.
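A trimmed-down sketch of the Makefile; the dependency-scanning macro is only
gestured at in a comment, and the publish destination is a placeholder:

```
BUTANE   := $(wildcard *.yaml)
IGNITION := $(BUTANE:.yaml=.ign)

all: $(IGNITION)

# The real Makefile also generates per-file prerequisites by scanning each
# Butane source for the fragments it merges in.
%.ign: %.yaml
	butane --strict --files-dir . --output $@ $<

publish: all
	scp $(IGNITION) webserver.example.com:/srv/ignition/

.PHONY: all publish
```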
The *frigate* container must run as root, so we use a custom user
namespace to map root in the container to an unprivileged user on the
host.
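With Podman that is a `--uidmap`/`--gidmap` pair; the offset is arbitrary:

```
podman run ... --uidmap 0:200000:65536 --gidmap 0:200000:65536 ...
```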
For some reason, Podman (on CoreOS anyway) fails to stop a container
that uses a separate network namespace. It reports "invalid argument"
when attempting to unmount the `netns` file, which then causes the
container to get "stuck" in `Storage` state. Rebooting the host is
apparently the only way to get the container to start again correctly.
Fortunately, there's no particular reason to use an alternate network
namespace for Frigate, so it can use the host's network and avoid this
problem.
The *gasket-driver* container installs the `gasket` and `apex` kernel
modules, which provide the driver for the Google Coral EdgeTPU AI
accelerator module. The container image must be built ahead of time,
of course, and contains modules built for a specific Fedora kernel
version.
The udev rule has two purposes: to set the permissions on the device
node so that any user on the system can access it, and to "tag" the
device so that systemd will generate a `.device` unit for it. The
latter allows other units (e.g. Frigate) to express a `Requires=` and
`After=` dependency on the device unit, so that they do not start until
the driver is loaded.
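The rule itself is tiny; the subsystem match is based on how the Apex driver
names its device nodes, and the mode is whatever open permission is wanted:

```
# /etc/udev/rules.d/65-apex.rules (sketch)
SUBSYSTEM=="apex", MODE="0666", TAG+="systemd"
```

With the `systemd` tag in place, `/dev/apex_0` shows up as a
`dev-apex_0.device` unit that *frigate.service* can order itself after.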