configpolicy

Author	SHA1	Message	Date
Dustin C. Hatch	719be9a4e9	Deploy Radarr, Sonarr, Prowlarr on file0.p.b I had originally intended to deploy Radarr, Sonarr, and Prowlarr on Kubernetes. Unfortunately, this turned out to be problematic, as I would need a way to share the download directory between Radarr/Sonarr and Aria2, and the media directory between Radarr/Sonarr and Jellyfin. The only way I could fathom to do this would be to expose both directories via NFS and mount that share into the pods. I decided this would be too much of a hassle for no real gain, at least not in the short term. Instead, it makes more sense to deploy the *arr suite on the same server as Aria2 and Jellyfin, which is essentially what the community expects. The recommended images for deploying the applications in containers are pretty crappy. I didn't really want to mess with trying to get them to work natively on Fedora, nor deal with installing them from tarballs with Ansible, so I created my own Debian-based container images for them and deployed those via Podman+Quadlet. These images are published to the _Packages_ organization in Gitea, which is not public and requires authentication. We can use the Kubernetes Secret to obtain the authentication token to use to pull the image.	2025-12-03 23:05:21 -06:00
Dustin C. Hatch	23670338b3	sonarr: Deploy Sonarr in a Podman container The `sonarr.yml` playbook and corresponding role deploy Sonarr, the TV series library/download manager, in a Podman container. Note that we're relocating the log files from the Sonarr AppData directory to `/var/log/sonarr` so they can be picked up by Fluent Bit.	2025-12-03 23:00:54 -06:00
Dustin C. Hatch	9223dbe820	prowlarr: Deploy Prowlarr in a Podman container The `prowlarr.yml` playbook and corresponding role deploy Prowlarr, the indexer manager for the *arr suite, in a Podman container. Note that we're relocating the log files from the Prowlarr AppData directory to `/var/log/prowlarr` so they can be picked up by Fluent Bit.	2025-12-03 23:00:54 -06:00
Dustin C. Hatch	a41a3fa3d0	radarr: Deploy Radarr in a Podman container The `radarr.yml` playbook and corresponding role deploy Radarr, the movie library/download manager, in a Podman container. Note that we're relocating the log files from the Radarr AppData directory to `/var/log/radarr` so they can be picked up by Fluent Bit.	2025-12-03 23:00:54 -06:00
Dustin C. Hatch	fd8cc42720	hosts: Move PiKVM to separate inventory There's no reason for Jenkins to be messing with this machine. It's too different than the rest of the hosts it manages, so it's been quite difficult getting it to work anyway. Let's just move it to a separate inventory file that we have to specify manually when we want to apply a Playbook to it.	2025-12-02 08:52:22 -06:00
Dustin C. Hatch	e9d2d21ec3	hosts: Add pikvm-nvr2.m.p.b This is a Raspberry Pi 2 with HDMI-CSI adapter and Raspberry Pi Pico, connected to _nvr2.pyrocufflink.blue_, as the latter does not have a serial console.	2025-12-01 10:03:05 -06:00
Dustin C. Hatch	cce485db54	pikvm: Add role/playbook for PiKVM PiKVM comes with its own custom Arch Linux-based operating system. We want to be able to manage it with our configuration policy, especially for setting up authentication, etc. It won't really work with the host-provisioner without some pretty significant changes to the base playbooks, but we can control some bits directly.	2025-12-01 10:01:07 -06:00
Dustin C. Hatch	0334b1b77a	Merge branch 'fluent-bit'	2025-11-24 07:49:05 -06:00
Dustin C. Hatch	04f62a1467	hosts: Remove nvr2 from AD domain The NVMe drive in _nvr2.pyrocufflink.blue_ died, so I had to re-install Fedora on a new drive. This time around, it will not be a domain member, as with the other new servers added recently.	2025-11-16 16:48:20 -06:00
Dustin C. Hatch	a500e0ece4	hosts: Decommission dc-headphone.p.b _dc-headphone.pyrocufflink.blue_ has been replaced by _dc-backless.pyrocufflink.blue_.	2025-11-01 22:28:43 -05:00
Dustin C. Hatch	7929176b4e	create-dc: Update to use new provisioning process Instead of running `virt-install` directly from the `create-dc.sh` script, it now relies on `newvm.sh`. This will ensure that VMs created to be domain controllers will conform to the same expectations as all other machines, such as using the libvirt domain metadata to build dynamic inventory. Similarly, the `create-dc.yml` playbook now imports the `host-setup.yml` playbook, which covers the basic setup of a new machine. Again, this ensures that the same policy is applied to DCs as to other machines. Finally, domain controller machines now no longer use _winbind_ for OS user accounts and authentication. This never worked particularly well on DCs anyway (particularly because of the way _winbind_ insists on using domain-prefixed user accounts when it runs on a DC), and is now worse with recent Fedora changes. Instead, DCs now have local users who authenticate via SSH certificates, the same as other current-generation servers.	2025-10-27 12:53:27 -05:00
Dustin C. Hatch	2cba5eb2e4	fluent-bit: Make ntfy pipeline steps optional Most hosts will not need to send any messages to ntfy. Let's define the ntfy pipeline stages only for the machines that need them. There are currently two use cases for ntfy: * MD RAID status messages (from Chromie and nvr2) * WAN Link status messages (from gw1) Breaking up the pipeline into smaller pieces allows both of these use cases to define their appropriate filters while still sharing the common steps. The other machines that have no use for these steps now omit them entirely.	2025-09-15 10:46:45 -05:00
Dustin C. Hatch	57a5f83262	nextcloud: Run an SMTP relay locally For some reason, Nextcloud seems to have trouble sending mail via the network-wide relay. It opens a connection, then just sits there and never sends anything until it times out. This happens probably 4 out of 5 times it attempts to send e-mail messages. Running Postfix locally and directing Nextcloud to send mail through it and then on to the network-wide relay seems to work much more reliably.	2025-08-23 22:43:45 -05:00
Dustin C. Hatch	b72676a1bb	nextcloud: Fetch HTTPS cert from Kubernetes Since Nextcloud uses the _pyrocufflink.net_ wildcard certificate, we can load it directly from the Kubernetes Secret, rather than from the file in the _certs_ submodule, just like Gitea et al.	2025-08-11 10:39:54 -05:00
Dustin C. Hatch	8a93ef0fc1	hosts: Remove chromie.p.b from AD domain Since it was updated to Fedora 42, Jenkins configuration management jobs have been failing to apply policy to _chromie.pyrocufflink.blue_. It claims "jenkins is not in the sudoers file," apparently because `winbind` keeps "forgetting" that _jenkins_ is a member of the _server admins_ group, which is listed in the `sudoers` file. I'm getting tired of messing with `winbind` and its barrage of bugs and quirks. There's no particular reason for _chromie_ to be an AD domain member, so let's just remove it and manage its users statically.	2025-08-07 15:07:02 -05:00
Dustin C. Hatch	e6ac6ae202	hosts: Decommission k8s-ctrl0 Just a few days before its third birthday 🎂 There are now three Kubernetes control plane nodes: * _ctrl-2ed8d3.k8s.pyrocufflink.black_ (Raspberry Pi CM4) * _ctrl-crave.k8s.pyrocufflink.black_ (virtual machine) * _ctrl-sycamore.k8s.pyrocufflink.black_ (virtual machine)	2025-07-28 17:52:11 -05:00
Dustin C. Hatch	e1c157ce87	raspberry-pi: Add collectd sensors, thermal plugins All the Raspberry Pi machines should have the _sensors_ and _thermal_ plugins enabled so we can monitor their CPU etc. temperatures.	2025-07-28 17:50:39 -05:00
Dustin C. Hatch	53c0107651	hosts: Add CM4 k8s cluster nodes These three machines are Raspberry Pi CM4 nodes on the DeskPi Super 6c cluster board. The worker nodes have a 256 GB NVMe SSD attached.	2025-07-27 17:47:24 -05:00
Dustin C. Hatch	c67e5f4e0c	cm4-k8s-node: Add group The Raspberry Pi CM4 nodes on the DeskPi Super 6c cluster board are members of the _cm4-k8s-node_ group. This group is a child of _k8s-node_ which overrides the data volume configuration and node labels.	2025-07-27 17:45:46 -05:00
Dustin C. Hatch	0e6cc4882d	Add k8s-test group This group is used for temporary machines while testing Kubernetes node deployment changes.	2025-07-22 16:21:49 -05:00
Dustin C. Hatch	a5b47eb661	hosts: Add vm-hosts to collectd group Now that the VM hosts are not members of the AD domain, they need to be added to the _collectd_ group directly.	2025-07-18 12:47:55 -05:00
Dustin C. Hatch	906819dd1c	r/apache: Use variables for HTTPS cert/key content Using files for certificates and private keys is less than ideal. The only way to "share" a certificate between multiple hosts is with symbolic links, which means the configuration policy has to be prepared for each managed system. As we're moving toward a much more dynamic environment, this becomes problematic; the host-provisioner will never be able to copy a certificate to a new host that was just created. Further, I have never really liked the idea of storing certificates and private keys in Git anyway, even if it is in a submodule with limited access.	2025-07-13 16:02:57 -05:00
Dustin C. Hatch	a399591f16	hosts: Decommission node-refrain.k.p.b I did something stupid to this machine trying to clear up its `/var/lib/containers/storage` volume and now it won't start any new pods. Killing it and replacing.	2025-06-21 17:51:06 -05:00
Dustin C. Hatch	025f2ddd8c	hosts: Remove VM hosts from AD domain Having the VM hosts as members of the domain has been troublesome since the very beginning. In full shutdown events, it's often difficult or impossible to log in to the VM hosts while the domain controller VMs are down or still coming up, even with winbind caching. Now that we have the `users.yml` playbook, the SSH certificate authority, and `doas`+pam_ssh_agent_auth, we really don't need the AD domain for centralized authentication.	2025-06-08 09:04:27 -05:00
Dustin C. Hatch	d4d3f0ef81	r/victoria-logs: Deploy VictoriaLogs I've become rather frustrated with Grafana Loki lately. It has several bugs that affect my usage, including issues with counting and aggregation, completely broken retention and cleanup, spamming itself with bogus error log messages, and more. Now that VictoriaLogs has first-class support in Grafana and support for alerts, it seems like a good time to try it out. It's under very active development, with bugs getting fixed extremely quickly, and new features added constantly. Indeed, as I was experimenting with it, I thought, "it would be nice if the web UI could decode ANSI escapes for terminal colors," and just a few days later, that feature was added! Native support for syslog is also a huge benefit, as it will allow me to collect logs directly from network devices, without first collecting them into a file on the Unifi controller. This new role deploys VictoriaLogs in a manner very similar to how I have Loki set up, as a systemd-managed Podman container. As it has no built-in authentication or authorization, we rely on Caddy to handle that. As with Loki, mTLS is used to prevent anonymous access to querying the logs; however, authentication via Authelia is also an option for human+browser usage. I'm re-using the same certificate authority as with Loki to simplify Grafana configuration. Eventually, I would like to have a more robust PKI, probably using OpenBao, at which point I will (hopefully) have decided which log database I will be using, and can use a proper CA for it.	2025-05-30 21:19:05 -05:00
Dustin C. Hatch	6df0cc39da	unifi: Back up with Restic The Unifi Network data will now be backed up by Restic.	2025-03-29 09:36:37 -05:00
Dustin C. Hatch	78d70af574	hosts: Add Unifi controllers to needproxy group Since the network device management network does not have access to the Internet, the Unifi controller machines must access it via the proxy.	2025-03-19 07:50:52 -05:00
Dustin C. Hatch	db54b03aa8	r/unifi: Switching to custom container image The _linuxserver.io_ image for UniFi Network is deprecated. It sucked anyway. I've created a simple image based on Debian that installs the _unifi_ package from the upstream apt repository. This image doesn't require running anything as _root_, so it doesn't need a user namespace.	2025-03-16 16:40:57 -05:00
Dustin C. Hatch	c300dc1b6c	chrony: Add role/PB for chrony I continually struggle with machines' (physical and virtual, even the Roku devices!) clocks getting out of sync. I have been putting off fixing this because I wanted to set up a Windows-compatible NTP server (i.e. on the domain controllers, with Kerberos signing), but there's really no reason to wait for that to fix the clocks on all the non-Windows machines, especially since there are exactly 0 Windows machines on the network right now. The chrony role and corresponding `chrony.yml` playbook are generic, configured via the `chrony_pools`, `chrony_servers`, and `chrony_allow` variables. The values for these variables will configure the firewall to act as an NTP server, synchronizing with the NTP pool on the Internet, while all other machines will synchronize with it. This allows machines on networks without Internet access to keep their clocks in sync.	2025-03-16 16:37:19 -05:00
Dustin C. Hatch	5f4b1627db	hosts: Add nut1.p.b to pyrocufflink group nut1.pyrocufflink.blue is a member of the pyrocufflink.blue AD domain. I'm not sure how it got to be so without belonging to the _pyrocufflink_ Ansible group...	2025-02-25 21:03:14 -06:00
Dustin C. Hatch	f705e98fab	hosts: Add k8s-iot-net-ctrl group The k8s-iot-net-ctrl group is for the Raspberry Pi that has the Zigbee and Z-Wave controllers connected to it. This node runs the Zigbee2MQTT and ZWaveJS2MQTT servers as Kubernetes pods.	2025-01-31 19:49:51 -06:00
Dustin C. Hatch	b1c29fc12a	hosts: Remove hostvds group Since the _hostvds_ group is not defined in the static inventory but by the OpenStack inventory plugin via `hostvds.openstack.yml`, when the static inventory is used by itself, Ansible fails to load it with an error: > Section [vps:children] includes undefined group: hostvds To fix this, we could explicitly define an empty _hostvds_ group in the static inventory, but since we aren't currently running any HostVDS instances, we might as well just get rid of it.	2025-01-31 19:45:58 -06:00
Dustin C. Hatch	ec4fa25bd8	Merge remote-tracking branch 'refs/remotes/origin/master'	2025-01-30 21:15:40 -06:00
Dustin C. Hatch	c00d6f49de	hosts: Add OVH VPS It turns out, $0.99/mo might be _too_ cheap for a cloud server. Running the Blackbox Exporter+vmagent on the HostVDS instance worked for a few days, but then it started having frequent timeouts when probing the websites. I tried redeploying the instance, switching to a larger instance, and moving it to different networks. Unfortunately, none of this seemed to help. Switching over to a VPS running in OVH cloud. OVH VPS servers are managed statically, as opposed to via API, so we can't use Pulumi to create them. This one was created for me when I signed up for an OVH account.	2025-01-26 13:08:59 -06:00
Dustin C. Hatch	33f315334e	users: Configure sudo on some machines `doas` is not available on Alma Linux, so we still have to use `sudo` on the VPS.	2025-01-26 13:08:59 -06:00
Dustin C. Hatch	ad0bd7d4a5	remote-blackbox: Add group The _remote-blackbox_ group defines a system that runs _blackbox-exporter_ and _vmagent_ in a remote (cloud) location. This system will monitor our public web sites. This will give a better idea of their availability from the perspective of a user on the Internet, which can be affected by factors that are not necessarily visible from within the network.	2025-01-26 13:08:59 -06:00
Dustin C. Hatch	f5bee79bac	hosts: Decommission bw0.p.b Vaultwarden is now hosted in Kubernetes.	2025-01-10 20:09:53 -06:00
Dustin C. Hatch	d993d59bee	Deploy new Kubernetes nodes The stor- nodes are dedicated to Longhorn replicas. The other nodes handle general workloads.	2024-11-24 10:33:21 -06:00
Dustin C. Hatch	0f600b9e6e	kubernetes: Manage worker nodes So far, I have been managing Kubernetes worker nodes with Fedora CoreOS Ignition, but I have decided to move everything back to Fedora and Ansible. I like the idea of an immutable operating system, but the FCOS implementation is not really what I want. I like the automated updates, but that can be accomplished with _dnf-automatic_. I do _not_ like giving up control of when to upgrade to the next Fedora release. Mostly, I never did come up with a good way to manage application-level configuration on FCOS machines. None of my experiments (Cue+tmpl, KCL+etcd+Luci) were successful, which mostly resulted in my manually managing configuration on nodes individually. Managing OS-level configuration is also rather cumbersome, since it requires redeploying the machine entirely. Altogether, I just don't think FCOS fits with my model of managing systems. This commit introduces a new playbook, `kubernetes.yml`, and a handful of new roles to manage Kubernetes worker nodes running Fedora Linux. It also adds two new deploy scripts, `k8s-worker.sh` and `k8s-longhorn.sh`, which fully automate the process of bringing up worker nodes.	2024-11-24 10:33:21 -06:00
Dustin C. Hatch	a82700a257	chromie: Configure serial terminal server	2024-11-10 13:15:08 -06:00
Dustin C. Hatch	010f652060	hosts: Add loki1.p.b _loki1.pyrocufflink.blue_ replaces _loki0.pyrocufflink.blue_. The former runs Fedora Linux and is managed by Ansible, while the latter ran Fedora CoreOS and was managed by Ignition and _cfg_.	2024-11-05 06:54:27 -06:00
Dustin C. Hatch	4cd983d5f4	loki: Add role+playbook for Grafana Loki The current Grafana Loki server, loki0.pyrocufflink.blue, runs Fedora CoreOS and is managed by Ignition and cfg. Since I have declared cfg a failed experiment, I'm going to re-deploy Loki on a new VM running Fedora Linux and managed by Ansible. The loki role installs Podman and defines a systemd-managed container to run Grafana Loki.	2024-10-20 12:10:55 -05:00
Dustin C. Hatch	ceaef3f816	hosts: Decommission burp1.p.b Everything has finally been moved to Chromie.	2024-10-13 17:52:48 -05:00
Dustin C. Hatch	5ced24f2be	hosts: Decommission matrix0.p.b The Synapse server hasn't been working for a while, but we don't use it for anything any more anyway.	2024-10-13 12:53:49 -05:00
Dustin C. Hatch	621f82c88d	hosts: Migrate remaining hosts to Restic Gitea and Vaultwarden both have SQLite databases. We'll need to add some logic to ensure these are in a consistent state before beginning the backup. Fortunately, neither of them are very busy databases, so the likelihood of an issue is pretty low. It's definitely more important to get backups going again sooner, and we can deal with that later.	2024-09-07 20:45:24 -05:00
Dustin C. Hatch	c2c283c431	nextcloud: Back up Nextcloud with Restic Now that the database is hosted externally, we don't have to worry about backing it up specifically. Restic only backs up the data on the filesystem.	2024-09-04 17:41:42 -05:00
Dustin C. Hatch	0f4dea9007	restic: Add role+playbook for Restic backups The `restic.yml` playbook applies the _restic_ role to hosts in the _restic_ group. The _restic_ role installs `restic` and creates a systemd timer and service unit to run `restic backup` every day. Restic doesn't really have a configuration file; all its settings are controlled either by environment variables or command-line options. Some options, such as the list of files to include in or exclude from backups, take paths to files containing the values. We can make use of these to provide some configurability via Ansible variables. The `restic_env` variable is a map of environment variables and values to set for `restic`. The `restic_include` and `restic_exclude` variables are lists of paths/patterns to include and exclude, respectively. Finally, the `restic_password` variable contains the password to decrypt the repository contents. The password is written to a file and exposed to the _restic-backup.service_ unit using [systemd credentials][0]. When using S3 or a compatible service for repository storage, Restic of course needs authentication credentials. These can be set using the `restic_aws_credentials` variable. If this variable is defined, it should be a map containing the `aws_access_key_id` and `aws_secret_access_key` keys, which will be written to an AWS shared credentials file. This file is then exposed to the _restic-backup.service_ unit using [systemd credentials][0]. [0]: https://systemd.io/CREDENTIALS/	2024-09-04 09:40:29 -05:00
Dustin C. Hatch	708bcbc87e	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-09-03 17:18:18 -05:00
Dustin C. Hatch	a0378feda8	nextcloud: Move database to db0 Moving the Nextcloud database to the central PostgreSQL server will allow it to take advantage of the monitoring and backups in place there. For backups specifically, this will make it easier to switch from BURP to Restic, since now only the contents of the filesystem need backed up. The PostgreSQL server on _db0_ requires certificate authentication for all clients. The certificate for Nextcloud is stored in a Secret in Kubernetes, so we need to use the _nextcloud-db-cert_ role to install the script to fetch it. Nextcloud configuration doesn't expose the parameters for selecting the certificate and private key files, but fortunately, they can be encoded in the value provided to the `host` parameter, though it makes for a rather cumbersome value.	2024-09-02 21:03:33 -05:00
Dustin C. Hatch	d3a09a2e88	hosts: Add chromie, nvr2 to nut-monitor group Deploy `nut-monitor` on these physical machines so they will shut down safely in the event of a power outage.	2024-09-01 18:52:33 -05:00
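
As an illustration of the variables described in commit 0f4dea9007 (the _restic_ role), group variables for a host in the _restic_ group might look roughly like the sketch below. Only the variable names and the two credential keys come from the commit message. The file name, the repository URL, the paths, and the `vault_*` variable names are hypothetical placeholders, and setting the repository through `RESTIC_REPOSITORY` in `restic_env` is an assumption about how the role is used, not something the commit states.

```yaml
# group_vars/restic.yml (hypothetical file name; all values are placeholders)

# Map of environment variables set for `restic`.
# RESTIC_REPOSITORY is a standard Restic variable; using it here is an assumption.
restic_env:
  RESTIC_REPOSITORY: "s3:https://s3.example.com/backups/example-host"

# Paths/patterns to include in the backup.
restic_include:
  - /etc
  - /var/lib/example-app

# Paths/patterns to exclude from the backup.
restic_exclude:
  - "*.tmp"
  - /var/lib/example-app/cache

# Repository password; the vaulted variable name is hypothetical.
restic_password: "{{ vault_restic_password }}"

# Optional: only needed for S3 or compatible repository storage.
restic_aws_credentials:
  aws_access_key_id: "{{ vault_restic_aws_access_key_id }}"
  aws_secret_access_key: "{{ vault_restic_aws_secret_access_key }}"
```

Keeping the secrets in separately encrypted variables, rather than inline, is one possible choice; the commit message only says that the password and credentials are written to files and passed to _restic-backup.service_ via systemd credentials.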

222 Commits