The last step in replacing Loki with Victoria Logs is to ingest logs
from Kubernetes pods. Like Promtail, Fluent Bit is capable of
augmenting log records with Kubernetes metadata, so we can search for
logs by pod name, namespace, etc. This of course requires access to the
Kubernetes API, and the easiest way to provide that is to run Fluent Bit
as a Kubernetes pod, granting its service account the appropriate
permissions.
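
As a rough sketch of what the in-cluster pipeline looks like (in Fluent Bit's
YAML configuration format; the paths, tag prefix, and filter options here are
assumptions, not the exact policy), the pods' CRI log files are tailed and
enriched by the `kubernetes` filter, which talks to the API using the service
account token:

```yaml
# Sketch only: tail container logs and enrich them with Kubernetes metadata.
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: kube.*
      multiline.parser: cri            # container runtime (CRI) log format
  filters:
    - name: kubernetes
      match: kube.*
      kube_tag_prefix: kube.var.log.containers.
      merge_log: on                    # merge structured log bodies into the record
      labels: on
      annotations: off
```
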
Since Fluent Bit also collects logs from the systemd journal, I want to
make sure the configuration for that function stays the same on
Kubernetes nodes as on all other servers. One way to do that would be
to run two different instances of Fluent Bit: one managed by Ansible
that collects journal messages, and another managed by Kubernetes that
collects pod logs. This seems like unnecessary overhead, so I have
chosen a hybrid approach: Kubernetes runs the Fluent Bit process, while
Ansible manages its configuration.
I had originally intended to deploy Radarr, Sonarr, and Prowlarr on
Kubernetes. Unfortunately, this turned out to be problematic, as I
would need a way to share the download directory between Radarr/Sonarr
and Aria2, and the media directory between Radarr/Sonarr and Jellyfin.
The only way I could think of to do this would be to expose both
directories via NFS and mount that share into the pods. I decided this
would be too much of a hassle for no real gain, at least not in the
short term. Instead, it makes more sense to deploy the *arr suite on
the same server as Aria2 and Jellyfin, which is essentially what the
community expects.
The recommended images for deploying the applications in containers are
pretty crappy. I didn't really want to mess with trying to get
them to work natively on Fedora, nor deal with installing them from
tarballs with Ansible, so I created my own Debian-based container images
for them and deployed those via Podman+Quadlet. These images are
published to the _Packages_ organization in Gitea, which is not public
and requires authentication. We can use the Kubernetes Secret to obtain
the authentication token needed to pull the images.
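
One way to wire that up from Ansible is sketched below; the Secret name,
namespace, and registry host are placeholders, not the real values, and the
approach assumes the `kubernetes.core` and `containers.podman` collections:

```yaml
# Hypothetical tasks: read the pull token from a Kubernetes Secret and log
# Podman in to the Gitea registry with it.
- name: fetch the registry pull token from Kubernetes
  kubernetes.core.k8s_info:
    kind: Secret
    namespace: gitea                 # placeholder namespace
    name: packages-pull-token        # placeholder Secret name
  register: registry_secret
  delegate_to: localhost

- name: log in to the Gitea container registry
  containers.podman.podman_login:
    registry: gitea.example.com      # placeholder registry host
    username: packages
    password: "{{ registry_secret.resources[0].data.token | b64decode }}"
```
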
The _fluent-bit-servarr_ role creates a configuration file for Fluent
Bit to read and parse logs from Radarr, Sonarr, and Prowlarr. These
logs can then be sent to an output by defining the
`fluent_bit_servarr_outputs` variable.
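
For example, pointing those logs at Victoria Logs might look roughly like
this; the variable's exact schema is whatever the role expects, so the keys
below are illustrative only:

```yaml
# Hypothetical host/group variable; keys mirror Fluent Bit http output
# properties, but the real schema is defined by the role.
fluent_bit_servarr_outputs:
  - name: http
    match: 'servarr.*'
    host: logs.example.com           # placeholder Victoria Logs host
    port: 9428                       # default Victoria Logs HTTP port
    uri: /insert/jsonline
    format: json_lines
```
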
The `sonarr.yml` playbook and corresponding role deploy Sonarr, the
TV series library/download manager, in a Podman container.
Note that we're relocating the log files from the Sonarr AppData
directory to `/var/log/sonarr` so they can be picked up by Fluent Bit.
The `prowlarr.yml` playbook and corresponding role deploy Prowlarr, the
indexer manager for the *arr suite, in a Podman container.
Note that we're relocating the log files from the Prowlarr AppData
directory to `/var/log/prowlarr` so they can be picked up by Fluent Bit.
The `radarr.yml` playbook and corresponding role deploy Radarr, the
movie library/download manager, in a Podman container.
Note that we're relocating the log files from the Radarr AppData
directory to `/var/log/radarr` so they can be picked up by Fluent Bit.
Fluent Bit supports including configuration fragments from other files
using its `includes` option. Adding a glob pattern to the default
configuration will allow other roles to supply additional configuration
by creating files in the `/etc/fluent-bit/include` directory. This
makes composition of configuration significantly easier.
Unfortunately, `fluent-bit` has a quirk in that there must exist at
least one file matching the glob pattern, or it will fail to start. To
work around this, we must supply an empty fragment.
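
Assuming the YAML configuration format, the relevant part of the main
configuration boils down to something like this:

```yaml
# /etc/fluent-bit/fluent-bit.yaml (relevant part only)
includes:
  - /etc/fluent-bit/include/*.yaml
```

The empty fragment can simply be an empty file in that directory (e.g.
`00-placeholder.yaml`; the name is an assumption), installed unconditionally
so the glob always matches at least one file.
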
Roles that need to reload or restart Apache after writing configuration
files do not necessarily need to depend on the _apache_ role, but may
assume Apache is deployed in some other way. To support this, I have
factored out the handlers from the _apache_ role into an _apache-base_
role, which such roles can list as a dependency.
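
Such a role lists _apache-base_ under `dependencies` in its `meta/main.yml`
and then notifies the shared handler after writing configuration; a sketch,
where the handler name, template, and paths are all assumptions:

```yaml
# roles/<some-vhost-role>/tasks/main.yml (sketch)
- name: install vhost configuration
  ansible.builtin.template:
    src: vhost.conf.j2
    dest: /etc/httpd/conf.d/myapp.conf
  notify: reload apache              # handler provided by apache-base (name assumed)
```
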
There's no reason for Jenkins to be messing with this machine. It's too
different from the rest of the hosts it manages, so it's been quite
difficult getting it to work anyway. Let's just move it to a separate
inventory file that we have to specify manually when we want to apply a
playbook to it.
This is a Raspberry Pi 2 with an HDMI-CSI adapter and a Raspberry Pi Pico,
connected to _nvr2.pyrocufflink.blue_, as the latter does not have a
serial console.
PiKVM comes with its own custom Arch Linux-based operating system. We
want to be able to manage it with our configuration policy, especially
for setting up authentication, etc. It won't really work with the
host-provisioner without some pretty significant changes to the base
playbooks, but we can control some bits directly.
Splitting up the SSH keys authorized for root login into separate
variables for SK versus legacy keys will allow more fine-grained control
of which set is used in certain situations. Specifically, the intent is
to allow non-Fedora operating systems to use the SK variants if
applicable, without having to repeat them explicitly.
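
Conceptually, the split looks something like the following; the variable
names and the combination are hypothetical, not the role's actual interface:

```yaml
# Hypothetical variables: FIDO2/security-key (SK) keys vs. legacy keys.
root_authorized_keys_sk:
  - sk-ssh-ed25519@openssh.com AAAA... admin@token
root_authorized_keys_legacy:
  - ssh-ed25519 AAAA... admin@laptop

# Hosts whose OS supports SK keys could use just the first list, while others
# combine both without repeating any keys:
root_authorized_keys: "{{ root_authorized_keys_sk + root_authorized_keys_legacy }}"
```
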
In order to avoid false positives, especially with Invoice Ninja, I'm
increasing the timeout values for scraping the public-facing websites.
They can occasionally be quite slow, either because of our Internet
connection, or load on the servers.
Victoria Logs can now record the source address for syslog messages in a
`remoteIP` field. This has to be enabled specifically, although I can't
think of a reason why someone would _not_ want to record that
information.
Using the PROXY protocol allows the publicly-facing reverse proxy to
pass through the original source address of the client, without doing
TLS termination. Clients on the internal network will not go through
the proxy, though, so we have to disable the PROXY protocol for those
addresses. Unfortunately, the syntax for this is kind of cumbersome,
because Apache only has a deny list, not an allow list, so we have to
enumerate all of the possible internal addresses _except_ the proxy.
The `frigate` playbook cannot be applied by the host provisioner for
several reasons. First, it needs manual intervention in order to enroll
the MOK which is used to sign the `gasket-driver` kernel modules.
Further, it needs several encrypted values from Ansible Vault, which are
not available to the _host-provisioner_.
Now that Kickstart files are hosted on _pxe.pyrocufflink.blue_, we can
allow access to that entire (sub-)domain, enabling clients to fetch the
files over HTTPS. Previously, this was not possible because in order to
allow access to Kickstart files but nothing else on Gitea, we had to
rely on full URL matching.
This is needed specifically for _fluent-bit_, which does not correctly
handle wildcards or subdomains in `NO_PROXY`, so that it can send real-time
notifications from logs via ntfy.
The NVMe drive in _nvr2.pyrocufflink.blue_ died, so I had to re-install
Fedora on a new drive. This time around, it will not be a domain
member, just like the other servers added recently.
The DKMS package for the _gasket-driver_ kernel modules is something of
a problem. For one thing, upstream seems to have abandoned the driver
itself, and it now requires several patches in order to compile for
current kernel versions. These patches are not included in the DKMS
package, and thus have to be applied manually after installing it. More
generally, I don't really like how DKMS works anyway. Besides requiring
a full kernel development toolchain on a production system, it's
impossible to know if a module will compile successfully until _after_
the new kernel has been installed and booted. This has frequently meant
that Frigate won't come up after an update because building the module
failed. I would much rather have a notification about a compatibility
issue for an _upcoming_ update, rather than an applied one.
To rectify these issues, I have created a new RPM package that contains
pre-built, signed kernel modules for the Coral EdgeTPU device. Unlike
the DKMS package, this package needs to be rebuilt for every kernel
version; however, this is done by Jenkins before the updated kernel gets
installed on the machine. It also expresses a dependency on an exact
kernel version, so the kernel cannot be updated until a corresponding
_gasket-driver_ package is available.
Nothing uses these certificates anymore, and nothing manages/renews
them. Everything has either been converted to ACME, or fetches the
_pyrocufflink.net_ wildcard certificate directly from the Kubernetes
Secret.
The `reload-ssh-cert.path` unit introduced a circular ordering
dependency with `sshd.service` by way of `paths.target`. There's no
particular reason for this dependency here, so we need to remove it to
resolve the issue.
There's really no reason to keep four 256 MiB log files, especially for
access logs. In any case, most of the web servers only have a 1 GiB log
volume, so this configuration tends to fill them up.
We need to explicitly add the GPG signing key for the _dch_ repository
to the system trust store; otherwise, _dnf-automatic_ will fail, as it
cannot implicitly add new keys during an update.
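
A sketch of the task, with a placeholder URL for wherever the key is
actually published:

```yaml
# Import the repository's GPG key into the RPM trust store ahead of time so
# dnf-automatic never has to add it during an update.  The URL is a placeholder.
- name: trust the dch repository signing key
  ansible.builtin.rpm_key:
    key: https://repo.example.com/dch/RPM-GPG-KEY-dch
    state: present
```
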
Instead of running `virt-install` directly from the `create-dc.sh`
script, it now relies on `newvm.sh`. This will ensure that VMs created
to be domain controllers will conform to the same expectations as all
other machines, such as using the libvirt domain metadata to build
dynamic inventory.
Similarly, the `create-dc.yml` playbook now imports the `host-setup.yml`
playbook, which covers the basic setup of a new machine. Again, this
ensures that the same policy is applied to DCs as to other machines.
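
Concretely, that is just an `import_playbook` at the top of `create-dc.yml`
(sketch; the DC-specific plays that follow are omitted):

```yaml
# create-dc.yml (sketch)
- import_playbook: host-setup.yml

# ...followed by the plays that actually provision the domain controller.
```
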
Finally, domain controller machines now no longer use _winbind_ for
OS user accounts and authentication. This never worked particularly
well on DCs anyway (particularly because of the way _winbind_ insists on
using domain-prefixed user accounts when it runs on a DC), and is now
worse with recent Fedora changes. Instead, DCs now have local users who
authenticate via SSH certificates, the same as other current-generation
servers.
Although rare, there are scenarios where we may want to deploy a new
virtual machine with a static, manually-configured IP address.
Anaconda/Dracut support this via the `ip=` kernel command-line argument.
To simplify populating that argument, the `newvm` script now takes
additional command-line arguments for the IP address (in CIDR notation),
default gateway, and name server address(es), and assembles the appropriate
`ip=` string from these discrete values.
Users, auth, etc. for domain controllers will be handled by the
`create-dc.yml` playbook. I haven't decided exactly how this playbook
will get applied yet, but I want to make sure the host provisioner is able
to successfully provision machines in the _samba-dc_ group nonetheless.
Usually, we do not want the continuous enforcement jobs installing or
upgrading software packages. Sometimes, though, we may want to use a
Jenkins job to roll out something new, so this new `ALLOW_INSTALL`
parameter will control whether or not Ansible tasks tagged with
`install` are skipped.
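
As an illustration (the task is made up, and `--skip-tags install` as the
skip mechanism is an assumption):

```yaml
# Illustrative task; real roles tag their package installation tasks this way.
- name: install fluent-bit
  ansible.builtin.dnf:
    name: fluent-bit
    state: latest
  tags:
    - install
```
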
We'll manage Fluent Bit on Kubernetes nodes as a DaemonSet. This will
be necessary in order to grant it access to the Kubernetes API so it can
augment log records with Kubernetes metadata (labels, pod name, etc.).
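
The service account behind the DaemonSet only needs read access to pod
metadata; a minimal RBAC sketch, where the names and namespace are
assumptions:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
  - apiGroups: [""]
    resources: [pods, namespaces]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging               # placeholder namespace
```
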
As pods move around between nodes, applications are updated, etc., nodes
tend to accumulate images in their container stores that are no longer
used. These take up space unnecessarily, eventually triggering disk
usage alarms. From now on, the _kubelet_ role installs a systemd timer and
service unit to periodically clean up these unused images.
The _ssh-host-certs.target_ unit does not exist any more. It was
provided by the _sshca-cli-systemd_ package to allow machines to
automatically request their SSH host certificates on first boot. It had
a `ConditionFirstBoot=` requirement, which meant it would not work at any
other time, so there was no reason to move it into the Ansible configuration
policy. Instead, we can use the _ssh-host-certs-renew.target_ unit to
trigger requesting or renewing host certificates.
The _network.target_ unit should be used for ordering only. Listing it
as a `Requires=` dependency can cause _fluent-bit.service_ to fail to
start at all if the network takes slightly too long to initialize at
boot.
Most hosts will not need to send any messages to ntfy. Let's define the
ntfy pipeline stages only for the machines that need them. There are
currently two use cases for ntfy:
* MD RAID status messages (from Chromie and nvr2)
* WAN Link status messages (from gw1)
Breaking up the pipeline into smaller pieces allows both of these use
cases to define their appropriate filters while still sharing the common
steps. The other machines that have no use for these steps now omit
them entirely.
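
As a rough illustration of what one of those fragments might contain (in the
YAML configuration format; everything here is hypothetical, including the
tags, the matched `SYSLOG_IDENTIFIER`, and the ntfy host and topic), a host
with MD RAID could copy matching journal records onto a dedicated tag and
post them to ntfy over HTTP:

```yaml
# Hypothetical include fragment for a host with MD RAID.
pipeline:
  filters:
    - name: rewrite_tag
      match: journal.*
      # emit a copy of mdadm monitor messages under a dedicated tag,
      # keeping the original record in the normal log stream
      rule: $SYSLOG_IDENTIFIER ^mdadm$ ntfy.mdraid true
  outputs:
    - name: http
      match: ntfy.*
      host: ntfy.example.com         # placeholder ntfy server
      port: 443
      tls: on
      uri: /alerts                   # placeholder topic
      format: json
```
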
Messages from sources other than the systemd journal do not have a
`hostname` field by default. This could make filtering logs difficult
if there are multiple servers that host the same application. Thus, we
need to inject the host name statically into every record, to ensure
they can be correctly traced to their source machine.
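
A sketch of that injection with the record_modifier filter, assuming the
host name reaches Fluent Bit via the `HOSTNAME` environment variable (it
could just as easily be templated in by Ansible):

```yaml
# Add a static hostname field to every record.
pipeline:
  filters:
    - name: record_modifier
      match: '*'
      record: hostname ${HOSTNAME}
```
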