616 Commits

Author SHA1 Message Date
5dba0aec8f fluent-bit: create configmap for kubernetes nodes
The last step in replacing Loki with Victoria Logs is to ingest logs
from Kubernetes pods.  Like Promtail, Fluent Bit is capable of
augmenting log records with Kubernetes metadata, so we can search for
logs by pod name, namespace, etc.  This of course requires access to the
Kubernetes API, and the easiest way to provide that is to run Fluent Bit
as a Kubernetes pod, granting its service account the appropriate
permissions.

Since Fluent Bit also collects logs from the systemd journal, I want to
make sure the configuration for that function stays the same on
Kubernetes nodes as on all other servers.  One way to do that would be
to run two different instances of Fluent Bit: one managed by Ansible
that collects journal messages, and another managed by Kubernetes that
collects pod logs.  This seems like unnecessary overhead, so I have
chosen a hybrid approach.  Ansible manages the configuration for the
process running in Kubernetes.
2025-12-04 21:26:03 -06:00
f892570467 r/f-b-arr: Configure Fluent Bit for Servarr logs
The _fluent-bit-servarr_ role creates a configuration file for Fluent
Bit to read and parse logs from Radarr, Sonarr, and Prowlarr.  These
logs can then be sent to an output by defining the
`fluent_bit_servarr_outputs` variable.
2025-12-03 23:00:54 -06:00
23670338b3 sonarr: Deploy Sonarr in a Podman container
The `sonarr.yml` playbook and corresponding role deploy Sonarr, the
TV series library/download manager, in a Podman container.

Note that we're relocating the log files from the Sonarr AppData
directory to `/var/log/sonarr` so they can be picked up by Fluent Bit.
2025-12-03 23:00:54 -06:00
9223dbe820 prowlarr: Deploy Prowlarr in a Podman container
The `prowlarr.yml` playbook and corresponding role deploy Prowlarr, the
indexer manager for the *arr suite, in a Podman container.

Note that we're relocating the log files from the Prowlarr AppData
directory to `/var/log/prowlarr` so they can be picked up by Fluent Bit.
2025-12-03 23:00:54 -06:00
a41a3fa3d0 radarr: Deploy Radarr in a Podman container
The `radarr.yml` playbook and corresponding role deploy Radarr, the
movie library/download manager, in a Podman container.

Note that we're relocating the log files from the Radarr AppData
directory to `/var/log/radarr` so they can be picked up by Fluent Bit.
2025-12-03 23:00:54 -06:00
6ad76e4b33 r/fluent-bit: Support drop-in configuration files
Fluent Bit supports including configuration fragments from other files
using its `includes` option.  Adding a glob pattern to the default
configuration will allow other roles to supply additional configuration
by creating files in the `/etc/fluent-bit/include` directory.  This
makes composition of configuration significantly easier.

Unfortunately, `fluent-bit` has a quirk in that there must exist at
least one file matching the glob pattern, or it will fail to start.  To
work around this, we must supply an empty fragment.
2025-12-03 23:00:54 -06:00
cc288a4ee3 r/apache-base: Factor out handlers for reuse
Roles that need to reload or restart Apache after writing configuration
files do not necessarily need to depend on the _apache_ role, but may
assume Apache is deployed in some other way.  To support this, I have
factored out the handlers from the _apache_ role into an _apache-base_
role, which such roles can list as a dependency.
2025-12-03 23:00:54 -06:00
cce485db54 pikvm: Add role/playbook for PiKVM
PiKVM comes with its own custom Arch Linux-based operating system.  We
want to be able to manage it with our configuration policy, especially
for setting up authentication, etc.  It won't really work with the
host-provisioner without some pretty significant changes to the base
playbooks, but we can control some bits directly.
2025-12-01 10:01:07 -06:00
4fc0e7bdec r/base: Conditionally install Python SELinux libs
We do not need to install the SELinux bindings for operating systems
that do not support SELinux.
2025-12-01 09:58:56 -06:00
0334b1b77a Merge branch 'fluent-bit' 2025-11-24 07:49:05 -06:00
8aa1e986d4 r/gitea: Enable PROXY protocol
Using the PROXY protocol allows the publicly-facing reverse proxy to
pass through the original source address of the client, without doing
TLS termination.  Clients on the internal network will not go through
the proxy, though, so we have to disable the PROXY protocol for those
addresses.  Unfortunately, the syntax for this is kind of cumbersome,
because Apache only has a deny list, not an allow list, so we have to
enumerate all of the possible internal addresses _except_ the proxy.
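
As a rough illustration of why the exception list gets long, the
following Python sketch computes the CIDR blocks that cover a subnet
minus a single proxy address.  The subnet `192.0.2.0/24` and the proxy
address `192.0.2.10` are placeholders, not the real values, and the
helper name `networks_except` is made up for this example.

```python
import ipaddress


def networks_except(subnet: str, excluded: str) -> list[str]:
    """Return the CIDR blocks covering `subnet` minus the host `excluded`."""
    net = ipaddress.ip_network(subnet)
    # A bare address is parsed as a single-host network (/32 for IPv4).
    host = ipaddress.ip_network(excluded)
    # address_exclude() yields the remaining blocks in no guaranteed order,
    # so sort them for stable output.
    return [str(n) for n in sorted(net.address_exclude(host))]


if __name__ == "__main__":
    # Placeholder values: a documentation subnet and a made-up proxy address.
    for block in networks_except("192.0.2.0/24", "192.0.2.10"):
        print(block)
```

Excluding one host from a /24 leaves eight separate blocks, one for
each prefix length from /25 through /32, and every one of them has to
appear in the exception list.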
2025-11-19 07:43:29 -06:00
25d813144c r/web/hlc: Drop cert role
The certificate for _hatchlearningcenter.org_ is managed by Apache
*mod_md* now.
2025-11-17 08:00:45 -06:00
60b7a20e1f frigate: Switch to pre-compiled gasket-driver RPM
The DKMS package for the _gasket-driver_ kernel modules is something of
a problem.  For one thing, upstream seems to have abandoned the driver
itself, and it now requires several patches in order to compile for
current kernel versions.  These patches are not included in the DKMS
package, and thus have to be applied manually after installing it.  More
generally, I don't really like how DKMS works anyway.  Besides requiring
a full kernel development toolchain on a production system, it's
impossible to know if a module will compile successfully until _after_
the new kernel has been installed and booted.  This has frequently meant
that Frigate won't come up after an update because building the module
failed.  I would much rather have a notification about a compatibility
issue for an _upcoming_ update, rather than an applied one.

To rectify these issues, I have created a new RPM package that contains
pre-built, signed kernel modules for the Coral EdgeTPU device.  Unlike
the DKMS package, this package needs to be rebuilt for every kernel
version; however, this is done by Jenkins before the updated kernel gets
installed on the machine.  It also expresses a dependency on an exact
kernel version, so the kernel cannot be updated until a corresponding
_gasket-driver_ package is available.
2025-11-16 16:30:51 -06:00
94a777fec8 r/collectd-sensors: Add missing handlers file 2025-11-16 16:30:51 -06:00
daa91e71a1 Merge remote-tracking branch 'refs/remotes/origin/master' 2025-11-16 16:24:04 -06:00
fce060bdec r/ssh-host-certs: Fix circular dep in reload.path
The `reload-ssh-cert.path` unit introduced a circular ordering
dependency with `sshd.service` by way of `paths.target`.  There's no
particular reason for this dependency here, so we need to remove it to
resolve the issue.
2025-11-13 18:40:52 -06:00
44c3dba46a r/gitea: Update to v1.24.7 2025-11-12 17:48:09 -06:00
4b91e088ea r/apache: Reduce amount of logs stored
There's really no reason to keep 4 256 MiB log files, especially access
logs.  In any case, most of the web servers only have 1 GiB log volume,
so this configuration tends to fill them up.
2025-11-09 13:23:02 -06:00
5af25bcccf r/dch-yum: Trust GPG key
We need to explicitly add the GPG signing key for the _dch_ repository
to the system trust store; otherwise, _dnf-automatic_ will fail, as it
cannot implicitly add new keys during an update.
2025-10-27 12:54:07 -05:00
dc8961de92 fluent-bit: Do not apply to K8s nodes
We'll manage Fluent-Bit on Kubernetes nodes as a DaemonSet.  This will
be necessary in order to grant it access to the Kubernetes API so it can
augment log records with Kubernetes metadata (labels, pod name, etc.).
2025-10-17 07:51:32 -05:00
96ac5be3b5 r/kubelet: Schedule automatic image prune
As pods move around between nodes, applications are updated, etc., nodes
tend to accumulate images in their container stores that are no longer
used.  These take up space unnecessarily, eventually triggering disk
usage alarms.  From now on, the _kubelet_ role installs a systemd timer
and service unit to periodically clean up these unused images.
2025-10-13 09:54:20 -05:00
142682ce2f r/ssh-host-certs: Fix restart handler
The _ssh-host-certs.target_ unit does not exist any more.  It was
provided by the _sshca-cli-systemd_ package to allow machines to
automatically request their SSH host certificates on first boot.  It had
a `ConditionFirstBoot=` requirement, which made it not work at any other
time, so there was no reason to move it into the Ansible configuration
policy.  Instead, we can use the _ssh-host-certs-renew.target_ unit to
trigger requesting or renewing host certificates.
2025-09-17 06:40:20 -05:00
4601b4d092 victoria-logs: Update to v1.33.1 2025-09-15 11:13:01 -05:00
c2d26f1f59 r/fluent-bit: Drop network.target requirement
The _network.target_ unit should be used for ordering only.  Listing it
as a `Requires=` dependency can cause _fluent-bit.service_ to fail to
start at all if the network takes slightly too long to initialize at
boot.
2025-09-15 10:49:32 -05:00
0331a55b3e r/fluent-bit: Set HOSTNAME environment variable
Fluent-bit does not have any native capability for setting a field with
the hostname of the machine, but it can set a field with the value of an
environment variable.  Thus, we can set the `HOSTNAME` environment
variable and then use that to set the field in the pipeline.
2025-09-15 07:53:13 -05:00
d0bffdeb15 r/fluent-bit: Support configuring parsers
When ingesting logs from sources other than systemd, such as
unstructured log files written by uncooperative services, it may be
necessary to define custom parsers.
2025-09-15 07:51:39 -05:00
8a7faac35b r/ssh-host-certs: Reload sshd after renewing certs
In Fedora 41, it seems the SSH daemon no longer automatically uses the
new certificate after its host certificates have been renewed.  To get
it to pick up the new ones, we have to explicitly tell it to reload.  To
handle that automatically, I've added a new systemd path unit that
monitors the certificate files.  When it detects that one of them has
changed, it will send the signal to the SSH daemon to tell it to reload.
2025-09-14 15:08:41 -05:00
37e6622351 r/ssh-host-certs: Import systemd unit files
The _sshca-cli_ package no longer provides a _-systemd_ sub-package
containing the systemd unit files for automatically requesting and
renewing SSH host certificates.  Its original intent was to support
automatically signing certificates on first boot by having the unit
files installed by Anaconda, but this never really worked for various
reasons.  Since I'd rather not have to rebuild the RPMs every time I
need to make a change to the systemd units, and Ansible is required to
actually get the certificates issued anyway, it makes more sense to have
the unit files in the configuration policy instead.
2025-09-14 15:08:41 -05:00
8e8c109bf6 websites/pyrocufflink: Switch to mod_md for cert
The _pyrocufflink.net_ site now obtains its certificate from Let's
Encrypt using the Apache _mod_md_ (managed domain) module.  This
dramatically simplifies the deployment of this certificate, eliminating
the need for _cert-manager_ to obtain it, _cert-exporter_ to add it to
_certs.git_, and Jenkins to push it out to the web server.
2025-09-04 10:04:37 -05:00
c11a792eb8 websites/hlc: Drop formsubmit config tasks
_formsubmit_ has been running in Kubernetes for some time now.
2025-08-25 09:00:20 -05:00
524ac0931a websites/hlc: Switch to mod_md for cert management
To avoid having separate certificates for the canonical
_www.hatchlearningcenter.org_ site and all the redirects, we'll combine
these virtual hosts into one.  We can use a `RewriteCond` to avoid the
redirect for the canonical name itself.
2025-08-25 09:00:20 -05:00
1a3f68e18b Merge remote-tracking branch 'refs/remotes/origin/master' 2025-08-23 22:43:00 -05:00
1c1bff3ec0 r/nextcloud: Fix a bunch of deployment warnings
The Nextcloud administration overview page listed a bunch of deployment
configuration warnings that needed to be addressed:

* Set the default phone region
* Define a maintenance window starting at 0600 UTC
* Increase the PHP memory limit to 1GiB
* Increase the PHP OPCache interned strings buffer size
* Increase the allowed PHP OPcache memory limit
* Fix Apache rewrite rules for /.well-known paths
2025-08-23 22:39:44 -05:00
70909d1b13 websites: Enable PROXY protocol for HTTPS sites
Since the reverse proxy does TLS pass-through instead of termination,
the original source address is lost.  Since the source address is
important for logging, rate limiting, and access control, we need to use
the HAProxy PROXY protocol to pass it along to the web server.

Since the PROXY protocol works at the TCP layer, _all_ connections must
use it. Fortunately, all of the sites hosted by the public web server
are in fact public and only accessed through HAProxy.  Similarly,
enabling it for one named virtual host enables it for all virtual hosts
on that port.  Thus, we only have to explicitly set it for one site, and
all the rest will use it as well.
2025-08-23 22:21:54 -05:00
5dbe26fc60 r/repohost: Optimize createrepo queue loop
Instead of waking every 30 seconds, the queue loop in
`repohost-createrepo.sh` now only wakes when it receives an inotify
event indicating the queue file has been modified.  To avoid missing
events that occurred while a `createrepo` process was running, there's
now an inner loop that runs until the queue is completely empty, before
returning to blocking on `inotifywait`.
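
The control flow can be sketched in Python as follows.  This is only a
model of the loop structure, not the actual shell script: the callables
`wait_for_change`, `take_items`, and `process` are hypothetical
stand-ins for blocking on `inotifywait`, reading and clearing the queue
file, and running `createrepo`, and `max_wakeups` exists only so the
sketch can terminate.

```python
from typing import Callable, Optional


def run_queue(
    wait_for_change: Callable[[], None],
    take_items: Callable[[], list[str]],
    process: Callable[[str], None],
    max_wakeups: Optional[int] = None,
) -> None:
    """Block until the queue changes, then drain it completely."""
    wakeups = 0
    while max_wakeups is None or wakeups < max_wakeups:
        # Outer loop: sleep until notified that the queue was modified.
        wait_for_change()
        wakeups += 1
        # Inner loop: keep taking items until the queue is empty, so that
        # entries added while an item was being processed are not left
        # waiting for the next notification.
        while True:
            items = take_items()
            if not items:
                break
            for item in items:
                process(item)
```

The inner loop is what picks up entries that arrive while a long-running
item is still being processed, since no notification is being awaited
during that time.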
2025-08-20 07:11:27 -05:00
f8d58ef0ed websites/dcow: Transition to static site
We don't really use this site for screenshot sharing any more.  It's
still nice to be able to look at old screenshots, so I've saved a static
snapshot of it that can be hosted by plain ol' Apache.
2025-08-16 08:55:28 -05:00
b72676a1bb nextcloud: Fetch HTTPS cert from Kubernetes
Since Nextcloud uses the _pyrocufflink.net_ wildcard certificate, we can
load it directly from the Kubernetes Secret, rather than from the file
in the _certs_ submodule, just like Gitea et al.
2025-08-11 10:39:54 -05:00
f5ab739c9e websites: dustinandtabitha: Switch to mod_md for cert
The _dustinandtabitha.com_ site now obtains its certificate from Let's
Encrypt using the Apache _mod_md_ (managed domain) module.  This
dramatically simplifies the deployment of this certificate, eliminating
the need for _cert-manager_ to obtain it, _cert-exporter_ to add it to
_certs.git_, and Jenkins to push it out to the web server.
2025-08-11 10:34:30 -05:00
33da25209d r/lego: Fix timer unit trigger
`OnActiveSec` only fires once.  To trigger the renew periodically, we
need to use `OnCalendar`.
2025-08-10 17:45:46 -05:00
daa602495c r/frigate: Add udev rules for coral tpu
Since the _frigate.service_ unit depends on _dev-apex_0.device_,
`/dev/apex_0` needs to have the `systemd` "tag" on its udev device info.
Without this tag, systemd will not "see" the device and thus will not
mark the `.device` unit as active.
2025-08-06 09:04:04 -05:00
9b4232d01a Merge remote-tracking branch 'refs/remotes/origin/master' 2025-08-05 18:17:13 -05:00
0fe296f7f3 fluent-bit: Deploy log collector for Victoria Logs
[fluent-bit][0] is a generic, highly-configurable log collector.  It was
apparently initially developed for fluentd, but it has so many output
capabilities that it works with many different log aggregation systems,
including Victoria Logs.

Although Victoria Logs supports the Loki input format, and therefore
_Promtail_ would work, I want to try to avoid depending on third-party
repositories.  _fluent-bit_ is packaged by Fedora, so there shouldn't be
any dependency issues, etc.

[0]: https://fluentbit.io
2025-08-05 07:14:08 -05:00
c35c7b8520 r/apache: log errors to syslog by default
Logging to syslog will allow messages to be aggregated in the central
server (Loki now, Victoria Logs eventually), so I don't have to SSH into
the web server to check for errors.
2025-08-04 09:49:19 -05:00
84a8a0d4af websites: dustin.hatch.n: Switch to mod_md for cert
The _dustin.hatch.name_ site now obtains its certificate from Let's
Encrypt using the Apache _mod_md_ (managed domain) module.  This
dramatically simplifies the deployment of this certificate, eliminating
the need for _cert-manager_ to obtain it, _cert-exporter_ to add it to
_certs.git_, and Jenkins to push it out to the web server.
2025-08-04 09:49:19 -05:00
71b1363c58 r/vmhost: Install nmap-ncat
While clients can use `virt-ssh-helper` to communicate with `libvirtd`,
they need `nc` in order to forward SPICE graphics communication.
2025-07-31 10:19:11 -05:00
7f8e39ebd4 websites: chmod777.sh: Switch to mod_md for cert
The _chmod777.sh_ site now obtains its certificate from Let's
Encrypt using the Apache _mod_md_ (managed domain) module.  This
dramatically simplifies the deployment of this certificate, eliminating
the need for _cert-manager_ to obtain it, _cert-exporter_ to add it to
_certs.git_, and Jenkins to push it out to the web server.
2025-07-28 18:53:58 -05:00
3270011fee r/vmhost: Work around libvirt SELinux policy bug
With the transition to modular _libvirt_ daemons, the SELinux policy is
a bit more granular.  Unfortunately, the new policy has a funny [bug]: it
assumes directories named `storage` under `/run/libvirt` must be for
_virtstoraged_ and labels them as such, which prevents _virtnetworkd_
from managing a virtual network named `storage`.

To work around this, we need to give `/run/libvirt/network` a special
label so that its children do not match the file transition pattern for
_virtstoraged_ and thus keep their `virtnetworkd_var_run_t` label.

[bug]: https://bugzilla.redhat.com/show_bug.cgi?id=2362040
2025-07-28 18:23:24 -05:00
2ee86f6344 r/vmhost: Retry vm-autostart if libvirt is down
If the _libvirt_ daemon has not fully started by the time `vm-autostart`
runs, we want it to fail and try again shortly.  To allow this, we first
attempt to connect to the _libvirt_ socket, and if that fails, stop
immediately and try again in a second.  This way, the first few VMs
don't get skipped with the assumption that they're missing, just because
the daemon wasn't ready yet.
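
A minimal Python sketch of the "probe the socket first" idea is shown
below.  It is not the real `vm-autostart` code: the function names are
invented for illustration, the socket path is supplied by the caller,
and the sketch assumes the service manager is configured to re-run the
program after a non-zero exit.

```python
import socket
import sys


def daemon_ready(socket_path: str, timeout: float = 1.0) -> bool:
    """Return True if something accepts connections on the Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(socket_path)
        except OSError:
            # Covers a missing socket file, a refused connection, and a timeout.
            return False
    return True


def main(socket_path: str) -> int:
    """Return an exit status: 1 if the daemon is not reachable yet, else 0."""
    if not daemon_ready(socket_path):
        # Fail fast so the whole run can be retried, instead of treating
        # every VM as missing.
        print("libvirt is not ready yet", file=sys.stderr)
        return 1
    # The actual VM start-up logic would go here (omitted in this sketch).
    return 0
```

A caller would pass the path of the daemon socket to `main()` and hand
the result to `sys.exit()`.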
2025-07-28 18:20:50 -05:00
4df047cf76 r/vmhost: Disable DynamicUsers for vm-autostart
_libvirt_ has gone full Polkit, which doesn't work with systemd dynamic
users.  So, we have to run `vm-autostart` as root (with no special
OS-level privileges) in order for Polkit to authorize the connection to
the daemon socket.
2025-07-28 18:18:35 -05:00
59d17bf3f4 r/v-l: Use the host network
I don't know what the deal is, but restarting the _victoria-logs_
container makes it lose inbound network connectivity.  It appears that
the firewall rules that forward the ports to the container's namespace
get lost, but I can't figure out why.  To fix it, I have to
flush the netfilter rules (`nft flush ruleset`) and then restart
_firewalld_ and _victoria-logs_ to recreate them.  This is rather
cumbersome, and since Victoria Logs runs on a dedicated VM, there's
really not much advantage to isolating the container's network.
2025-07-27 17:47:31 -05:00