1220 Commits

Author SHA1 Message Date
5dba0aec8f fluent-bit: create configmap for kubernetes nodes
The last step in replacing Loki with Victoria Logs is to ingest logs
from Kubernetes pods.  Like Promtail, Fluent Bit is capable of
augmenting log records with Kubernetes metadata, so we can search for
logs by pod name, namespace, etc.  This of course requires access to the
Kubernetes API, and the easiest way to provide that is to run Fluent Bit
as a Kubernetes pod, granting its service account the appropriate
permissions.

Since Fluent Bit also collects logs from the systemd journal, I want to
make sure the configuration for that function stays the same on
Kubernetes nodes as on all other servers.  One way to do that would be
to run two different instances of Fluent Bit: one managed by Ansible
that collects journal messages, and another managed by Kubernetes that
collects pod logs.  This seems like unnecessary overhead, so I have
chosen a hybrid approach.  Ansible manages the configuration for the
process running in Kubernetes.
2025-12-04 21:26:03 -06:00
719be9a4e9 Deploy Radarr, Sonarr, Prowlarr on file0.p.b
I had originally intended to deploy Radarr, Sonarr, and Prowlarr on
Kubernetes.  Unfortunately, this turned out to be problematic, as I
would need a way to share the download directory between Radarr/Sonarr
and Aria2, and the media directory between Radarr/Sonarr and Jellyfin.
The only way I could fathom to do this would be to expose both
directories via NFS and mount that share into the pods.  I decided this
would be too much of a hassle for no real gain, at least not in the
short term.  Instead, it makes more sense to deploy the *arr suite on
the same server as Aria2 and Jellyfin, which is essentially what the
community expects.

The recommended images for deploying the applications in containers are
pretty crappy.  I didn't really want to mess with trying to get them
to work natively on Fedora, nor deal with installing them from
tarballs with Ansible, so I created my own Debian-based container images
for them and deployed those via Podman+Quadlet.  These images are
published to the _Packages_ organization in Gitea, which is not public
and requires authentication.  We can use the Kubernetes Secret to obtain
the authentication token to use to pull the image.
2025-12-03 23:05:21 -06:00
f892570467 r/f-b-arr: Configure Fluent Bit for Servarr logs
The _fluent-bit-servarr_ role creates a configuration file for Fluent
Bit to read and parse logs from Radarr, Sonarr, and Prowlarr.  These
logs can then be sent to an output by defining the
`fluent_bit_servarr_outputs` variable.
2025-12-03 23:00:54 -06:00
23670338b3 sonarr: Deploy Sonarr in a Podman container
The `sonarr.yml` playbook and corresponding role deploy Sonarr, the
TV series library/download manager, in a Podman container.

Note that we're relocating the log files from the Sonarr AppData
directory to `/var/log/sonarr` so they can be picked up by Fluent Bit.
2025-12-03 23:00:54 -06:00
9223dbe820 prowlarr: Deploy Prowlarr in a Podman container
The `prowlarr.yml` playbook and corresponding role deploy Prowlarr, the
indexer manager for the *arr suite, in a Podman container.

Note that we're relocating the log files from the Prowlarr AppData
directory to `/var/log/prowlarr` so they can be picked up by Fluent Bit.
2025-12-03 23:00:54 -06:00
a41a3fa3d0 radarr: Deploy Radarr in a Podman container
The `radarr.yml` playbook and corresponding role deploy Radarr, the
movie library/download manager, in a Podman container.

Note that we're relocating the log files from the Radarr AppData
directory to `/var/log/radarr` so they can be picked up by Fluent Bit.
2025-12-03 23:00:54 -06:00
6ad76e4b33 r/fluent-bit: Support drop-in configuration files
Fluent Bit supports including configuration fragments from other files
using its `includes` option.  Adding a glob pattern to the default
configuration will allow other roles to supply additional configuration
by creating files in the `/etc/fluent-bit/include` directory.  This
makes composition of configuration significantly easier.

Unfortunately, `fluent-bit` has a quirk in that there must exist at
least one file matching the glob pattern, or it will fail to start.  To
work around this, we must supply an empty fragment.
2025-12-03 23:00:54 -06:00
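
The empty-fragment workaround from the drop-in change above can be sketched in a few lines of Python.  This is only an illustration, not the role's actual tasks: the `*.yml` pattern and the `00-empty.yml` file name are assumptions, since the commit does not say what the real glob or placeholder are called.

```python
from pathlib import Path


def ensure_placeholder(include_dir, pattern="*.yml", name="00-empty.yml"):
    """Create an empty fragment if nothing in include_dir matches pattern.

    Returns the sorted names of the files that match afterwards.
    The pattern and placeholder name are hypothetical defaults.
    """
    directory = Path(include_dir)
    directory.mkdir(parents=True, exist_ok=True)
    if not any(directory.glob(pattern)):
        # An empty file is enough to give the glob something to match.
        (directory / name).touch()
    return sorted(p.name for p in directory.glob(pattern))
```

Calling `ensure_placeholder("/etc/fluent-bit/include")` before starting the service would guarantee at least one match, and it leaves the directory alone once another role has dropped a fragment there.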
cc288a4ee3 r/apache-base: Factor out handlers for reuse
Roles that need to reload or restart Apache after writing configuration
files do not necessarily need to depend on the _apache_ role, but may
assume Apache is deployed in some other way.  To support this, I have
factored out the handlers from the _apache_ role into an _apache-base_
role, which such roles can list as a dependency.
2025-12-03 23:00:54 -06:00
fd8cc42720 hosts: Move PiKVM to separate inventory
There's no reason for Jenkins to be messing with this machine.  It's too
different from the rest of the hosts it manages, so it's been quite
difficult getting it to work anyway.  Let's just move it to a separate
inventory file that we have to specify manually when we want to apply a
Playbook to it.
2025-12-02 08:52:22 -06:00
7eeacdecd7 pikvm: Add user for Prometheus metrics
PiKVM exports metrics in Prometheus format, but requires authentication
to scrape them.
2025-12-01 12:17:26 -06:00
e9d2d21ec3 hosts: Add pikvm-nvr2.m.p.b
This is a Raspberry Pi 2 with HDMI-CSI adapter and Raspberry Pi Pico,
connected to _nvr2.pyrocufflink.blue_, as the latter does not have a
serial console.
2025-12-01 10:03:05 -06:00
cce485db54 pikvm: Add role/playbook for PiKVM
PiKVM comes with its own custom Arch Linux-based operating system.  We
want to be able to manage it with our configuration policy, especially
for setting up authentication, etc.  It won't really work with the
host-provisioner without some pretty significant changes to the base
playbooks, but we can control some bits directly.
2025-12-01 10:01:07 -06:00
4fc0e7bdec r/base: Conditionally install Python SELinux libs
We do not need to install the SELinux bindings for operating systems
that do not support SELinux.
2025-12-01 09:58:56 -06:00
1089927be3 all: Use vars for sk/non-sk SSH keys
Splitting up the SSH keys authorized for root login into separate
variables for SK versus legacy keys will allow more fine-grained control
of which set is used in certain situations.  Specifically, the intent is
to allow non-Fedora operating systems to use the SK variants if
applicable, without having to repeat them explicitly.
2025-12-01 09:56:34 -06:00
85fc29d511 remote-blackbox: Increase scrape timeout
In order to avoid false positives, especially with Invoice Ninja, I'm
increasing the timeout values for scraping the public-facing websites.
They can occasionally be quite slow, either because of our Internet
connection, or load on the servers.
2025-11-25 21:56:20 -06:00
0334b1b77a Merge branch 'fluent-bit' 2025-11-24 07:49:05 -06:00
f1b61a8d0a v-l: Enable useRemoteIP for syslog
Victoria Logs can now record the source address for syslog messages in a
`remoteIP` field.  This has to be enabled specifically, although I can't
think of a reason why someone would _not_ want to record that
information.
2025-11-24 07:47:35 -06:00
8aa1e986d4 r/gitea: Enable PROXY protocol
Using the PROXY protocol allows the publicly-facing reverse proxy to
pass through the original source address of the client, without doing
TLS termination.  Clients on the internal network will not go through
the proxy, though, so we have to disable the PROXY protocol for those
addresses.  Unfortunately, the syntax for this is kind of cumbersome,
because Apache only has a deny list, not an allow list, so we have to
enumerate all of the possible internal addresses _except_ the proxy.
2025-11-19 07:43:29 -06:00
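
Enumerating "everything except the proxy" for the PROXY protocol change above does not have to be done by hand.  The following is a minimal sketch using Python's `ipaddress` module; the network and proxy address are documentation-range placeholders, not the real ones, and the assumption is that the resulting list ends up in an exception list such as mod_remoteip's `RemoteIPProxyProtocolExceptions`, which the commit does not name.

```python
import ipaddress


def proxy_protocol_exceptions(internal_net, proxy_addr):
    """Return CIDR blocks covering internal_net minus the proxy address.

    Raises ValueError if the proxy is not inside internal_net.
    """
    network = ipaddress.ip_network(internal_net)
    # A bare host address becomes a single-address network (/32 for IPv4).
    proxy = ipaddress.ip_network(proxy_addr)
    # address_exclude() yields the subnets that remain after removing
    # the proxy; sort them so the output is stable.
    return [str(n) for n in sorted(network.address_exclude(proxy))]


# Hypothetical example: a /29 internal network with the proxy at .1
print(proxy_protocol_exceptions("192.0.2.0/29", "192.0.2.1"))
# ['192.0.2.0/32', '192.0.2.2/31', '192.0.2.4/30']
```

The output grows with the prefix length of the internal network (one block per bit), which is why the hand-written version gets cumbersome.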
25d813144c r/web/hlc: Drop cert role
The certificate for _hatchlearningcenter.org_ is managed by Apache
*mod_md* now.
2025-11-17 08:00:45 -06:00
68b045d6d1 websites: Drop unnecessary cert for hatch.chat
The Synapse server has been gone for a long time.
2025-11-17 07:56:38 -06:00
c1944fc78a site: Remove frigate PB
The `frigate` playbook cannot be applied by the host provisioner for
several reasons.  First, it needs manual intervention in order to enroll
the MOK which is used to sign the `gasket-driver` kernel modules.
Further, it needs several encrypted values from Ansible Vault, which are
not available to the _host-provisioner_.
2025-11-16 16:49:15 -06:00
2d53fe6acd gw1/squid: Allow pxe.p.b via HTTPS
Now that Kickstart files are hosted on _pxe.pyrocufflink.blue_, we can
allow access to that entire (sub-)domain, enabling clients to fetch the
files over HTTPS.  Previously, this was not possible because in order to
allow access to Kickstart files but nothing else on Gitea, we had to
rely on full URL matching.
2025-11-16 16:49:15 -06:00
2aca0429eb useproxy: Add ntfy.p.b to NO_PROXY
Specifically for _fluent-bit_, which does not correctly handle wildcards
or subdomains in `NO_PROXY`, to send real-time notifications from logs
via ntfy.
2025-11-16 16:49:15 -06:00
04f62a1467 hosts: Remove nvr2 from AD domain
The NVMe drive in _nvr2.pyrocufflink.blue_ died, so I had to re-install
Fedora on a new drive.  This time around, it will not be a domain
member, as with the other new servers added recently.
2025-11-16 16:48:20 -06:00
60b7a20e1f frigate: Switch to pre-compiled gasket-driver RPM
The DKMS package for the _gasket-driver_ kernel modules is something of
a problem.  For one thing, upstream seems to have abandoned the driver
itself, and it now requires several patches in order to compile for
current kernel versions.  These patches are not included in the DKMS
package, and thus have to be applied manually after installing it.  More
generally, I don't really like how DKMS works anyway.  Besides requiring
a full kernel development toolchain on a production system, it's
impossible to know if a module will compile successfully until _after_
the new kernel has been installed and booted.  This has frequently meant
that Frigate won't come up after an update because building the module
failed.  I would much rather have a notification about a compatibility
issue for an _upcoming_ update, rather than an applied one.

To rectify these issues, I have created a new RPM package that contains
pre-built, signed kernel modules for the Coral EdgeTPU device.  Unlike
the DKMS package, this package needs to be rebuilt for every kernel
version; however, this is done by Jenkins before the updated kernel gets
installed on the machine.  It also expresses a dependency on an exact
kernel version, so the kernel cannot be updated until a corresponding
_gasket-driver_ package is available.
2025-11-16 16:30:51 -06:00
94a777fec8 r/collectd-sensors: Add missing handlers file 2025-11-16 16:30:51 -06:00
0df95c8378 Drop .certs submodule
Nothing uses these certificates anymore, and nothing manages/renews
them.  Everything has either been converted to ACME, or fetches the
_pyrocufflink.net_ wildcard certificate directly from the Kubernetes
Secret.
2025-11-16 16:28:49 -06:00
daa91e71a1 Merge remote-tracking branch 'refs/remotes/origin/master' 2025-11-16 16:24:04 -06:00
fce060bdec r/ssh-host-certs: Fix circular dep in reload.path
The `reload-ssh-cert.path` unit introduced a circular ordering
dependency with `sshd.service` by way of `paths.target`.  There's no
particular reason for this dependency here, so we need to remove it to
resolve the issue.
2025-11-13 18:40:52 -06:00
44c3dba46a r/gitea: Update to v1.24.7 2025-11-12 17:48:09 -06:00
4b91e088ea r/apache: Reduce amount of logs stored
There's really no reason to keep four 256 MiB log files, especially access
logs.  In any case, most of the web servers only have 1 GiB log volume,
so this configuration tends to fill them up.
2025-11-09 13:23:02 -06:00
28ecc2974c fluent-bit: Remove Promtail 2025-11-06 09:44:22 -06:00
a500e0ece4 hosts: Decommission dc-headphone.p.b
_dc-headphone.pyrocufflink.blue_ has been replaced by
_dc-backless.pyrocufflink.blue_.
2025-11-01 22:28:43 -05:00
5af25bcccf r/dch-yum: Trust GPG key
We need to explicitly add the GPG signing key for the _dch_ repository
to the system trust store; otherwise, _dnf-automatic_ will fail, as it
cannot implicitly add new keys during an update.
2025-10-27 12:54:07 -05:00
1804bc06f0 domain-controller: Remove vault secrets
The secret values stored in this vault file were never actually used.
They weren't even correct.
2025-10-27 12:54:07 -05:00
7929176b4e create-dc: Update to use new provisioning process
Instead of running `virt-install` directly from the `create-dc.sh`
script, it now relies on `newvm.sh`.  This will ensure that VMs created
to be domain controllers will conform to the same expectations as all
other machines, such as using the libvirt domain metadata to build
dynamic inventory.

Similarly, the `create-dc.yml` playbook now imports the `host-setup.yml`
playbook, which covers the basic setup of a new machine.  Again, this
ensures that the same policy is applied to DCs as to other machines.

Finally, domain controller machines now no longer use _winbind_ for
OS user accounts and authentication.  This never worked particularly
well on DCs anyway (particularly because of the way _winbind_ insists on
using domain-prefixed user accounts when it runs on a DC), and is now
worse with recent Fedora changes.  Instead, DCs now have local users who
authenticate via SSH certificates, the same as other current-generation
servers.
2025-10-27 12:53:27 -05:00
3f761eacb4 newvm: Add support for specifying static IP config
Although rare, there are scenarios where we may want to deploy a new
virtual machine with a static, manually-configured IP address.
Anaconda/Dracut support this via the `ip=` kernel command-line argument.
To simplify populating that argument, the `newvm` script now takes
additional command-line arguments for IP address (in CIDR notation),
default gateway, and name server address(es) and creates the appropriate
string from these discrete values.
2025-10-24 11:17:11 -05:00
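
As an illustration of the static IP change above, here is a rough Python sketch of turning the discrete values into Dracut's `ip=` and `nameserver=` arguments.  It is not the `newvm` script itself: the function name and parameters are hypothetical, it handles IPv4 only, and it assumes the documented `ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:none` field layout with unused fields left empty.

```python
import ipaddress


def static_ip_args(address, gateway, nameservers=(), hostname="", interface=""):
    """Build Dracut-style kernel arguments for a static IPv4 setup.

    address is given in CIDR notation, e.g. "192.0.2.10/24".
    """
    iface = ipaddress.ip_interface(address)
    fields = [
        str(iface.ip),       # client IP
        "",                  # peer (unused)
        gateway,             # default gateway
        str(iface.netmask),  # netmask derived from the CIDR prefix
        hostname,
        interface,
        "none",              # no autoconfiguration
    ]
    args = ["ip=" + ":".join(fields)]
    args += [f"nameserver={ns}" for ns in nameservers]
    return " ".join(args)


# Hypothetical values from the documentation address range
print(static_ip_args("192.0.2.10/24", "192.0.2.1", ["192.0.2.53"]))
# ip=192.0.2.10::192.0.2.1:255.255.255.0:::none nameserver=192.0.2.53
```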
3bed59055c users: Do not apply sudo role on Samba DCs
Users, auth, etc. for domain controllers will be handled by the
`create-dc.yml` playbook.  I haven't decided exactly how this playbook
will get applied, but I want to make sure the host provisioner is able to
successfully provision machines in the _samba-dc_ group nonetheless.
2025-10-22 21:13:03 -05:00
7308b45047 fluent-bit: Enable EPEL repo if needed
The _fluent-bit_ package is provided by EPEL for Red Hat/CentOS/AlmaLinux.
2025-10-19 09:28:47 -05:00
0b914d617e ci: Optionally allow installing packages
Usually, we do not want the continuous enforcement jobs installing or
upgrading software packages.  Sometimes, though, we may want to use a
Jenkins job to roll out something new, so this new `ALLOW_INSTALL`
parameter will control whether or not Ansible tasks tagged with
`install` are skipped.
2025-10-19 09:04:27 -05:00
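
The effect of the `ALLOW_INSTALL` parameter described above can be sketched as follows.  The real pipeline is a Jenkins job, so this Python helper is only a stand-in for its logic; the function name and the playbook name are hypothetical, and the only thing taken as given is `ansible-playbook`'s `--skip-tags` option.

```python
def ansible_command(playbook, allow_install=False):
    """Return the ansible-playbook argv for a continuous enforcement run.

    Tasks tagged `install` are skipped unless allow_install is true.
    """
    argv = ["ansible-playbook", playbook]
    if not allow_install:
        argv += ["--skip-tags", "install"]
    return argv


print(ansible_command("fluent-bit.yml"))
# ['ansible-playbook', 'fluent-bit.yml', '--skip-tags', 'install']
print(ansible_command("fluent-bit.yml", allow_install=True))
# ['ansible-playbook', 'fluent-bit.yml']
```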
ea1253c9b8 ci: Remove remount RO/RW stages
None of the extant servers have read-only root filesystems any more, so
these stages are no longer necessary.
2025-10-19 08:57:19 -05:00
bcfe7cc699 ci: Add pipeline for fluent-bit Playbook 2025-10-17 07:53:10 -05:00
dc8961de92 fluent-bit: Do not apply to K8s nodes
We'll manage Fluent-Bit on Kubernetes nodes as a DaemonSet.  This will
be necessary in order to grant it access to the Kubernetes API so it can
augment log records with Kubernetes metadata (labels, pod name, etc.).
2025-10-17 07:51:32 -05:00
96ac5be3b5 r/kubelet: Schedule automatic image prune
As pods move around between nodes, applications are updated, etc., nodes
tend to accumulate images in their container stores that are no longer
used.  These take up space unnecessarily, eventually triggering disk
usage alarms.  From now, the _kubelet_ role installs a systemd timer and
service unit to periodically clean up these unused images.
2025-10-13 09:54:20 -05:00
142682ce2f r/ssh-host-certs: Fix restart handler
The _ssh-host-certs.target_ unit does not exist any more.  It was
provided by the _sshca-cli-systemd_ package to allow machines to
automatically request their SSH host certificates on first boot.  It had
a `ConditionFirstBoot=` requirement, which made it not work at any other
time, so there was no reason to move it into the Ansible configuration
policy.  Instead, we can use the _ssh-host-certs-renew.target_ unit to
trigger requesting or renewing host certificates.
2025-09-17 06:40:20 -05:00
4601b4d092 victoria-logs: Update to v1.33.1 2025-09-15 11:13:01 -05:00
c2d26f1f59 r/fluent-bit: Drop network.target requirement
The _network.target_ unit should be used for ordering only.  Listing it
as a `Requires=` dependency can cause _fluent-bit.service_ to fail to
start at all if the network takes slightly too long to initialize at
boot.
2025-09-15 10:49:32 -05:00
2cba5eb2e4 fluent-bit: Make ntfy pipeline steps optional
Most hosts will not need to send any messages to ntfy.  Let's define the
ntfy pipeline stages only for the machines that need them.  There are
currently two use cases for ntfy:

* MD RAID status messages (from Chromie and nvr2)
* WAN Link status messages (from gw1)

Breaking up the pipeline into smaller pieces allows both of these use
cases to define their appropriate filters while still sharing the common
steps.  The other machines that have no use for these steps now omit
them entirely.
2025-09-15 10:46:45 -05:00
faf4822918 fluent-bit: Ignore all HTTP output status messages
If the Fluent Bit pipeline includes multiple HTTP outputs, we need to
suppress the `HTTP status=200` messages from _all_ of them.
2025-09-15 08:01:42 -05:00
3d4bf3dd6c fluent-bit: Add hostname field to all records
Messages from sources other than the systemd journal do not have a
`hostname` field by default.  This could make filtering logs difficult
if there are multiple servers that host the same application.  Thus, we
need to inject the host name statically into every record, to ensure
they can be correctly traced to their source machine.
2025-09-15 08:00:16 -05:00