_haproxy0.pyrocufflink.blue_ is a Fedora Linux VM that runs HAProxy to
provide a reverse proxy, exposing websites and applications to the
Internet. It has a static MAC address because it will need a static IP
address, at least initially, in order for DNAT to work.
Originally, the VM hosts were in a separate inventory so they would
not be managed with the rest of the servers. It used to be that one
server was running all the VMs, while the other was asleep. That's
no longer the case; both are always running and each has about half
of the VMs. Since they're both always online, they can be managed
normally now.
*nvr2.pyrocufflink.blue* originally ran Fedora CoreOS. Since I'm tired
of the tedium and difficulty involved in making configuration changes to
FCOS machines, I am migrating it to Fedora Linux, managed by Ansible.
The *useproxy* role configures the `http_proxy` et al. environment
variables for systemd services and interactive shells. Additionally, it
configures Yum repositories to use a single mirror via the `baseurl`
setting, rather than a list of mirrors via `metalink`, since a) the
proxy only allows access to _dl.fedoraproject.org_ and b) the proxy
caches RPM files, which is only effective if all clients use the same
mirror all the time.
The `useproxy.yml` playbook applies this role to servers in the
*needproxy* group.
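For reference, the effect of the role is roughly equivalent to the
following sketch; the proxy address, file paths, and mirror URL here are
assumptions, not copied from the actual role:

```sh
# Sketch only; the proxy address and paths are assumptions.

# Proxy environment for interactive shells
cat > /etc/profile.d/proxy.sh <<'EOF'
export http_proxy=http://proxy.pyrocufflink.blue:3128
export https_proxy=$http_proxy
export no_proxy=.pyrocufflink.blue,.pyrocufflink.red
EOF

# Proxy environment for all systemd services
mkdir -p /etc/systemd/system.conf.d
cat > /etc/systemd/system.conf.d/proxy.conf <<'EOF'
[Manager]
DefaultEnvironment=http_proxy=http://proxy.pyrocufflink.blue:3128
EOF

# Pin the Fedora repos to dl.fedoraproject.org so the proxy cache stays
# effective
sed -i \
    -e 's/^metalink=/#metalink=/' \
    -e 's|^#baseurl=.*|baseurl=https://dl.fedoraproject.org/pub/fedora/linux/releases/$releasever/Everything/$basearch/os/|' \
    /etc/yum.repos.d/fedora.repo
```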
*db0.pyrocufflink.blue* will be the primary server in the new PostgreSQL
database cluster. We're starting with Fedora 39 so we can have
PostgreSQL 15, to match the version managed by the Postgres Operator in
the Kubernetes cluster right now.
I am going to use the *postgresql* group for the dedicated database
servers. The configuration for those machines will be quite a bit
different from that for the one existing machine that is a member of that
group already: the Nextcloud server. Rather than undefine/override all
the group-level settings at the host level, I have removed the Nextcloud
server from the *postgresql* group, and updated the `nextcloud.yml`
playbook to apply the *postgresql-server* role itself.
Eventually, I want to move the Nextcloud database to the central
database servers. At that point, I will remove the *postgresql-server*
role from the `nextcloud.yml` playbook.
*k8s-amd64-n0.pyrocufflink.blue*, *k8s-amd64-n1.pyrocufflink.blue*, and
*k8s-amd64-n2.pyrocufflink.blue*, which ran Fedora Linux, have been
replaced by *k8s-amd64-n4.pyrocufflink.blue*,
*k8s-amd64-n5.pyrocufflink.blue*, and *k8s-amd64-n6.pyrocufflink.blue*,
respectively. The new machines run Fedora CoreOS, and are thus not
managed by the Ansible configuration policy.
New AD DC servers run Fedora 40. Their LDAP server certificates are
issued by *step-ca* via ACME, signed by *dch-ca r2*.
I've changed the naming convention for domain controllers again. I
found the random sequence of characters to be too difficult to remember
and identify. Using a short random word (chosen from the EFF word list
used by Diceware) should be a lot nicer. These names are chosen by the
`create-dc.sh` script.
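To illustrate the scheme, a hypothetical sketch follows; the actual
`create-dc.sh` logic and the hostname prefix may differ:

```sh
# Hypothetical sketch; assumes the EFF word list has been saved locally as
# eff_wordlist.txt, with the dice roll in column 1 and the word in column 2.
word=$(awk '{print $2}' eff_wordlist.txt | shuf -n1)
echo "dc-${word}.pyrocufflink.blue"   # "dc-" prefix is an assumption
```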
The internal DNS server for the *pyrocufflink.blue* et al. domains runs
on the firewall now, and is thus no longer managed by Ansible. I'm
dropping the group variables so the file encrypted with Ansible Vault
can go away.
*serial0.pyrocufflink.blue* has been replaced by
*serial1.pyrocufflink.blue*. The latter runs Fedora CoreOS and is
managed by the CUE-based configuration policy in *cfg.git*.
The Metrics Pi has bitten the dust. The NVMe disk has never been
particularly reliable, but now it's gotten to the point where it's a
real issue. The Pi needs to be rebooted at least once a day.
I've moved the Victoria Metrics/Grafana ecosystem to Kubernetes.
So it turns out Gitea's RPM package repository feature is less than
stellar. Since each organization/user can only have a single
repository, separating packages by OS would be extremely cumbersome.
Presumably, the feature was designed for projects that only build a
single RPM for each version, but most of my packages need multiple
builds, as they tend to link to system libraries. Further, only the
repository owner can publish to user-scoped repositories, so e.g.
Jenkins cannot publish anything to a repository under my *dustin*
account. This means I would ultimately have to create an Organization
for every OS/version I need to support, and make Jenkins a member of it.
That sounds tedious and annoying, so I decided against using that
feature for internal packages.
Instead, I decided to return to the old ways, publishing packages with
`rsync` and serving them with Apache. It's fairly straightforward to
set up: all that's needed is a directory with the appropriate
permissions for users to upload packages, and an Apache configuration
that serves from it.
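A rough sketch of that setup follows; the paths, group name, and host
name are assumptions for illustration, not the actual configuration:

```sh
# One-time setup on the repo host (paths and group name are assumptions)
groupadd --system repo-publishers
install -d -m 2775 -o root -g repo-publishers /srv/repo
# Apache then just serves the tree, e.g.:
#   Alias /repo /srv/repo
#   <Directory /srv/repo>
#       Options +Indexes
#       Require all granted
#   </Directory>

# Publishing from a build machine is a single rsync
rsync -rv --chmod=D2775,F664 RPMS/x86_64/*.rpm \
    repohost.pyrocufflink.blue:/srv/repo/fedora/40/x86_64/
```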
One advantage Gitea's feature had over a plain directory is its
automatic management of repository metadata. Publishers only have to
upload the RPMs they want to serve, and Gitea handles generating the
index, database, etc. files necessary to make the packages available to
Yum/dnf. With a plain file host, the publisher would need to use
`createrepo` to generate the repository metadata and upload that as
well. For repositories with multiple packages, the publisher would need
a copy of every RPM file locally in order for them to be included in the
repository metadata. This, too, seems like it would be too much trouble
to be tenable, so I created a simple automatic metadata manager for the
file-based repo host. Using `inotifywatch`, the `repohost-createrepo`
script watches for file modifications in the repository base directory.
Whenever a file is added or changed, the directory containing it is
added to a queue. Every thirty seconds, the queue is processed; for
each unique directory in the queue, repository metadata are generated.
This implementation combines the flexibility of a plain file host,
supporting an effectively unlimited number of repositories with
fully-configurable permissions, and the ease of publishing of a simple
file upload.
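A minimal sketch of the approach follows, using `inotifywait` from
*inotify-tools* to stream events; the actual `repohost-createrepo`
script, paths, and event selection may differ:

```sh
#!/bin/bash
# Sketch only: watch the repository tree, queue the directories that change,
# and regenerate their metadata every thirty seconds.
REPO_ROOT=/srv/repo        # assumed path
queue=$(mktemp)

# Stream events for new or modified files anywhere under the tree;
# %w is the directory containing the affected file.
inotifywait -m -r -e close_write -e moved_to -e create \
    --format '%w' "$REPO_ROOT" >>"$queue" &

while sleep 30; do
    dirs=$(sort -u "$queue")   # unique directories queued so far
    : >"$queue"                # clear the queue
    for dir in $dirs; do
        createrepo_c --update "$dir"
    done
done
```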
*nvr1.pyrocufflink.blue* has been migrated to Fedora CoreOS. As such,
it is no longer managed by Ansible; its configuration is done via
Butane/Ignition. It is no longer a member of the Active Directory
domain, but it does still run *collectd* and export Prometheus metrics.
Jellyfin is a multimedia library manager. Clients can browse and stream
music, movies, and TV shows from the server and play them locally
(including in the browser).
Since Ubiquiti only publishes Debian packages for the Unifi Network
controller software, running it on Fedora has historically been nigh
impossible. Fortunately, a modern solution is available: containers.
The *linuxserver.io* project publishes a container image for the
controller software, making it fairly easy to deploy on any host with an
OCI runtime. I briefly considered creating my own image, since theirs
must be run as root, but I decided the maintenance burden would not be
worth it. Using Podman's user namespace functionality, I was able to
work around this requirement anyway.
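For illustration, a rootless invocation might look like the following;
the image tag, ports, and volume path are assumptions, not the actual
deployment. Rootless Podman maps root inside the container to the
unprivileged invoking user on the host, which is what makes the image's
run-as-root requirement tolerable.

```sh
# Sketch only; image tag, ports, and volume path are assumptions.
podman run -d --name unifi \
    -p 8443:8443 -p 8080:8080 -p 3478:3478/udp \
    -v /srv/unifi:/config:Z \
    docker.io/linuxserver/unifi-controller:latest
```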
The Frigate server has a RAID array that it uses to store video
recordings. Since there have been a few occasions where the array has
suddenly stopped functioning, probably because of the cheap SATA
controller, it will be nice to get an alert as soon as the kernel
detects the problem, so as to minimize data loss.
Most of the Synapse server's state is in its SQLite database. It also
has a `media_store` directory that needs to be backed up, though.
In order to back up the SQLite database while the server is running, the
database must be in "WAL mode." By default, Synapse leaves the database
in the default "rollback journal mode," which disallows multiple
processes from accessing the database, even for read-only operations.
To change the journal mode:
```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
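To confirm the change took effect, the same pragma with no argument
prints the active mode:

```sh
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode;'
# prints: wal
```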
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream. Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.
For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array. When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun. Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
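A hypothetical invocation for this case is shown below; the actual
arguments, and how the ntfy topic is configured, may differ. Everything
after the script name is handed to `journalctl`, so this follows kernel
messages and forwards any md/RAID lines:

```sh
# Hypothetical example; actual filters and ntfy configuration may differ.
./journal2ntfy.py --dmesg --grep 'md[0-9]+.*(resync|recovery)'
```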
We're going to run MinIO on the BURP server to provide a backup target
for the [Postgres Operator][0]/[WAL-E][1]. Although the Postgres
Operator also supports backups via [WAL-G][2], which supports more
backup targets like SFTP, the operator does not support restoring from
those targets. As such, the best way to get fully-featured backups for
the Postgres Operator, including environment cloning, etc., is to use
S3. Since I absolutely do not want to store my backups "in the cloud,"
using MinIO seems a decent alternative. Running it on the BURP server
allows the backups to be stored and rotated along with regular system
backups.
[0]: https://github.com/zalando/postgres-operator/
[1]: https://github.com/wal-e/wal-e
[2]: https://github.com/wal-g/wal-g
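As a sketch, running MinIO could be as simple as the following; the data
path, credentials, and ports are placeholders, not the real deployment:

```sh
# Sketch only; path, credentials, and ports are placeholders.
export MINIO_ROOT_USER=backup
export MINIO_ROOT_PASSWORD=changeme     # use a real secret in practice
minio server /srv/backup/minio --address :9000 --console-address :9001
```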
I moved the metrics Pi from the red network to the blue network. I
started to get uncomfortable with the firewall changes that were
required to host a service on the red network. I think it makes the
most sense to define the red network as egress only.
I've moved handling of DNS to the border firewall instead of a dedicated
virtual machine. Originally, the VM was necessary because the UniFi
Security Gateway sucked and could not (easily) handle the complex
configuration I wanted to use. Since moving to the new firewall, this
is no longer a problem.
Having DNS on a VM is problematic when full-network outages occur, like
the one that happened on 16 August 2022. When everything starts back
up, DNS is unavailable. libvirt VM autostart does not work for machines
that have been migrated between hosts (the auto-start flag is not
migrated, and libvirt "forgets" that the VM was supposed to autostart if
it is migrated away and back). I plan to script a solution for this at
some point, but I still think it makes more sense for the firewall to
handle it. It will certainly make it come up quicker regardless.
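One possible stopgap for the autostart problem (a sketch, not the
planned solution) would be to simply re-flag every defined domain after
a migration:

```sh
# Sketch of a possible workaround; not an actual part of the configuration.
for dom in $(virsh list --all --name); do
    virsh autostart "$dom"
done
```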
The *pxe* role configures the TFTP and NBD stages of PXE network
booting. The TFTP server provides the files used for the boot stage,
which may either be a kernel and initramfs, or another bootloader like
SYSLINUX/PXELINUX or GRUB. The NBD server provides the root filesystem,
typically mounted by code in early userspace/initramfs.
The *pxe* role also creates a user group called *pxeadmins*. Users in
this group can publish content via TFTP; they have write access to the
`/var/lib/tftpboot` directory.
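Roughly, the group and permissions amount to something like the
following; the exact modes and the example user are assumptions:

```sh
# Rough equivalent of what the role sets up; exact modes are assumptions.
groupadd --system pxeadmins
chgrp -R pxeadmins /var/lib/tftpboot
chmod 2775 /var/lib/tftpboot      # setgid so new files keep the group
usermod -aG pxeadmins dustin      # example member; user name is hypothetical
```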
*mtrcs0.pyrocufflink.red* is a Raspberry Pi CM4 on a Waveshare
CM4-IO-BASE-B carrier board with an NVMe SSD. It runs a custom OS built
using Buildroot, and is not a member of the *pyrocufflink.blue* AD
domain.
*mtrcs0.p.r* hosts Victoria Metrics/`vmagent`, `vmalert`, AlertManager,
and Grafana. I've created a unique group and playbook for it,
*metricspi*, to manage all these applications together.