Splitting up the SSH keys authorized for root login into separate
variables for SK versus legacy keys will allow more fine-grained control
of which set is used in certain situations. Specifically, the intent is
to allow non-Fedora operating systems to use the SK variants if
applicable, without having to repeat them explicitly.
Most hosts will not need to send any messages to ntfy. Let's define the
ntfy pipeline stages only for the machines that need them. There are
currently two use cases for ntfy:
* MD RAID status messages (from Chromie and nvr2)
* WAN Link status messages (from gw1)
Breaking up the pipeline into smaller pieces allows both of these use
cases to define their appropriate filters while still sharing the common
steps. The other machines that have no use for these steps now omit
them entirely.
Messages from sources other than the systemd journal do not have a
`hostname` field by default. This could make filtering logs difficult
if there are multiple servers that host the same application. Thus, we
need to inject the host name statically into every record, to ensure
they can be correctly traced to their source machine.
Instead of defining the common values for Fluent bit inputs, filters,
and outputs directly in the variables used by the _fluent-bit_ role, we
need to split these into reusable pieces. This way, hosts and groups
that need to use a slightly different pipeline configuration can access
the default values without having to redefine them.
For machines that have Linux MD RAID arrays, I want to receive
notifications about the status of the arrays immediately via _ntfy_. I
had this before with `journal2ntfy`, but I never got around to setting
it up for the current generation of machines (_nvr2_, _chromie_). Now
that we have `fluent-bit` deployed, we can use its pipeline capabilities
to select the subset of messages for which we want immediate alerts and
send them directly to _ntfy_. We use a Lua function to transform the
log record into a body compatible with _ntfy_'s JSON publish request;
`fluent-bit` doesn't have any other way to set array values, as needed
for the `tags` member.
[fluent-bit][0] is a generic, highly-configurable log collector. It was
apparently initially developed for fluentd, but is has so many output
capabilities that it works wil many different log aggregation systems,
including Victoria Logs.
Although Victoria Logs supports the Loki input format, and therefore
_Promtail_ would work, I want to try to avoid depending on third-party
repositories. _fluent-bit_ is packaged by Fedora, so there shouldn't be
any dependency issues, etc.
[0]: https://fluentbit.io
The `root_authorized_keys` variable was originally defined only for the
*pyrocufflink* group. This used to effectively be "all" machines, since
everything was a member of the AD domain. Now that we're moving away
from that deployment model, we still want to have the break-glass
option, so we need to define the authorized keys for the _all_ group.
In the spirit of replacing bloated tools with unnecessary functionality
with smaller, more focused alternatives, we can use `doas` instead of
`sudo`. Originally, it was a BSD tool, but the Linux port supports PAM,
so we can still use `pam_auth_ssh_agent` for ppasswordless
authentication.
The Samba AD domain performs two important functions: centralized user
identity mapping via LDAP, and centralized authentication via
Kerberos/GSSAPI. Unfortunately, Samba, on both domain controllers and
members, is quite frustrating. The client, _winbind_, frequently just
stops working and needs to have its cache flushed in order to resolve
user IDs again. It also takes quite a lot of memory, something rather
precious on Raspberry Pis. The DC is also somewhat flaky at times, and
cumbersome to upgrade. In short, I really would like to get rid of as
much of it as possible.
For most use cases, OIDC can replace Kereros. For SSH specifically, we
can use SSH certificates (which are issued to OIDC tokens).
Unfortunately, user and group accounts still need ID numbers assigned,
which is what _winbind_ does. In reality, there's only one user that's
necessary: _dustin_. It doesn't make sense to bring along all the
baggage of Samba just to map that one account. Instead, it's a lot
simpler and more robust to create it statically.
*dnf-automatic* is an add-on for `dnf` that performs scheduled,
automatic updates. It works pretty much how I would want it to:
triggered by a systemd timer, sends email reports upon completion, and
only reboots for kernel et al. updates.
In its default configuration, `dnf-automatic.timer` fires every day. I
want machines to update weekly, but I want them to update on different
days (so as to avoid issues if all the machines reboot at once). Thus,
the _dnf-automatic_ role uses a systemd unit extension to change the
schedule. The day-of-the-week is chosen pseudo-randomly based on the
host name of the managed system.
Promtail is the log sending client for Grafana Loki. For traditional
Linux systems, an RPM package is available from upstream, making
installation fairly simple. Configuration is stored in a YAML file, so
again, it's straightforward to configure via Ansible variables. Really,
the only interesting step is adding the _promtail_ user, which is
created by the RPM package, to the _systemd-journal_ group, so that
Promtail can read the systemd journal files.
The `TrustedUserCAKeys` setting for *sshd(8)* tells the server to accept
any certificates signed by keys listed in the specified file.
The authenticating username has to match one of the principals listed in
the certificate, of course.
This role is applied to all machines, via the `base.yml` playbook.
Certificates issued by the user CA managed by SSHCA will therefore be
trusted everywhere. This brings us one step closer to eliminating the
dependency on Active Directory/Samba.
The *ssh-host-certs* role, which is now applied as part of the
`base.yml` playbook and therefore applies to all managed nodes, is
responsible for installing the *sshca-cli* package and using it to
request signed SSH host certificates. The *sshca-cli-systemd*
sub-package includes systemd units that automate the process of
requesting and renewing host certificates. These units need to be
enabled and provided the URL of the SSHCA service. Additionally, the
SSH daemon needs to be configured to load the host certificates.
*dns1.pyrocufflink.blue* has been decommissioned. Having a second DNS
server never really worked correctly for some reason, and the
maintenance overhead of the Raspberry Pi is just not worth it right now.
The DHCP service has been moved to *dns0.pyrocufflink.blue*.