The *victoria-metrics* role deploys a single-server instance of the
Victoria Metrics time series database server. It installs the selected
version by downloading the binary release from Github and copying it to
`/usr/local/sbin` on the managed node. Scrape configuration is optional
and can be specified with the `scrape_configs` variable.
Tasks that configure the SELinux policy obviously only make sense if the
host uses SELinux. Similarly, if the host does not use FirewallD,
configuring firewall rules doesn't work.
Although the `collectd-version` script is fairly generic and *should*
work for most Linux distributions, it cannot be installed on machines
that a have an immutable root filesystem, e.g. Buildroot-based systems.
For Buildroot-based systems in particular, tracking the OS version makes
very little sense anyway. If we do end up with hosts running an OS
besides either Fedora or Buildroot, we can re-evaluate how to deploy
this feature.
The `/etc/collectd.d` directory is created by the RPM package on
machines running a Red Hat-based Linux distribution, but it may not
always be present on other machines.
In addition to ignoring particular types of filesystems, e.g. OverlayFS,
we can also ignore filesystems by their mount point. This could be
useful, for example, for bind-mounted directories, such as those used on
Kubernetes nodes.
By default, the *df* pluggin for collectd, which monitors filesystem
usage, collects data about all mounted filesystems. It can be
configured to ignore some filesystems, either by mount point, device, or
filesystem type. We will uses this capability to avoid collecting data
about OverlayFS mounts, because by definition, they do not represent a
real filesystem, but one or more other mounted filesystems. Collecting
data about these just creates useless metrics, especially on machines
that run containers.
Setting the `remount_state` variable to `rw` by default will allow the
`remount.yml` playbook to be "chained" with other playbooks, e.g.:
```
ansible-playbook -l kubelet remount.yml collectd.yml -b
```
Some machines, such as the nodes in the Kubernetes cluster, do not use
*firewalld*. For these machines, we need to skip the `firewalld` tasks,
as they will fail. The `host_uses_firewalld` variable can be set to
`False` for these machines to do so.
There is no specific playbook or role for Kubernetes. All OS
configuration is done at install time via kickstart scripts, and
deploying Kubernetes itself is done (manually) using `kubeadm init` and
`kubeadm join`.
It seems with each new release of Fedora, some feature or other of
*collectd* gets broken. In Feodra 36, the *interfaces* plugin does not
seem to work reliably, and the *md* plugin logs a *lot* of errors.
While these issues are investigated upstream, we either need to manage
our own policy for collectd or mark the `collectd_t` domain permissive.
I chose the latter because I'm lazy and I don't consider collectd to be
that big of a threat to security.
*nvr1.pyrocufflink.blue* is the new video recording server. It is a
1U rack-mounted physical machine based on the [Jetway
JBC150F596-3160-B][0] barebone system. It replaces
*nvr0.pyrocufflink.blue* in this role.
[0]: https://www.jetwaycomputer.com/JBC150F596.html
Podman 4 puts lock files in the configuration directory for [some stupid
reason][0]. There are so many issues here!
* It is now impossible to run `podman` as root with a read-only `/etc`.
* Why does it need the lock file at all when using `--network=host`?
Luckily, we can work around it fairly easily by mounting a tmpfs
filesystem over the directory it wants to put the lock file in. This
pretty much defeats the purpose of having a lock file, but it's likely
not needed anyway.
[0]: 836fa4c493
The *sensors* plugin for collectd reads temperature information from the
I²C/SMBus using *lm_sensors*. Naturally, it is only useful on physical
machines, so it is not installed or enabled by default.
Instead of a simple list of disabled plugins, hosts and host groups can
now control whether plugins are enabled or disabled using the
`collectd_plugins` map. The map keys are plugin names, and the values
are booleans indicating if the plugin is enabled.
Using this mechanism, some plugins can be disabled by default (e.g. the
*md* plugin), and enabling them per host or per host group is simpler.
Adding a second camera to the back yard, on the North side of the porch,
to try and figure out how the possums keep getting under the porch even
with the chicken wire around it!
We're trying to discover how the possums are getting into and out of the
house. Let's enable continuous video recording from the back yard
camera so we can observe them and come up with a plan to get rid of
them.
Cached facts interfere with the detection of which filesystems need to
be remounted. We need to clear them all and gather again before
beginning to ensure that the correct mounts are considered.
Mosquitto can save retained messages, persistent clients, etc. to the
filesystem and restore them at startup. This allows state to be
maintained even after the process restarts.
The KDC service, as managed by Samba, continuously logs to two files
that need to be rotated. The upstream configuration for logrotate only
manages one of these files, and does not correctly signal the service
after rotating, as it expects the service to be managed by systemd
instead of Samba. As such, we need to adjust the configuration to
handle both files and send SIGHUP directly to the process.
Promoting the new site I have been working on at *dustin.hatch.is* to my
main domain, *dustin.hatch.name*. The new site is just static content,
generated and uploaded by a Jenkins job.
Finally have a certificate for *dustin.hatch.name* now, too!
This resolves two issues with fetching the Proton VPNserver list:
1. If a connection error occurs when fetching the list, it will be
ignored, just as with HTTP errors
2. If any errors are encountered when fetching the list, and a valid
cache was loaded, its contents are returned, regardless of the
timestamp of the cache file.
To handle the RSVP form on *dustinandtabitha.com*, we are going to use
*formsubmit*. It runs on the same machine that hosts the website, so
there's no dealing with CORS. The */submit/rsvp* path, which is proxied
to the backend, is the RSVP form's target.
*formsubmit* is a simple, customizable HTML for submission handler. I
designed it for Tabitha to use to collect information from forms on her
websites. Notably, we will use it for the RSVP page on our wedding
invitation site.
The state history database is entirely too big. It takes over an hour
to create a backup of it, which usually causes BURP to time out. The
data it stores isn't particularly interesting anyway. Instead of trying
to back it up and ultimately not getting any backup at all, we'll just
skip it altogether to ensure we have a consistent backup of everything
else that is actually important.
The intermediate CA certificate was not included in the certificate file
used by Samba, so LDAP/TLS connections would fail with a trust
validation error.
Uploading large files can take a very long time. If the process takes
longer than the configured timeout in Apache, it will be aborted and the
client will receive an HTTP 504 Gateway Timeout error. Increasing the
timeout will help alleviate this for files up to a certain size.
Notably, it now lets me upload Signal backups without errors.
If Nextcloud does not have the Internet-facing reverse proxy listed in
its "trusted proxies" setting, it will mark all traffic as being from
the proxy itself. This breaks brute force detection, etc.
Nextcloud thinks it needs to run the upgrade/migration tool if the
version number in its configuration file does not match the running
version. It then updates the config file with the correct version. The
next time the configuration policy is applied, however, the version will
revert back to whatever is set in the template. This will re-trigger
the upgrade notification.
To avoid this problem, we now set the version in the configuration file
dynamically. Nextcloud writes its version number in a constant in
`version.php`.
Nextcloud uses double backslashes in its fully-qualified path names.
Although single backslashes work, the application will replace them,
leading to a constant conflict between itself and the Ansible template.
The first time launching a container after pulling a new image, it can
take several minutes for the container to actually start. Podman has to
set up the overlay filesystems, which is very slow on a Raspberry Pi.
With the default start timeout, systemd may end up killing the process
before the container is completely set up. Thus, we need to increase
the timeout to ensure there is plenty of time for Podman to work.
Processes running in containers only have access to a limited set of
devices, based on their SELinux type label. The USB serial devices
exposed by the Zwave and Zigbee adapters are not labelled correctly by
default to allow them to be used in containers.
Using `chcon` to change the type label of the device before starting the
container seems to work, but seems a bit kludgy. It would probably be
better to use a SELinux file context rule and/or a udev rule to ensure
the label is set correctly when the device node is created.