The [postgres exporter][0] exposes metrics about the operation and
performance of a PostgreSQL server. It's currently deployed on
_db0.pyrocufflink.blue_, the primary server of the main PostgreSQL
cluster.
[0]: https://github.com/prometheus-community/postgres_exporter
Home Assistant uses PostgreSQL for recording the history of entity
states. Since we had been using the in-cluster database server for
this, the data were migrated to the new external PostgreSQL server
automatically when the backup from the former was restored on the
latter. It follows, then, that we can point Home Assistant to the
new server as well.
Home Assistant uses SQLAlchemy, which in turn uses _libpq_ via
_psycopg_, as a client for PostgreSQL. It doesn't expose any
configuration parameters beyond the "database URL" directly, but we
can use the standard environment variables to specify the certificate
and private key for authentication. In fact, the empty `postgresql://`
URL is sufficient, and indicates that _all_ of the connection parameters
should be taken from environment variables. As a result, the
`wait-for-db` init container and the main container take exactly the
same environment variables, so we can use YAML anchors to share their
definitions.
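A minimal sketch of what that might look like in the pod spec; the
image, database name, and mount paths are illustrative placeholders,
while the `PG*` variables are the standard _libpq_ ones:

```yaml
initContainers:
  - name: wait-for-db
    image: registry.example.org/wait-for-db:latest   # placeholder image
    env: &pg-env
      - name: PGHOST
        value: db0.pyrocufflink.blue
      - name: PGDATABASE
        value: homeassistant          # illustrative database name
      - name: PGSSLMODE
        value: verify-full
      - name: PGSSLCERT
        value: /certs/tls.crt
      - name: PGSSLKEY
        value: /certs/tls.key
      - name: PGSSLROOTCERT
        value: /certs/ca.crt
containers:
  - name: home-assistant
    image: ghcr.io/home-assistant/home-assistant:stable
    # the main container reuses exactly the same variables via the anchor
    env: *pg-env
```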
Since the new database server outside the Kubernetes cluster, created
for Authelia, was seeded from a backup of the in-cluster server, it
already contained the data from Firefly-III as well. Thus, we can
switch Firefly-III to using it, too.
The documentation for Firefly-III does not mention anything about how
to configure it to use certificate-based authentication for PostgreSQL,
as is required by the new server. Fortunately, it ultimately uses
_libpq_, so the standard `PG...` environment variables work fine. We
just need a certificate issued by the _postgresql-ca_ ClusterIssuer and
the _DCH Root CA_ certificate mounted in the Firefly-III container.
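Roughly, the relevant pieces of the container spec might look like
this; the secret/ConfigMap names and mount paths are assumptions, not
the actual manifest:

```yaml
containers:
  - name: firefly-iii
    env:
      - name: PGSSLMODE
        value: verify-full
      - name: PGSSLCERT
        value: /var/run/postgresql-tls/tls.crt
      - name: PGSSLKEY
        value: /var/run/postgresql-tls/tls.key
      - name: PGSSLROOTCERT
        value: /etc/pki/dch-root-ca/ca.crt
    volumeMounts:
      - name: postgresql-tls
        mountPath: /var/run/postgresql-tls
        readOnly: true
      - name: dch-root-ca
        mountPath: /etc/pki/dch-root-ca
        readOnly: true
volumes:
  - name: postgresql-tls
    secret:
      # issued by the postgresql-ca ClusterIssuer; note that libpq refuses
      # keys that are group- or world-readable, so the file mode may need
      # to be adjusted
      secretName: firefly-iii-postgresql-tls
  - name: dch-root-ca
    configMap:
      name: dch-root-ca
```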
If there is an issue with the in-cluster database server, accessing the
Kubernetes API becomes impossible by normal means. This is because the
Kubernetes API uses Authelia for authentication and authorization, and
Authelia relies on the in-cluster database server. To solve this
chicken-and-egg scenario, I've set up a dedicated PostgreSQL database
server on a virtual machine, totally external to the Kubernetes cluster.
With this commit, I have changed the Authelia configuration to point at
this new database server. The contents of the new database server were
restored from a backup of the in-cluster server, so all of Authelia's
state was migrated automatically. Thus, updating the configuration is
all that is necessary to switch to using it.
The new server uses certificate-based authentication. In order for
Authelia to access it, it needs a certificate issued by the
_postgresql-ca_ ClusterIssuer, managed by _cert-manager_. Although the
environment variables for pointing to the certificate and private key
are not listed explicitly in the Authelia documentation, their names
can be inferred from the configuration document schema and work as
expected.
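For illustration, the client certificate might be requested with a
cert-manager Certificate like the one below; the resource and secret
names are placeholders, and the common name assumes the server maps the
certificate CN to the database role:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: authelia-postgresql
  namespace: authelia
spec:
  secretName: authelia-postgresql-tls
  commonName: authelia       # assumed to match the PostgreSQL role name
  usages:
    - client auth
  issuerRef:
    name: postgresql-ca
    kind: ClusterIssuer
```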
All the Kubernetes nodes (except *k8s-ctrl0*) are now running Fedora
CoreOS. We can therefore use the Kubernetes API to discover scrape
targets for the Zincati job.
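A sketch of the scrape job using Kubernetes node discovery; the port is
only a placeholder for wherever the Zincati metrics are actually
exposed on each host:

```yaml
scrape_configs:
  - job_name: zincati
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # point the target at the node's internal IP; 9101 is a placeholder
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: "(.+)"
        target_label: __address__
        replacement: "${1}:9101"
      - source_labels: [__meta_kubernetes_node_name]
        target_label: instance
```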
I've created a _Pool Time_ calendar in Nextcloud that we can use to
mark when people are expected to be in the pool. Using this, we can
configure the "someone is in the pool" alert not to fire during times
when we know people will be in the pool. This will make it much less
annoying on HLC pool days.
One of the reasons for moving to 4 `vmstorage` replicas was to ensure
that the load was spread evenly between the physical VM host machines.
To ensure that is the case as much as possible, we need to keep one
pod per Kubernetes node.
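One way to express that constraint, sketched with an illustrative label
selector that would need to match the actual `vmstorage` pod labels:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: vmstorage            # illustrative label
        topologyKey: kubernetes.io/hostname
```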
Longhorn does not work well for very large volumes. It takes ages to
synchronize/rebuild them when migrating between nodes, which happens
all too frequently. This consumes a lot of resources, which impacts
the operation of the rest of the cluster, and can cause a cascading
failure in some circumstances.
Now that the cluster is set up to be able to mount storage directly from
the Synology, it makes sense to move the Victoria Metrics data there as
well. Similar to how I did this with Jenkins, I created
PersistentVolume resources that map to iSCSI volumes, and patched the
PersistentVolumeClaims (or rather the template for them defined by the
StatefulSet) to use these. Each `vmstorage` pod then gets an iSCSI
LUN, bypassing both Longhorn and QEMU to write directly to the NAS.
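An illustrative PersistentVolume for one `vmstorage` replica; the
portal address, IQN, LUN, and size are all placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vmstorage-db-0
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  iscsi:
    targetPortal: nas.example.net:3260   # placeholder for the NAS address
    iqn: iqn.2000-01.com.synology:nas.vmstorage-0
    lun: 1
    fsType: ext4
```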
The migration process was relatively straightforward. I started by
scaling down the `vminsert` Deployment so the `vmagent` pods would
queue the metrics they had collected while the storage layer was down.
Next, I created a [native][0] export of all the time series in the
database. Then, I deleted the `vmstorage` StatefulSet and its
associated PVCs. Finally, I applied the updated configuration,
including the new PVs and patched PVCs, and brought the `vminsert`
pods back online. Once everything was up and running, I re-imported
the exported data.
[0]: https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-export-data-in-native-format
Since all the nodes in the cluster run Fedora CoreOS now, we can
deploy collectd as a container, managed by a DaemonSet.
Note that while _collectd_ has to run as _root_ in order to collect
a lot of metrics, it should not run with all privileges. It does need
to run as a "super-privileged container" (`spc_t` SELinux domain), but
it does _not_ need most kernel capabilities.
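In the DaemonSet's container spec, that might translate to something
like the following; the capability added back is illustrative, since
only whatever collectd's plugins actually require should be granted:

```yaml
securityContext:
  runAsUser: 0
  seLinuxOptions:
    type: spc_t          # "super-privileged container" domain
  capabilities:
    drop:
      - ALL
    add:
      - SYS_PTRACE       # illustrative; add back only what is needed
```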
By default, Kubernetes waits for each pod in a StatefulSet to become
"ready" before starting the next one. If there is a problem starting
that pod, e.g. data corruption, then the others will never start. This
sort of defeats the purpose of having multiple replicas. Fortunately,
we can configure the pod management policy to start all the pods at
once, regardless of the status of any individual pod. This way, if
there is a problem with the first pod, the others will still come up
and serve whatever data they have.
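The relevant StatefulSet setting is a one-liner:

```yaml
apiVersion: apps/v1
kind: StatefulSet
spec:
  # the default, OrderedReady, waits for each pod to become ready
  # before creating the next one
  podManagementPolicy: Parallel
```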
The [restic-exporter][0] exposes metrics about Restic snapshots as
Prometheus metrics. This allows us to get similar data as we have for
BURP backups. Chiefly important among the metrics are last backup time
and size, which we can use to determine if backups are working
correctly.
[0]: https://github.com/ngosang/restic-exporter
The digital photo frame in the kitchen is powered by a small server
application that exposes a minimal HTTP API. Using this API, we can
e.g. advance
or backtrack the displayed photo. Exposing `rest_command` services
for these operations allows us to add buttons to dashboards to control
the frame.
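A hypothetical Home Assistant configuration snippet; the host name and
endpoint paths are placeholders for whatever the photo frame server
actually exposes:

```yaml
rest_command:
  photoframe_next:
    url: "http://photoframe.example.net:8080/api/next"
    method: post
  photoframe_back:
    url: "http://photoframe.example.net:8080/api/back"
    method: post
```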
We don't need to specify every single host individually.
Domain controllers, for example, are registered in DNS with SRV records.
Kubernetes nodes, of course, can be discovered using the Kubernetes API.
Both of these classes of nodes change frequently, so discovering them
dynamically is convenient.
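For the domain controllers, a DNS service discovery stanza along these
lines would do; the job name and the SRV record name (assuming the AD
domain is *pyrocufflink.blue*) are illustrative:

```yaml
scrape_configs:
  - job_name: domain-controllers
    dns_sd_configs:
      - names:
          - _ldap._tcp.pyrocufflink.blue   # assumed AD domain SRV record
        type: SRV
```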
Instead of routing iSCSI traffic from the Kubernetes network, through
the firewall, to the storage network, nodes now have a second network
adapter connected directly to the storage network. The nodes with
such an adapter are labelled `network.du5t1n.me/storage`, so we can pin
the Jenkins PersistentVolume to them via a node affinity rule.
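A sketch of the node affinity rule on the PersistentVolume; `Exists` is
used here since the label value (if any) isn't assumed:

```yaml
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: network.du5t1n.me/storage
              operator: Exists
```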
Using a volume claim template to define the persistent volume claim for
the Redis pod has two advantages: first, it enables using clustered
Redis, should that become necessary, and second, it makes
deleting and recreating the volume easier in the case of data
corruption. Simply scale down the StatefulSet to 0, delete the PVC, and
scale the StatefulSet back up.
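A minimal sketch of the claim template; the volume name, size, and
storage class are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  # serviceName, selector, and pod template omitted for brevity
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```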
By default, step-ca issues certificates that are valid for only one day.
This means that clients need to have multiple renew attempts scheduled
throughout the day; otherwise, missing one could mean having their
certificates expire. This is unnecessary, and not even possible in all
cases, so let's make the default validity period longer and avoid the
issue.
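In step-ca's `ca.json`, the certificate lifetimes are controlled by the
authority claims; something like the following, where the durations
shown are only an example and not necessarily the values chosen:

```json
{
  "authority": {
    "claims": {
      "minTLSCertDuration": "5m",
      "defaultTLSCertDuration": "720h",
      "maxTLSCertDuration": "720h"
    }
  }
}
```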
Since I added an IPv6 ULA prefix to the "main" VLAN (to allow
communicating with the Synology directly), the domain controllers now
have AAAA records. This causes the `sambadc` scrape job to fail because
Blackbox Exporter prefers IPv6 by default, but Kubernetes pods do not
have IPv6 addresses.
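The fix is to make the relevant Blackbox Exporter module prefer IPv4; a
sketch, assuming an ICMP probe module (the actual module name and
prober may differ):

```yaml
modules:
  icmp:                      # illustrative module name
    prober: icmp
    icmp:
      preferred_ip_protocol: ip4
      ip_protocol_fallback: true
```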
Managing the Jenkins volume with Longhorn has become increasingly
problematic. Because of its large size, whenever Longhorn needs to
rebuild/replicate it (which happens often for no apparent reason), it
can take several hours. While the synchronization is happening, the
entire cluster suffers from degraded performance.
Instead of using Longhorn, I've decided to try storing the data directly
on the Synology NAS and expose it to Kubernetes via iSCSI. The Synology
offers many of the same features as Longhorn, including
snapshots/rollbacks and backups. Using the NAS allows the volume to be
available to any Kubernetes node, without keeping multiple copies of
the data.
In order to expose the iSCSI service on the NAS to the Kubernetes nodes,
I had to make the storage VLAN routable. I kept it as IPv6-only,
though, as an extra precaution against unauthorized access. The
firewall only allows nodes on the Kubernetes network to access the NAS
via iSCSI.
I originally tried proxying the iSCSI connection via the VM hosts;
however, this failed because of how iSCSI target discovery works. The
provided "target host" is really only used to identify available LUNs;
follow-up communication is done with the IP address returned by the
discovery process. Since the NAS would return its IP address, which
differed from the proxy address, the connection would fail. Thus, I
resorted to reconfiguring the storage network and connecting directly
to the NAS.
To migrate the contents of the volume, I temporarily created a PVC with
a different name and bound it to the iSCSI PersistentVolume. Using a
pod with both the original PVC and the new PVC mounted, I used `rsync`
to copy the data. Once the copy completed, I deleted the Pod and both
PVCs, then created a new PVC with the original name (i.e. `jenkins`),
bound to the iSCSI PV. While doing this, Longhorn, for some reason,
kept re-creating the PVC whenever I deleted it, no matter how I
requested the deletion: whether I deleted the PV, the PVC, or the
Longhorn Volume, using either the Kubernetes API or the Longhorn UI,
the PVC would reappear almost immediately. Fortunately, there was just
enough of a delay between deleting it and Longhorn recreating it that I
was able to create the new PVC manually. Once I did that, Longhorn gave
up.
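The one-off copy pod from the first step might have looked roughly like
this; the image, the temporary PVC name (`jenkins-iscsi`), and the
rsync options are approximations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-copy
spec:
  restartPolicy: Never
  containers:
    - name: rsync
      image: registry.fedoraproject.org/fedora:latest
      command:
        - bash
        - -c
        # install rsync in the throwaway container, then copy preserving
        # hard links, ACLs, and extended attributes
        - dnf -y install rsync && rsync -aHAX --info=progress2 /old/ /new/
      volumeMounts:
        - name: old
          mountPath: /old
        - name: new
          mountPath: /new
  volumes:
    - name: old
      persistentVolumeClaim:
        claimName: jenkins            # the original Longhorn-backed PVC
    - name: new
      persistentVolumeClaim:
        claimName: jenkins-iscsi      # temporary PVC bound to the iSCSI PV
```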
Kitchen v0.5 brings a few changes that affect the deployment:
* The Bored Board is now backed by MQTT
* The pool temperature is now displayed in the weather pane
* The container image is now based on Fedora and includes its own time
zone database and root CA bundle
* The websocket server prevents the process from stopping correctly
unless the graceful shutdown feature of `uvicorn` is disabled
[fleetlock] is an implementation of the Zincati FleetLock reboot
coordination protocol. It only works for machines that are Kubernetes
nodes, but it does enable safe rolling updates for those machines.
Specifically, when a node acquires a lock (backed by a Kubernetes
Lease), fleetlock cordons that node and evicts pods from it. After the
node has rebooted into the new version of Fedora CoreOS, fleetlock
uncordons the node and releases the lock.
[fleetlock]: https://github.com/poseidon/fleetlock
Vaultwarden has started prompting for the master password occasionally
when syncing the vault. Thus, we need to make sure it is available in
the _sync_ container, by mounting the secret and providing the
`PINENTRY_PASSWORD_FILE` environment variable.
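A sketch of the relevant part of the _sync_ container spec; the secret
name and mount path are illustrative:

```yaml
containers:
  - name: sync
    env:
      - name: PINENTRY_PASSWORD_FILE
        value: /run/secrets/vaultwarden/master-password
    volumeMounts:
      - name: master-password
        mountPath: /run/secrets/vaultwarden
        readOnly: true
volumes:
  - name: master-password
    secret:
      secretName: vaultwarden-master-password   # illustrative secret name
```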
Just having the alert name and group name in the ntfy notification is
not enough to really indicate what the problem is, as some alerts can
generate notifications for many reasons. In the email notifications
Alertmanager sends by default, the values (but not the keys) of all
labels are included in the subject, so we will reproduce that here.
I don't like having alerts sent by e-mail. Since I don't get e-mail
notifications on my watch, I often do not see alerts for quite some
time. They are also much harder to read in an e-mail client (the
Fastmail web client and K-9 Mail both display them poorly). I would
much rather have
them delivered via _ntfy_, just like all the rest of the ephemeral
notifications I receive.
Fortunately, it is easy enough to integrate Alertmanager and _ntfy_
using the webhook notifier in Alertmanager. Since _ntfy_ does not
natively support the Alertmanager webhook API, though, a bridge is
necessary to translate from one data format to the other. There are a
few options for this bridge, but I chose
[alexbakker/alertmanager-ntfy][0] because it looked the most complete
while also having the simplest configuration format. Sadly, it does not
expose any Prometheus metrics itself, and since it's deployed in the
_victoria-metrics_ namespace, it needs to be explicitly excluded from
the VMAgent scrape configuration.
[0]: https://github.com/alexbakker/alertmanager-ntfy
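On the Alertmanager side, the hookup is a plain webhook receiver; the
Service name, port, and path below are placeholders for wherever the
bridge actually listens:

```yaml
route:
  receiver: ntfy
receivers:
  - name: ntfy
    webhook_configs:
      - url: http://alertmanager-ntfy.victoria-metrics.svc:8000/alert
        send_resolved: true
```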
Although most libraries support Ed25519 signatures for X.509
certificates, Firefox does not. This means that any certificate signed
by DCH CA R3 cannot be verified by the browser and thus will always
present a certificate error.
I want to migrate internal services that do not need certificates
that are trusted by default (i.e. they are only accessed programmatically
or only I use them in the browser) back to using an internal CA instead
of the public *pyrocufflink.net* wildcard certificate. For applications
like Frigate and UniFi Network, the certificates still need to be signed
by a CA that the browser will trust, so an Ed25519 certificate is
inappropriate. Thus, I've decided to migrate back to DCH CA R2, which
does not use an Ed25519 signature and can therefore be trusted by
Firefox, etc.
The *hlcforms* application handles form submissions for the Hatch
Learning Center website. It has various features for Tabitha that are
only accessible internally, but the form submission handler itself of
course needs to be accessible anonymously.