The *dch-webhooks* user is used by the *dch-webhooks* server to
publish host information when a new machine triggers its
_POST /host/online_ webhook. It therefore needs to be able to write to
the _host-provisioner_ queue (via the default exchange).
The *host-provisioner* user is used by the corresponding consumer to
receive the host information and initiate the provisioning process.
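For reference, here is a sketch of what these grants might look like
using the RabbitMQ messaging topology operator; the operator itself,
the vhost, and the cluster name are assumptions, not necessarily what
is deployed:

```yaml
# Hypothetical Permission resources (RabbitMQ messaging topology
# operator). The vhost and cluster name are assumptions.
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: dch-webhooks
spec:
  vhost: /
  user: dch-webhooks
  permissions:
    configure: ''
    # Publishing via the default exchange is checked against the
    # "amq.default" resource name.
    write: '^amq\.default$'
    read: ''
  rabbitmqClusterReference:
    name: rabbitmq
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: host-provisioner
spec:
  vhost: /
  user: host-provisioner
  permissions:
    # The consumer declares its queue and reads messages from it.
    configure: '^host-provisioner$'
    write: ''
    read: '^host-provisioner$'
  rabbitmqClusterReference:
    name: rabbitmq
```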
The *dch-webhooks* server now has a _POST /host/online_ hook that can
be triggered by a new machine when it first comes online. This hook
starts an automatic provisioning process by creating a Kubernetes Job
to run Ansible and publishing information about the host to be
provisioned via AMQP. The server therefore needs access to the
Kubernetes API to create the Job and access to RabbitMQ to publish the
task parameters.
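A minimal sketch of the Kubernetes side of that access, assuming the
server runs under its own ServiceAccount (the names and namespace here
are hypothetical):

```yaml
# Hypothetical Role/RoleBinding letting dch-webhooks create Jobs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dch-webhooks
  namespace: provisioning
rules:
  # create is needed to launch the Ansible Job; get/list let the
  # server check on Jobs it has created.
  - apiGroups: [batch]
    resources: [jobs]
    verbs: [create, get, list]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dch-webhooks
  namespace: provisioning
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dch-webhooks
subjects:
  - kind: ServiceAccount
    name: dch-webhooks
    namespace: provisioning
```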
The contents of the DCH Root CA will not change, so it does not make
sense to enable the hash suffix feature for this ConfigMap. Without
the suffix, the ConfigMap name is predictable and can be used outside
of a Kustomize project.
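Concretely, that means generating the ConfigMap with something like
this (the generator name and source file are assumptions):

```yaml
# Hypothetical kustomization.yaml excerpt
configMapGenerator:
  - name: dch-root-ca
    files:
      - ca.crt
    options:
      # Keep the name stable so the ConfigMap can be referenced from
      # outside this Kustomize project.
      disableNameSuffixHash: true
```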
The `pg_stat_archiver_failed_count` metric is a counter, so once a WAL
archive operation has failed, it will increase and never return to
`0`. To ensure the alert resolves once the WAL archival process
recovers, we use the `increase` function to turn the counter into a
gauge-like value. Finally, we aggregate that value with
`max_over_time` to keep the alert from flapping if WAL archiving
occurs less frequently than the scrape interval.
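A sketch of the resulting rule; the alert name, window sizes, and
`for` duration are illustrative rather than the exact values in use:

```yaml
# Hypothetical Prometheus rule group
groups:
  - name: postgresql
    rules:
      - alert: PostgreSQLWALArchivingFailed
        # increase() turns the ever-growing counter into a per-window
        # delta that drops back to 0 after recovery; max_over_time
        # holds the alert through gaps between archive attempts.
        expr: max_over_time(increase(pg_stat_archiver_failed_count[10m])[1h:]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: PostgreSQL WAL archiving is failing
```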
We're using the Alpine variant of the Vaultwarden container images,
since the default (Debian-based) variant is significantly larger and
we do not need any of the extra packages it includes.
[ARA Records Ansible][0] is a results storage system for Ansible. It
provides a convenient UI for tracking Ansible playbooks and tasks. The
data are populated by an Ansible callback plugin.
ARA is a fairly simple Python+Django application. It needs a database
to store Ansible results, so we've connected it to the main PostgreSQL
database and configured it to connect and authenticate using mTLS.
Rather than mess with managing and distributing a static password for
ARA clients, I've configured Authelia to allow anonymous access to
post data to the ARA API from within the private network or the
Kubernetes cluster. Access to the web UI does require authentication.
[0]: https://ara.recordsansible.org/
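For illustration, the callback plugin could be enabled in the Ansible
Job's container with environment variables along these lines (the
plugin path and server URL are assumptions; running
`python3 -m ara.setup.callback_plugins` prints the real path):

```yaml
# Hypothetical container environment for the Ansible Job
env:
  - name: ANSIBLE_CALLBACK_PLUGINS
    value: /usr/lib/python3.11/site-packages/ara/plugins/callback
  - name: ARA_API_CLIENT
    value: http
  - name: ARA_API_SERVER
    value: https://ara.example.org
```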
At some point this week, the front porch camera stopped sending video.
I'm not sure exactly what happened to it, but Frigate kept logging
"Unable to read frames from ffmpeg process." I power-cycled the camera,
which resolved the issue.
Unfortunately, no alerts were generated about this situation. Home
Assistant did not consider the camera entity unavailable, presumably
because Frigate was still reporting stats about it. Thus, I missed
several important notifications. To avoid this in the future, I have
enabled the "Camera FPS" sensors for all of the cameras in Home
Assistant, and added this alert to trigger when the reported framerate
is 0.
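As a sketch, the alert could be a Home Assistant automation like this
one, with hypothetical entity and notifier names:

```yaml
# Hypothetical Home Assistant automation
automation:
  - alias: Camera stopped sending video
    trigger:
      - platform: numeric_state
        entity_id: sensor.front_porch_camera_fps
        below: 1
        # Avoid firing on a brief dip in framerate
        for: "00:05:00"
    action:
      - service: notify.notify
        data:
          message: >-
            {{ trigger.to_state.name }} is reporting 0 FPS; the camera
            may need to be power-cycled.
```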
I really also need to get alerts for log events configured, as those
would also have indicated there was an issue.