I did not realize the batteries on the garage door tilt sensors had
died. Adding alerts for various sensor batteries should help keep me
better informed.
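A rule along these lines should do it; the metric name and threshold
here are assumptions, not necessarily what the exporter actually
publishes:

```yaml
groups:
  - name: batteries
    rules:
      - alert: SensorBatteryLow
        # Hypothetical metric name; substitute whatever the exporter
        # actually publishes for battery level sensors.
        expr: homeassistant_sensor_battery_percent < 20
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: 'Low battery: {{ $labels.entity }}'
```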
Sometimes, I want to be able to look at active alerts without logging
in. This rule allows read-only access to the AlertManager UI and API.
Unfortunately, the user experience when attempting to create a new
Silence using the UI without first logging in is suboptimal, but I
think that's an acceptable trade-off.
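Assuming the rule is expressed as an Authelia access-control entry
(the hostname is a placeholder), it might look something like this:
anonymous `GET` requests are allowed through, while anything that
modifies state still requires a login.

```yaml
access_control:
  rules:
    # Read-only (GET) access to the Alertmanager UI/API without login.
    - domain: alertmanager.example.com
      policy: bypass
      methods:
        - GET
    # Everything else (e.g. POSTing a new Silence) requires login.
    - domain: alertmanager.example.com
      policy: one_factor
```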
The Longhorn volume for the *invoice-ninja* PVC got into a strange state
following an unexpected shutdown this morning. One of its replicas
seemed to have disappeared, and it also thought that the size had
changed. As such, it got stuck in "expanding" state, but it was not
actually being expanded. This issue is described in detail in the
Longhorn documentation: [Troubleshooting: Unexpected expansion leads to
degradation or attach failure][0]. Unfortunately, there is no way to
recover a volume from that state, and it must be deleted and recreated
from backup. This changes some of the properties of the PVC, so they
need to be updated in the manifest.
[0]: https://longhorn.io/kb/troubleshooting-unexpected-expansion-leads-to-degradation-or-attach-failure/
Jenkins jobs that build container images need access to `/dev/fuse`.
Thus, we have to allow Pods managed by the *fuse-device-plugin*
DaemonSet to be scheduled on nodes that are tainted for use exclusively
by Jenkins jobs.
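The toleration itself is a few lines on the DaemonSet's pod template;
the taint key below is an assumption:

```yaml
tolerations:
  # Allow scheduling onto nodes tainted for Jenkins-only use.
  - key: jenkins
    operator: Exists
    effect: NoSchedule
```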
Members of the *Server Admins* group need to be able to log in to
machines using their respective privileged accounts, e.g. for
provisioning or emergencies.
Graylog is down because Elasticsearch corrupted itself again, and this
time, I'm just not going to bother fixing it. I practically never use
it anymore anyway, and I want to migrate to Grafana Loki, so now seems
like a good time to just get rid of it.
The configuration file for the kitchen HUD server has credentials
embedded in it. Until I get around to refactoring it to read these from
separate locations, we'll make use of the template feature of
SealedSecrets. With this feature, fields can refer to the (decrypted)
value of other fields using Go template syntax. This makes it possible
to have most of the `config.yaml` document unencrypted and easily
modifiable, while still protecting the secrets.
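A sketch of what that looks like, with hypothetical field names and a
placeholder ciphertext:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: kitchen-hud-config   # name is an assumption
spec:
  encryptedData:
    mqtt-password: AgBy...   # placeholder ciphertext
  template:
    data:
      # Everything except the password stays readable in Git; the
      # template is rendered with the decrypted items substituted in.
      config.yaml: |
        mqtt:
          host: mqtt.example.com
          password: {{ index . "mqtt-password" }}
```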
Now that Victoria Metrics is hosted in Kubernetes, it only makes sense
to host Grafana there as well. I chose to use a single-instance
deployment for simplicity; I don't really need high availability for
Grafana. Its configuration does not change enough to worry about the
downtime associated with restarting it. Migrating the existing data
from SQLite to PostgreSQL, while possible, is just not worth the hassle.
Invoice Ninja is a small business management tool. Tabitha wants to
use it for HLC.
I am a bit concerned about the code quality of this application, and
definitely alarmed by the data it sends upstream, so I have tried to be
extra careful with it. All privileges are revoked, including access to
the Internet.
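Cutting off Internet access comes down to a NetworkPolicy roughly like
this one; the namespace and cluster CIDR are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: invoice-ninja-no-internet
  namespace: invoice-ninja   # namespace is an assumption
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Permit egress to in-cluster (private) destinations only;
    # everything else, including the Internet, is denied.
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
```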
The `update-machine-ids.sh` shell script helps update the `sshca-data`
SealedSecret with the current contents of the `machine-ids.json` file
(stored locally, not tracked in Git).
*vmalert* has been generating alerts and triggering notifications, but
not writing any `ALERTS`/`ALERTS_FOR_STATE` metrics. It turns out this
is because I had not correctly configured the remote read/write
URLs.
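For the record, the flags in question look roughly like this; the
service names and tenant paths are assumptions based on a typical
VictoriaMetrics cluster layout:

```yaml
args:
  - -datasource.url=http://vmselect:8481/select/0/prometheus
  # Without these two, vmalert evaluates and notifies but never
  # persists ALERTS/ALERTS_FOR_STATE or restores state on restart.
  - -remoteWrite.url=http://vminsert:8480/insert/0/prometheus
  - -remoteRead.url=http://vmselect:8481/select/0/prometheus
```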
If Frigate is running but not connected to the MQTT broker, the
`sensor.frigate_status` entity will be available, but the
`update.frigate_server` entity will not.
Since (almost) all managed hosts have SSH certificates signed by SSHCA
now, the need to maintain a pseudo-dynamic SSH key list is winding down.
If we include the SSH CA key in the global known hosts file, and
explicitly list the couple of hosts that do not have a certificate, we
can let Ansible use that instead of fetching the host keys on each run.
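The global known hosts file then boils down to a couple of lines like
these (the keys and hostnames are placeholders):

```
# Trust any host certificate signed by the SSH CA:
@cert-authority * ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample sshca
# Static entries for the few hosts without certificates:
legacy1.example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample
```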
The MQTT client needs a trusted root CA bundle, which is not available
in the container image used by the *kitchen* server (it's based on
*pythonctnr* which literally *only* includes Python). Fortunately, as
it uses OpenSSL under the hood, we can configure it to use the bundle
included with the *certifi* Python package via an environment variable.
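Concretely, that amounts to something like this in the pod spec; the
certifi path varies with the Python version, so treat it as a
placeholder:

```yaml
env:
  # OpenSSL honors SSL_CERT_FILE; point it at certifi's CA bundle.
  - name: SSL_CERT_FILE
    value: /usr/lib/python3.11/site-packages/certifi/cacert.pem
```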
Kubernetes will not start additional Pods in a StatefulSet until the
existing ones are Ready. This means that if there is a problem bringing
up `vmstorage-0`, for example, it will never start `vmstorage-1` or
`vmstorage-2`. Since this pretty much defeats the purpose of having a
multi-node `vmstorage` cluster, we have to remove the readiness probe,
so the Pods will be Ready as soon as they start. If there is a problem
with one of them, it will matter less, as the others can still run.
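If the manifests go through Kustomize, one way to strip the probe is a
patch like this (the resource name and container index are
assumptions):

```yaml
# kustomization.yaml fragment
patches:
  - target:
      kind: StatefulSet
      name: vmstorage
    patch: |-
      - op: remove
        path: /spec/template/spec/containers/0/readinessProbe
```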
The *virt* plugin for *collectd* sets `instance` to the name of the
libvirt domain the metric refers to. This makes it so there is no label
identifying which host the VM is running on. Thus, if we want to
classify metrics by VM host, we need to add that label explicitly.
Since the `__address__` label is not available during metric relabeling,
we need to store it in a temporary label, which gets dropped at the end
of the relabeling phase. We copy the value of that label into a new
label, but only for metrics that match the desired metric name.
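A sketch of the two-phase relabeling; the job, label, and metric names
are assumptions:

```yaml
scrape_configs:
  - job_name: collectd
    relabel_configs:
      # __address__ exists only during target relabeling, so stash it
      # in a temporary (non-__) label that survives onto the metrics.
      - source_labels: [__address__]
        target_label: tmp_vm_host
    metric_relabel_configs:
      # Copy the stashed value into the real label, but only for
      # metrics whose name matches.
      - source_labels: [__name__, tmp_vm_host]
        regex: virt_.*;(.+)
        target_label: vm_host
      # Drop the temporary label at the end of the relabeling phase.
      - regex: tmp_vm_host
        action: labeldrop
```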
When Home Assistant starts, if PostgreSQL is unavailable, it will come
up successfully, but without the history component. It never retries
the connection to enable the component, which makes the problem
difficult to detect and the missing functionality easy to overlook.
To avoid having extended periods of missing state history, we'll force
Home Assistant to wait for PostgreSQL to come up before starting.
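A simple way to do that is an init container that blocks until the
database answers; the service name and image tag below are assumptions:

```yaml
initContainers:
  - name: wait-for-postgresql
    image: postgres:15-alpine
    command:
      - sh
      - -c
      # Block startup until PostgreSQL accepts connections.
      - until pg_isready -h postgresql -p 5432; do sleep 2; done
```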
`keyserv` is a little utility I wrote to dispense *age* keys to clients.
It uses SSH certificates for authentication. If the client presents an
SSH certificate signed by a trusted key, the server will return all the
keys the principal(s) listed in the certificate are allowed to use. The
response is encrypted with the public key from the certificate, so the
client must have access to the corresponding private key in order to
read the response.
I am currently using this server to provide keys for the new
configuration policy. The keys herein are used to encrypt NUT monitor
passwords.
I found the official container image for Prometheus Blackbox exporter.
It is hosted on Quay, which is why I didn't see it on Docker Hub when I
looked initially.
Scraping the public DNS servers doesn't work anymore since the firewall
routes traffic through Mullvad. Pinging public cloud providers should
give a pretty decent indication of Internet connectivity. It will also
serve as a benchmark for the local DNS performance, since the names will
have to be resolved.
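Roughly what that looks like, split across the Blackbox exporter module
config and the scrape config; the target hostnames and exporter address
are placeholders:

```yaml
# blackbox.yml: define an ICMP probe module.
modules:
  icmp:
    prober: icmp
    timeout: 5s
---
# Scrape config: probe each target through the exporter. The target
# names must be resolved, so this doubles as a local DNS check.
scrape_configs:
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - aws.amazon.com
          - azure.microsoft.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
```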
The Cluster Autoscaler version is supposed to match the Kubernetes
version. This update also specifically addresses ASG tags for node
resources ([issue 5164]).
[issue 5164]: https://github.com/kubernetes/autoscaler/issues/5164
The *cert-exporter* script really only needs the SSH host key for Gitea,
so the dynamic host key fetch is overkill. Since it frequently breaks
for various reasons, it's probably better to just have a static list of
trusted keys.
By default, the `instance` label for discovered metrics targets is set
to the scrape address. For Kubernetes pods, that is the IP address and
port of the pod, which naturally changes every time the pod is recreated
or moved. This will cause a high churn rate for Longhorn manager pods.
To avoid this, we set the `instance` label to the name of the node the
pod is running on, which will not change because the Longhorn manager
pods are managed by a DaemonSet.
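A minimal sketch of that relabeling, assuming Kubernetes service
discovery with the `pod` role:

```yaml
relabel_configs:
  # longhorn-manager runs as a DaemonSet, so the node name is a
  # stable identity for the pod's metrics across restarts.
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: instance
```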
After considering the implications of Authelia's pre-configured consent
feature, I decided I did not like the fact that a malicious program
could potentially take over my entire Kubernetes cluster without my
knowledge, since `kubectl` may not require any interaction and could
therefore run unnoticed. I stopped ticking the "Remember Consent"
checkbox out of paranoia, but that's gotten kind of annoying. I figure
a good compromise is to only prompt for consent a couple of times per
day.
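In the OIDC client configuration, that compromise is a couple of lines;
the client ID is a placeholder:

```yaml
identity_providers:
  oidc:
    clients:
      - id: kubernetes   # client ID is an assumption
        consent_mode: pre-configured
        # Re-prompt for consent roughly twice a day.
        pre_configured_duration: 12h
```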