The _updatebot_ has been running with an old configuration for a while:
although it was correctly identifying updates to ZWaveJS UI and
Zigbee2MQTT, it was generating overrides for the wrong OCI image names.
Buildroot jobs really benefit from having a persistent workspace volume
instead of an ephemeral one. This way, only the packages, etc. that
have changed since the last build need to be built, instead of the whole
toolchain and operating system.
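The workspace itself is just a PersistentVolumeClaim mounted at the
job's build directory instead of an ephemeral volume. As a rough sketch
(the name, size, and everything else here are illustrative, not the
actual manifest):

```
# Illustrative only: a PVC the Buildroot build pod mounts as its
# workspace, so the toolchain and per-package output survive between runs
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildroot-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```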
As with AlertManager, the point of having multiple replicas of `vmagent`
is so that one is always running, even if the other fails. Thus, we
want to start the pods in parallel so that if the first one does not
come up, the second one at least has a chance.
If something prevents the first AlertManager instance from starting, we
don't want to wait forever for it before starting the second. That
pretty much defeats the purpose of having two instances. Fortunately,
we can configure Kubernetes to bring up both instances simultaneously by
setting the pod management policy to `Parallel`.
We also don't need a 4 GB volume for AlertManager; even 500 MB is
way too big for the tiny amount of data it stores, but that's about the
smallest size a filesystem can be.
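Assuming AlertManager runs as a plain StatefulSet (not via an operator
CRD), both changes look roughly like this; the image and mount path are
the stock defaults and may not match the real manifest exactly:

```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  serviceName: alertmanager
  replicas: 2
  # Start both replicas at once instead of waiting for pod 0 to be Ready
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: quay.io/prometheus/alertmanager
          volumeMounts:
            - name: data
              mountPath: /alertmanager
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 500Mi  # down from 4Gi; still plenty for its state
```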
The `cert-exporter` is no longer needed. All websites manage their own
certificates with _mod_md_ now, and all internal applications that use
the wildcard certificate fetch it directly from the Kubernetes Secret.
_bw0.pyrocufflink.blue_ was decommissioned some time ago, so it no
longer gets backed up. We want to keep its previous backups
around, though, in case we ever need to restore something. This
triggers the "no recent backups" alert, since the last snapshot is over
a week old. Let's ignore that hostname when generating this alert.
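Something like the following rule would do it; the metric and label
names here are made up for illustration, since the real rule uses
whatever the backup exporter actually exposes:

```
# Hypothetical rule: alert when a host's newest snapshot is older than a
# week, skipping the decommissioned bw0.pyrocufflink.blue
groups:
  - name: backups
    rules:
      - alert: NoRecentBackups
        expr: >
          time()
          - backup_last_snapshot_timestamp_seconds{hostname!="bw0.pyrocufflink.blue"}
          > 7 * 86400
        for: 1h
        annotations:
          summary: "No recent backups for {{ $labels.hostname }}"
```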
The `vmagent` needs a place to spool data it has not yet sent to
Victoria Metrics, but it doesn't really need to be persistent. As long
as all of the `vmagent` nodes _and_ all of the `vminsert` nodes do not
go down simultaneously, there shouldn't be any data loss. If they are
all down at the same time, there's probably something else going on and
lost metrics are the least concerning problem.
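In practice that just means pointing vmagent's
`-remoteWrite.tmpDataPath` at an `emptyDir` volume instead of a PVC; the
mount path here is illustrative:

```
# Relevant fragment of the vmagent pod spec (paths are illustrative)
containers:
  - name: vmagent
    args:
      - -remoteWrite.tmpDataPath=/spool
      # ...remote write URL and scrape config flags omitted...
    volumeMounts:
      - name: spool
        mountPath: /spool
volumes:
  - name: spool
    emptyDir: {}
```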
The _dynk8s-provisioner_ only needs writable storage to store copies of
the AWS SNS notifications it receives for debugging purposes. We don't
need to keep these around indefinitely, so using ephemeral node-local
storage is sufficient. I actually want to get rid of that "feature"
anyway...
Although Firefly III works on a Raspberry Pi, a few things are pretty
slow. Notably, the search feature takes a really long time to return
any results, which is particularly annoying when trying to add a receipt
via the Receipts app. Adding a node affinity rule to prefer running on
an x86_64 machine will ensure that it runs fast whenever possible,
while still allowing it to fall back to a Raspberry Pi if necessary.
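The rule is a soft (preferred) node affinity on the standard
architecture label, so the scheduler favors amd64 nodes without
requiring them; the weight is arbitrary:

```
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
```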
The "cron" container has not been working correctly for some time. No
background tasks are getting run, and this error is printed in the log
every minute:
> `Target class [db.schema] does not exist`
It turns out, this is because of the way the PHP `artisan` tool works.
It MUST be able to write to the code directory, apparently to build some
kind of cache. There may be a way to cache the data ahead of time, but
I haven't found it yet. For now, it seems the only way to make
Laravel-based applications run in a container is to make the container
filesystem mutable.
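Assuming the container was previously locked down with a read-only root
filesystem, the change amounts to relaxing that (or at least making the
code directory writable):

```
# Sketch only: let `artisan` write its cache under the code directory
containers:
  - name: cron
    securityContext:
      readOnlyRootFilesystem: false
```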
Music Assistant doesn't expose any metrics natively. Since we really
only care about whether or not it's accessible, scraping it with the
blackbox exporter is fine.
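The scrape follows the usual blackbox-exporter pattern, i.e. probe the
Music Assistant URL via the exporter; the target URL, port, and exporter
address below are assumptions:

```
- job_name: music-assistant
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - http://music-assistant.music-assistant.svc:8095
  relabel_configs:
    # Pass the listed target to the exporter as the probe URL...
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    # ...and send the scrape itself to the blackbox exporter
    - target_label: __address__
      replacement: blackbox-exporter:9115
```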
In order to allow access to Authelia from outside the LAN, it needs to
be able to handle the _pyrocufflink.net_ domain in addition to
_pyrocufflink.blue_. Originally, this was not possible, as Authelia
only supported a single cookie/domain. Now that it supports multiple
cookies, we can expose both domains.
The main reason for doing this now is to use Authelia's password reset
capability for Mom, since she didn't have a password for the Nextcloud
account she's just begun using.
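With multi-cookie support (Authelia 4.38+), the session configuration
looks roughly like this; the portal URLs are illustrative:

```
session:
  cookies:
    - domain: pyrocufflink.blue
      authelia_url: https://auth.pyrocufflink.blue
    - domain: pyrocufflink.net
      authelia_url: https://auth.pyrocufflink.net
```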
I wrote a Thunderbird add-on for my work computer that periodically
exports my entire DTEX calendar to a file. Unfortunately, the file it
creates is not directly usable by the kitchen screen server currently;
it seems to use a time zone identifier that `tzinfo` doesn't understand:
```
Error in background update:
Traceback (most recent call last):
File "/usr/local/kitchen/lib64/python3.12/site-packages/kitchen/service/agenda.py", line 19, in _background_update
await self._update()
File "/usr/local/kitchen/lib64/python3.12/site-packages/kitchen/service/agenda.py", line 34, in _update
calendar = await self.fetch_calendar(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/kitchen/lib64/python3.12/site-packages/kitchen/service/caldav.py", line 39, in fetch_calendar
return icalendar.Calendar.from_ical(r.text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/kitchen/lib64/python3.12/site-packages/icalendar/cal.py", line 369, in from_ical
_timezone_cache[component['TZID']] = component.to_tz()
^^^^^^^^^^^^^^^^^
File "/usr/local/kitchen/lib64/python3.12/site-packages/icalendar/cal.py", line 659, in to_tz
return cls()
^^^^^
File "/usr/local/kitchen/lib64/python3.12/site-packages/pytz/tzinfo.py", line 190, in __init__
self._transition_info[0])
~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
```
It seems to work fine in Nextcloud, though, so the work-around is to
import it as a subscription in Nextcloud and then read it from there,
using Nextcloud as a sort of proxy.
There is not (currently) an aarch64 build of the kitchen screen server,
so we need to force the pod to run on an x86_64 node. This seems like a good
candidate for running on a Raspberry Pi, so I should go ahead and build
a multi-arch image.
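Until then, forcing the pod onto x86_64 is just a node selector on the
architecture label:

```
nodeSelector:
  kubernetes.io/arch: amd64
```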
_democratic-csi_ can also dynamically resize Synology iSCSI LUNs when
PVC resource requests increase. This requires enabling the external
resizer in the controller pod and marking the StorageClass as supporting
resize.
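Marking the StorageClass is a single field; the class name and
provisioner below are illustrative and need to match however the driver
was actually deployed:

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: synology-iscsi
provisioner: org.democratic-csi.iscsi-synology
allowVolumeExpansion: true
```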
The _democratic-csi_ controller can create Synology LUN snapshots based
on VolumeSnapshot resources. This feature can be used, for example, to
create data snapshots before upgrades.
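Taking a snapshot is then just a matter of creating a VolumeSnapshot
that points at the PVC and a snapshot class backed by the driver; the
names here are illustrative:

```
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: firefly-data-pre-upgrade
spec:
  volumeSnapshotClassName: synology-iscsi
  source:
    persistentVolumeClaimName: firefly-data
```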
Deploying _democratic-csi_ to manage PersistentVolumeClaim resources,
mapping them to iSCSI volumes on the Synology.
Eventually, all Longhorn-managed PVCs will be replaced with Synology
iSCSI volumes. Getting rid of Longhorn should free up a lot of
resources and remove a point of failure from the cluster.
This hacky work-around is no longer necessary, as I've figured out why
the players don't (always) get rediscovered when the server restarts.
It turns out, Avahi on the firewall was caching responses to the mDNS PTR
requests Music Assistant makes. Rather than forward the requests to the
other VLANs, it would respond with its cached information, but in a way
that Music Assistant didn't understand. Setting `cache-entries-max` to
`0` in `avahi-daemon.conf` on the firewall resolved the issue.
This reverts commit 42a7964991.
I haven't fully determined why, but when the Music Assistant server
restarts, it marks the _shairport-sync_ players as offline and will not
allow playing to them. The only way I have found to work around this is
to restart the players after the server restarts. As that's pretty
cumbersome and annoying, I naturally want to automate it, so I've
created this rudimentary synchronization technique using _ntfy_: each
player listens for notifications on a specific topic, and upon receiving
one, tells _shairport-sync_ to exit. With the `Restart=` property
configured on the _shairport-sync.service_ unit, _systemd_ will restart
the service, which causes Music Assistant to discover the player again.
_Music Assistant_ is pretty straightforward to deploy, despite
upstream's apparent opinion otherwise. It just needs a small persistent
volume for its media index and customization. It does need to use the
host network namespace, though, in order to receive multicast
announcements from e.g. AirPlay players, as it doesn't have any way of
statically configuring them.
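Concretely, that's just host networking plus the matching DNS policy in
the pod spec:

```
spec:
  hostNetwork: true
  # keep cluster DNS resolution working while on the host network
  dnsPolicy: ClusterFirstWithHostNet
```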
Jenkins needs to be able to patch the Deployment to trigger a restart
after it builds a new container image for _dch-webhooks_.
Note that this manifest must be applied on its own **without
Kustomize**. Kustomize seems to think the `dch-webhooks` in
`resourceNames` refers to the ConfigMap it manages and "helpfully"
renames it with the name suffix hash. It's _not_ the ConfigMap, though,
and there's not really any way to tell Kustomize that.
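For reference, the Role is roughly this shape (the metadata names are
illustrative); the `resourceNames` entry is what Kustomize mangles:

```
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-restart-dch-webhooks
rules:
  - apiGroups: [apps]
    resources: [deployments]
    # Kustomize assumes this name refers to its managed ConfigMap and
    # appends the suffix hash to it, breaking the rule
    resourceNames: [dch-webhooks]
    verbs: [get, patch]
```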
Without a node affinity rule, Kubernetes applies equal weight to the
"big" x86_64 nodes and the "small" aarch64 ones. Since we would really
rather Piper and Whisper _not_ run on a Raspberry Pi, we need the rule
to express this.
As it turns out, although Home Assistant itself works perfectly fine on
a Raspberry Pi, Piper and Whisper do not. They are _much_ too slow to
respond to voice commands.
This reverts commit 32666aa628.