RustDesk is a remote assistance software solution. The open source
edition is sufficient for what I want to do with it, namely: help Mom
and Dad troubleshoot issues on their PCs. Mom is currently having
trouble with the Nextcloud sync client, so I need to be able to help her
with that.
Sometimes, Grafana gets pretty slow, especially when it's running on one
of the Raspberry Pi nodes. When this happens, the health check may take
longer than the default timeout of 1 second to respond. This then marks
the pod as unhealthy, even though it's still working.
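The fix is to relax the liveness probe. A sketch of what that could look like in the Grafana Deployment, assuming Grafana's standard `/api/health` endpoint; the exact values are illustrative:

```yaml
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  timeoutSeconds: 10   # default is 1s, which is too short on a Raspberry Pi
  periodSeconds: 30
```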
The `k8s-reboot-coordinator` coordinates node reboots throughout the
cluster. It runs as a DaemonSet, watching for the presence of a
sentinel file, `/run/reboot-needed`, on the node. When the file appears,
it acquires a lease to ensure that only one node reboots at a time,
cordons and drains the node, and then triggers the reboot by running
a command on the host. After the node has rebooted, the daemon
releases the lease and uncordons the node.
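The sequence roughly corresponds to these commands (a sketch, not the daemon's actual implementation; the lease handling via a `coordination.k8s.io` Lease is elided):

```sh
# $NODE is the node where /run/reboot-needed appeared.
# (First: acquire the cluster-wide reboot lease, not shown here.)
kubectl cordon "$NODE"
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
nsenter -t 1 -m systemctl reboot   # run the reboot command on the host
# ...after the node comes back up:
kubectl uncordon "$NODE"
# (Finally: release the lease.)
```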
The `policy` Kustomize project defines various cluster-wide security
policies. Initially, this includes a Validating Admission Policy that
prevents pods from using the host's network namespace.
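A minimal sketch of such a policy, using the `ValidatingAdmissionPolicy` API; the name is illustrative, and a `ValidatingAdmissionPolicyBinding` is still needed to actually enforce it:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-host-network   # illustrative name
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    - expression: "!(has(object.spec.hostNetwork) && object.spec.hostNetwork)"
      message: "Pods may not use the host's network namespace"
```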
The _updatebot_ has been running with an old configuration for a while,
so while it was correctly identifying updates to ZWaveJS UI and
Zigbee2MQTT, it was generating overrides for the incorrect OCI image
names.
Buildroot jobs really benefit from having a persistent workspace volume
instead of an ephemeral one. This way, only the packages, etc. that
have changed since the last build need to be built, instead of the whole
toolchain and operating system.
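A persistent workspace could be declared with a PVC along these lines (the name and size are assumptions; Buildroot output trees get large):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildroot-workspace   # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
```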
As with AlertManager, the point of having multiple replicas of `vmagent`
is so that one is always running, even if the other fails. Thus, we
want to start the pods in parallel so that if the first one does not
come up, the second one at least has a chance.
If something prevents the first AlertManager instance from starting, we
don't want to wait forever for it before starting the second. That
pretty much defeats the purpose of having two instances. Fortunately,
we can configure Kubernetes to bring up both instances simultaneously by
setting the pod management policy to `Parallel`.
We also don't need a 4 GB volume for AlertManager; even 500 MB is
way too big for the tiny amount of data it stores, but that's about the
smallest size a filesystem can be.
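Both changes amount to something like this in the AlertManager StatefulSet (a sketch; unrelated fields are elided):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  podManagementPolicy: Parallel   # default is OrderedReady
  replicas: 2
  # selector, serviceName, template, etc. elided
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 500Mi   # down from 4Gi
```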
The `cert-exporter` is no longer needed. All websites manage their own
certificates with _mod_md_ now, and all internal applications that use
the wildcard certificate fetch it directly from the Kubernetes Secret.
_bw0.pyrocufflink.blue_ was decommissioned some time ago, so it no
longer gets backed up. We want to keep its previous backups
around, though, in case we ever need to restore something. This
triggers the "no recent backups" alert, since the last snapshot is over
a week old. Let's ignore that hostname when generating this alert.
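The exclusion might look like this in the alerting rule; the metric and label names here are hypothetical stand-ins for whatever the backup exporter actually reports:

```yaml
- alert: NoRecentBackups
  # `backup_last_snapshot_timestamp` and the `host` label are assumed names
  expr: >-
    time() - backup_last_snapshot_timestamp{host!="bw0.pyrocufflink.blue"}
      > 7 * 86400
```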
The `vmagent` needs a place to spool data it has not yet sent to
Victoria Metrics, but it doesn't really need to be persistent. As long
as all of the `vmagent` nodes _and_ all of the `vminsert` nodes do not
go down simultaneously, there shouldn't be any data loss. If they are
all down at the same time, there's probably something else going on and
lost metrics are the least concerning problem.
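An ephemeral spool can be an `emptyDir` mounted at `vmagent`'s buffer path (the size limit and mount path here are assumptions):

```yaml
containers:
  - name: vmagent
    args:
      - -remoteWrite.tmpDataPath=/spool
    volumeMounts:
      - name: spool
        mountPath: /spool
volumes:
  - name: spool
    emptyDir:
      sizeLimit: 2Gi   # cap node-local disk usage
```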
The _dynk8s-provisioner_ only needs writable storage to store copies of
the AWS SNS notifications it receives for debugging purposes. We don't
need to keep these around indefinitely, so using ephemeral node-local
storage is sufficient. I actually want to get rid of that "feature"
anyway...
Although Firefly III works on a Raspberry Pi, a few things are pretty
slow. Notably, the search feature takes a really long time to return
any results, which is particularly annoying when trying to add a receipt
via the Receipts app. Adding a node affinity rule to prefer running on
an x86_64 machine will ensure that it runs fast whenever possible, but
can fall back to running on a Raspberry Pi if necessary.
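Because the rule is a *preferred* (not required) node affinity term, the scheduler can still place the pod elsewhere:

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
```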
The "cron" container has not been working correctly for some time. No
background tasks are getting run, and this error is printed in the log
every minute:
> `Target class [db.schema] does not exist`
It turns out, this is because of the way the PHP `artisan` tool works.
It MUST be able to write to the code directory, apparently to build some
kind of cache. There may be a way to cache the data ahead of time, but
I haven't found it yet. For now, it seems the only way to make
Laravel-based applications run in a container is to make the container
filesystem mutable.
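Concretely, that means dropping the read-only root filesystem on the container (the alternative of mounting `emptyDir` volumes over the specific cache paths `artisan` writes to might also work, but the paths would have to be worked out):

```yaml
securityContext:
  readOnlyRootFilesystem: false
```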
Music Assistant doesn't expose any metrics natively. Since we really
only care about whether or not it's accessible, scraping it with the
blackbox exporter is fine.
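A sketch of the scrape job, following the usual blackbox-exporter relabeling pattern; the target URL and exporter address are assumptions:

```yaml
- job_name: music-assistant
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets: ["http://music-assistant.example:8095"]
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter:9115
```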
In order to allow access to Authelia from outside the LAN, it needs to
be able to handle the _pyrocufflink.net_ domain in addition to
_pyrocufflink.blue_. Originally, this was not possible, as Authelia
only supported a single cookie/domain. Now that it supports multiple
cookies, we can expose both domains.
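With the multi-cookie syntax, the Authelia session configuration looks roughly like this (the portal URLs are illustrative):

```yaml
session:
  cookies:
    - domain: pyrocufflink.blue
      authelia_url: https://auth.pyrocufflink.blue
    - domain: pyrocufflink.net
      authelia_url: https://auth.pyrocufflink.net
```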
The main reason for doing this now is to use Authelia's password reset
capability for Mom, since she didn't have a password for her Nextcloud
account that she's just begun using.
I wrote a Thunderbird add-on for my work computer that periodically
exports my entire DTEX calendar to a file. Unfortunately, the file it
creates is not directly usable by the kitchen screen server currently;
it seems to use a time zone identifier that `tzinfo` doesn't understand:
```
Error in background update:
Traceback (most recent call last):
  File "/usr/local/kitchen/lib64/python3.12/site-packages/kitchen/service/agenda.py", line 19, in _background_update
    await self._update()
  File "/usr/local/kitchen/lib64/python3.12/site-packages/kitchen/service/agenda.py", line 34, in _update
    calendar = await self.fetch_calendar(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/kitchen/lib64/python3.12/site-packages/kitchen/service/caldav.py", line 39, in fetch_calendar
    return icalendar.Calendar.from_ical(r.text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/kitchen/lib64/python3.12/site-packages/icalendar/cal.py", line 369, in from_ical
    _timezone_cache[component['TZID']] = component.to_tz()
                                         ^^^^^^^^^^^^^^^^^
  File "/usr/local/kitchen/lib64/python3.12/site-packages/icalendar/cal.py", line 659, in to_tz
    return cls()
           ^^^^^
  File "/usr/local/kitchen/lib64/python3.12/site-packages/pytz/tzinfo.py", line 190, in __init__
    self._transition_info[0])
    ~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
```
It seems to work fine in Nextcloud, though, so the work-around is to
import it as a subscription in Nextcloud and then read it from there,
using Nextcloud as a sort of proxy.
There is not (currently) an aarch64 build of the kitchen screen server,
so we need to force the pod to run on an x86_64 node. This seems a good
candidate for running on a Raspberry Pi, so I should go ahead and build
a multi-arch image.
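Until then, a node selector pins the pod to x86_64 nodes:

```yaml
nodeSelector:
  kubernetes.io/arch: amd64
```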
_democratic-csi_ can also dynamically resize Synology iSCSI LUNs when
PVC resource requests increase. This requires enabling the external
resizer in the controller pod and marking the StorageClass as supporting
resize.
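The StorageClass side of that is a one-line flag (the class and provisioner names here are assumptions; the provisioner name comes from the _democratic-csi_ driver configuration):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: synology-iscsi   # illustrative name
provisioner: org.democratic-csi.iscsi-synology   # assumed driver name
allowVolumeExpansion: true
```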
The _democratic-csi_ controller can create Synology LUN snapshots based
on VolumeSnapshot resources. This feature can be used, for example,
to create data snapshots before upgrades.
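Taking a snapshot is then just a matter of creating a resource like this (the snapshot class and PVC names are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pre-upgrade   # illustrative name
spec:
  volumeSnapshotClassName: synology-iscsi   # assumed class name
  source:
    persistentVolumeClaimName: my-data
```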
Deploying _democratic-csi_ to manage PersistentVolumeClaim resources,
mapping them to iSCSI volumes on the Synology.
Eventually, all Longhorn-managed PVCs will be replaced with Synology
iSCSI volumes. Getting rid of Longhorn should free up a lot of
resources and remove a point of failure from the cluster.