Scraping the public DNS servers doesn't work anymore since the firewall
routes traffic through Mullvad. Pinging public cloud providers should
give a pretty decent indication of Internet connectivity. It will also
serve as a benchmark for the local DNS performance, since the names will
have to be resolved.
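I haven't pinned down the exact mechanism here, but assuming the ICMP probes go through `blackbox_exporter` (an assumption, not something decided above), the scrape job might look roughly like this; the target hostnames and exporter address are only illustrative:

```yaml
scrape_configs:
  - job_name: ping-cloud
    metrics_path: /probe
    params:
      module: [icmp]            # assumes an ICMP module is defined in blackbox_exporter
    static_configs:
      - targets:
          - www.google.com      # illustrative public cloud endpoints
          - www.amazon.com
    relabel_configs:
      # Rewrite the scrape so the exporter probes the target on our behalf
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115   # hypothetical exporter address
```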
By default, the `instance` label for discovered metrics targets is set
to the scrape address. For Kubernetes pods, that is the IP address and
port of the pod, which naturally changes every time the pod is recreated
or moved. This causes a high churn rate in the time series for the
Longhorn manager pods.
To avoid this, we set the `instance` label to the name of the node the
pod is running on, which will not change because the Longhorn manager
pods are managed by a DaemonSet.
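With Kubernetes service discovery, that mapping is just a relabel rule on the pod's node name. A minimal sketch (the pod label selector is an assumption):

```yaml
scrape_configs:
  - job_name: longhorn-manager
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [longhorn-system]
    relabel_configs:
      # Only scrape the longhorn-manager pods (label name is an assumption)
      - action: keep
        source_labels: [__meta_kubernetes_pod_label_app]
        regex: longhorn-manager
      # Use the node name for the instance label instead of the pod IP:port,
      # so the series do not churn when pods are recreated
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: instance
```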
After considering the implications of Authelia's pre-configured consent
feature, I decided I did not like the fact that a malicious program
could potentially take over my entire Kubernetes cluster without my
knowledge: once consent is remembered, `kubectl` does not require any
interaction, so it could be run silently. I stopped ticking the
"Remember Consent" checkbox out of paranoia, but that's gotten kind of
annoying. I figure a good compromise is to only prompt for consent a
couple of times per day.
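In Authelia, that should amount to setting the pre-configured consent duration on the OIDC client; something like this sketch (the client ID and duration are illustrative):

```yaml
identity_providers:
  oidc:
    clients:
      - id: kubernetes                         # hypothetical client ID
        consent_mode: pre-configured
        # Consent is remembered for half a day, so the prompt still appears
        # a couple of times per day
        pre_configured_consent_duration: 12h
```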
The *darkchestofwonders.us* website is a legacy Python/mod_wsgi
application. It was down for a while after updating the main web server
to Fedora 38. Although we don't upload as many screenshots anymore, we
do still enjoy looking at the old ones. Until I get a chance to either
update the site to use a more modern deployment mechanism, or move the
screenshots to some other photo hosting system, the easiest way to keep
it online is to run it in a container.
Each Longhorn manager pod exports metrics about the node on which it is
running. Thus, we have to scrape every pod to get the metrics about the
whole ecosystem.
The original RBAC configuration allowed `vmagent` only to list the pods
in the `victoria-metrics` namespace. In order to allow it to monitor
other applications' pods, it needs to be assigned permission to list
pods in all namespaces.
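A cluster-wide grant is the straightforward way to do that; roughly (the ClusterRole and service account names are assumptions):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmagent
rules:
  - apiGroups: [""]
    resources: [pods]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmagent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vmagent
subjects:
  - kind: ServiceAccount
    name: vmagent                # assumed service account name
    namespace: victoria-metrics
```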
Since *mtrcs0.pyrocufflink.blue* (the Metrics Pi) seems to be dying,
I decided to move monitoring and alerting into Kubernetes.
I was originally planning to have a single, dedicated virtual machine
for Victoria Metrics and Grafana, similar to how the Metrics Pi was set
up, but running Fedora CoreOS instead of a custom Buildroot-based OS.
While I was working on the Ignition configuration for the VM, it
occurred to me that monitoring would be interrupted frequently, since
FCOS updates weekly and all updates require a reboot. I would rather
not have that many gaps in the data. Ultimately I decided that
deploying a cluster with Kubernetes would probably be more robust and
reliable, as updates can be performed without any downtime at all.
I chose not to use the Victoria Metrics Operator, but rather handle
the resource definitions myself. Victoria Metrics components are not
particularly difficult to deploy, so the overhead of running the
operator and using its custom resources would not be worth the minor
convenience it provides.
Moving the shell command to an external script allows me to update it
without having to restart Home Assistant.
Including the SSH private key in the Secret not only allows it to be
managed by Kubernetes, but also works around a permissions issue when
storing the key in the `/config` volume. The `ssh` command refuses to
use a key file that is writable by group or others, but
the Kubelet sets `g=rw` when `fsGroup` is set on the pod.
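Mounting the key from the Secret also makes it easy to keep the permissions strict; a rough sketch (the Secret name and path are placeholders):

```yaml
spec:
  volumes:
    - name: ssh-key
      secret:
        secretName: home-assistant-ssh     # hypothetical Secret name
        defaultMode: 0400                  # owner read-only, which ssh accepts
  containers:
    - name: home-assistant
      volumeMounts:
        - name: ssh-key
          mountPath: /config/.ssh          # hypothetical path
          readOnly: true
```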
When transitioning to the ConfigMap for maintaining Home Assistant YAML
configuration, I did not bring the `event-snapshot.sh` script because I
thought it was no longer in use. It turns out I was mistaken; it is
used by the driveway camera alerts.
Editing `configuration.yaml` et al. using `vi` via `kubectl exec` is
rather tedious, since the version of `vi` in the *home-assistant*
container image is very rudimentary. Thus, I think it would be better
to use a ConfigMap to store the manually-edited YAML files, so I can
edit them with my regular editor on my desktop. For this to work, the
ConfigMap has to be mounted as a directory rather than as individual
files (using `subPath`), as otherwise the pod would have to be restarted
every time one of the files is updated.
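Concretely, the volume mount just omits `subPath`; something like this sketch (the mount path is an assumption):

```yaml
spec:
  volumes:
    - name: ha-config
      configMap:
        name: home-assistant-config        # assumed ConfigMap name
  containers:
    - name: home-assistant
      volumeMounts:
        # Mount the whole ConfigMap as a directory (no subPath), so updated
        # files are projected into the pod without a restart
        - name: ha-config
          mountPath: /config/packages      # hypothetical directory
```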
Since we've configured the Ingress for Firefly III to log everyone in as
*dustin* via a faked `Remote-User` request header, any user on the
Pyrocufflink domain would be able to see my finances. Using Authelia's
access control mechanism, we can restrict this to only users in a
specific group.
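Roughly, the Authelia rule would look like this (the hostname and group name are placeholders):

```yaml
access_control:
  rules:
    - domain: firefly.pyrocufflink.blue    # hypothetical hostname
      policy: one_factor
      subject:
        - "group:firefly"                  # hypothetical group
```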
Tabitha has decided not to use Firefly to manage her finances. We've
mostly consolidated our expenses and income now, which I manage in my
Firefly account. In fact, the Ingress for Firefly III itself always
sets the `Remote-User: dustin` header, so only my account is accessible
anyway. Thus, there is no longer any reason to have two Data Importer
instances.
The Firefly III Data Importer uses the value of `FIREFLY_III_URL` to
construct links to transactions in email notifications. Since this URL
points to the internal Kubernetes service rather than the canonical URL
used by clients, these links are invalid. Fortunately, there is another
setting, `VANITY_URL`, that the Data Importer will use only when
constructing public-facing links.
By default, the Firefly III Data Importer does not allow transaction
imports via unattended HTTP requests, but this can be enabled with the
`CAN_POST_FILES` environment variable. Additionally, an
`AUTO_IMPORT_SECRET` environment variable must be set, containing a
shared "secret" value that must be provided in the query string of
autoimport requests.
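Both of these, along with `VANITY_URL`, end up as environment variables on the Data Importer container; a sketch with placeholder values:

```yaml
env:
  - name: FIREFLY_III_URL
    value: http://firefly-iii:8080            # internal service URL (assumed)
  - name: VANITY_URL
    value: https://firefly.pyrocufflink.blue  # hypothetical canonical URL
  - name: CAN_POST_FILES
    value: "true"
  - name: AUTO_IMPORT_SECRET
    valueFrom:
      secretKeyRef:
        name: firefly-importer                # hypothetical Secret
        key: auto-import-secret
```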
Since we have the Data Importer protected by Authelia, we need to make
some additional changes to the Ingress to allow unattended
authentication. Authelia supports passing the username and password of
an authorized user in the `Proxy-Authorization` HTTP request header. If
this header is valid, it will allow the request through. Unfortunately,
many HTTP clients will not set this header unless they are also
configured to explicitly connect via a forward proxy. To simplify
usage of such clients, we can configure nginx to copy the value of the
normal `Authorization` header into `Proxy-Authorization`, thus allowing
clients to use simple HTTP Basic authentication, even though the Data
Importer doesn't actually support it.
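Assuming the Ingress uses ingress-nginx's external-auth annotations for Authelia, the header copy can go in the auth snippet; a sketch (the verify URL is an assumption):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/auth-url: >-
      http://authelia.authelia.svc.cluster.local/api/verify
    # Copy the client's Basic credentials into the header Authelia checks
    nginx.ingress.kubernetes.io/auth-snippet: |
      proxy_set_header Proxy-Authorization $http_authorization;
```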
The *jenkins-repohost* Secret contains an SSH private key Jenkins jobs
can use to publish RPM packages to the Yum repo host on
*files.pyrocufflink.blue*.
The *rpm-gpg-key* and *rpm-gpg-key-passphrase* Secrets contain the GnuPG
private key and its encryption passphrase, respectively, that can be
used to sign RPM packages. This key is trusted by managed nodes on the
Pyrocufflink network.
The [Kubernetes Credentials Provider][0] plugin for Jenkins allows
Jenkins to expose Kubernetes Secret resources as Jenkins Credentials.
Jobs can use them like normal Jenkins credentials, e.g. using
`withCredentials`, `sshagent`, etc. The only drawback is that every
credential exposed this way is available to every job, at least until
[PR #40][1] is merged. Fortunately, jobs managed by this Jenkins
instance are all trusted; no anonymous pull requests are possible, so
the risk is mitigated.
[0]: https://jenkinsci.github.io/kubernetes-credentials-provider-plugin/
[1]: https://github.com/jenkinsci/kubernetes-credentials-provider-plugin/pull/40
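For example, the *jenkins-repohost* Secret only needs a label (and optionally a description annotation) for the plugin to pick it up; a sketch, assuming the SSH-key credential type and key names:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: jenkins-repohost
  namespace: jenkins                       # assumed namespace
  labels:
    jenkins.io/credentials-type: basicSSHUserPrivateKey
  annotations:
    jenkins.io/credentials-description: >-
      SSH key for publishing RPM packages to the Yum repo host
type: Opaque
stringData:
  username: repo                           # hypothetical username
  privateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
```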
Setting the `imagePullSecrets` property on the default service account
for the *jenkins-jobs* namespace allows jobs to run from private
container images automatically, without additional configuration in the
pipeline definitions.
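The change itself is tiny (the pull secret name is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: jenkins-jobs
imagePullSecrets:
  - name: registry-credentials             # hypothetical pull secret
```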
[sshca] is a simple web service I wrote to automatically create signed
SSH certificates for hosts' public keys. It authenticates hosts by
their machine UUID, which it can find using the libvirt API.
[sshca]: https://git.pyrocufflink.net/dustin/sshca
The Raspberry Pi usually has the most free RAM of all the Kubernetes
nodes, so pods tend to get assigned there even when it would not be
appropriate. Jenkins, for example, definitely does not need to run
there, so let's force it to run on the bigger nodes.
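One way to express that, assuming the Raspberry Pi is the only arm64 node, is a node selector on the Jenkins pod template:

```yaml
spec:
  nodeSelector:
    kubernetes.io/arch: amd64    # keep Jenkins off the (arm64) Raspberry Pi
```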
Argo CD will delete and re-create this Job each time it synchronizes the
*jenkins* application. The job creates a snapshot of the Jenkins volume
using an HTTP request to the Longhorn UI.
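The delete-and-recreate behavior is what a sync hook does; a rough sketch of how that could look (the image, Longhorn endpoint, and volume name are all assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: snapshot-jenkins
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: snapshot
          image: docker.io/curlimages/curl           # illustrative image
          command:
            - curl
            - -fsS
            - -XPOST
            # Hypothetical Longhorn API call; the real endpoint depends on
            # the Longhorn version and volume name
            - http://longhorn-frontend.longhorn-system/v1/volumes/jenkins?action=snapshotCreate
```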
When migrating the `pod-secrets` Secret to a SealedSecret, I
accidentally created it using the `--from-file` instead of
`--from-env-file` argument to `kubectl create secret generic`. This had
the effect of creating a single key named `pod.secrets` with the entire
contents of the file as its value. This broke backups to MinIO, since
the PostgreSQL containers could no longer read the credentials from the
environment. Regenerating the SealedSecret with the correct arguments
resolves this issue.
By default, Authelia uses a local SQLite database for persistent data
(e.g. authenticator keys, TOTP secrets, etc.) and keeps session data in
memory. Together, these have some undesirable side effects. First,
needing access to the filesystem to store the SQLite database
means that the pod has to be managed by a StatefulSet. Restarting
StatefulSet pods means stopping them all and then starting them back up,
which causes downtime. Additionally, the SQLite database file needs to
be backed up, which I never got around to setting up. Further, any time
the service is restarted, all sessions are invalidated, so users have to
sign back in.
All of these issues can be resolved by configuring Authelia to store all
of its state externally. The persistent data can be stored in a
PostgreSQL database and the session state can be stored in Redis. Using
a database managed by the existing Postgres Operator infrastructure
automatically enables high availability and backups as well.
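The relevant Authelia configuration is small; a sketch with placeholder hostnames (the passwords come from secrets, e.g. via Authelia's `_FILE` environment variables):

```yaml
session:
  redis:
    host: authelia-redis                   # hypothetical Redis service
    port: 6379

storage:
  postgres:
    host: db-authelia                      # hypothetical Postgres Operator service
    port: 5432
    database: authelia
    username: authelia
```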
To migrate the contents of the database, I used [pgloader]. With
Authelia shut down, I ran the migration job. Authelia's database schema
is pretty simple, so there were no problems with the conversion.
Authelia started back up with the new database configuration without any
issues.
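I won't claim this is exactly what the migration job looked like, but the core of it is a one-off pgloader invocation pointed at the SQLite file and the new database; a rough sketch (image, paths, and connection string are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: authelia-pgloader
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pgloader
          image: docker.io/dimitri/pgloader           # illustrative image
          command:
            - pgloader
            - /config/db.sqlite3                      # Authelia's SQLite database
            - postgresql://authelia:CHANGEME@db-authelia/authelia   # placeholder DSN
          volumeMounts:
            - name: authelia-data
              mountPath: /config
      volumes:
        - name: authelia-data
          persistentVolumeClaim:
            claimName: authelia-data                  # hypothetical PVC name
```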
Session state is still stored only in the memory of the Redis process.
This is probably fine, since Redis will not need to be restarted often,
except
for updates. At least restarting Authelia to adjust its configuration
will not log everyone out.
[pgloader]: https://pgloader.readthedocs.io/en/latest/ref/sqlite.html
The PostgreSQL server managed by *Postgres Operator* uses a self-signed
certificate by default. In order to enable full validation of the
server certificate, we need to use a certificate signed by a known CA
that the clients can trust. To that end, I have added a *cert-manager*
Issuer specifically for PostgreSQL. The CA certificate is also managed
by *cert-manager*; it is self-signed and needs to be distributed to
clients out-of-band.
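The CA-plus-Issuer pair follows cert-manager's usual bootstrapping pattern; a sketch with illustrative names (it could equally be a namespaced Issuer):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgresql-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: postgresql-ca
  secretName: postgresql-ca
  issuerRef:
    name: selfsigned               # assumes an existing self-signed ClusterIssuer
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: postgresql
spec:
  ca:
    secretName: postgresql-ca
```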
The `config.yml` document for *kitchen* contains several "secret" values
(e.g. passwords to Nextcloud, MQTT, etc.). We don't want to commit
these to the Git repository, of course, but as long as Kustomize expects
to find the `config.yml` file, we won't be able to manage the
application with Argo CD. Ultimately, *kitchen* needs to be modified to
be able to read secrets separately from config, but until then, we will
have to avoid managing `config.yml` with Kustomize.
I actually created this a long time ago, but forgot to update the
manifest in Git.
The *homeassistant* database is used by Home Assistant for its
*recorder* component, which stores long-term statistics. The data
stored here are only used for e.g. History and Logbook; current entity
states are still stored on the filesystem.
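Pointing the recorder at that database is a one-line change in `configuration.yaml` (the connection string is a placeholder):

```yaml
recorder:
  db_url: postgresql://homeassistant:CHANGEME@db-home-assistant/homeassistant
```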
The `argocd` command needs to have its own OIDC client configuration,
since it works like a "public" client. To log in, run
```sh
argocd login argocd.pyrocufflink.blue --sso
```
Without `disableNameSuffixHash` enabled, Kustomize will create a unique
ConfigMap any time the contents of the source file change. It will also
update any Deployment, StatefulSet, etc. resources to point to the new
ConfigMap. This has the effect of restarting any pods that refer to the
ConfigMap whenever its contents change.
I had avoided using this initially because Kustomize does *not* delete
previous ConfigMap resources whenever it creates a new one. Now that we
have Argo CD, though, this is not an issue, as it will clean up the old
resources whenever it synchronizes.
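A sketch of the generator, with illustrative file names; the point is the absence of `generatorOptions.disableNameSuffixHash`:

```yaml
configMapGenerator:
  - name: home-assistant-config          # hypothetical ConfigMap name
    files:
      - configuration.yaml
      - automations.yaml
# With no generatorOptions.disableNameSuffixHash, each change to these files
# produces a ConfigMap with a new hashed name, which in turn rolls the pods
# that reference it
```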