Piper is the new text-to-speech service for Home Assistant. Whisper is
a speech-to-text service. Together, these services, which communicate
with Home Assistant via the Wyoming protocol, provide the speech
interface to the new Home Assistant Voice Assistant feature.
This commit adds resources for deploying the Home Assistant ecosystem
inside Kubernetes. Home Assistant itself and Mosquitto are just normal
Pods, managed by StatefulSets, that can run anywhere.
ZWaveJS2MQTT and Zigbee2MQTT, on the other hand, have to run on a
special node (a Raspberry Pi), where the respective controllers are
attached.
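For the record, the placement constraint is just a node selector in the
pod template; the node name here is a placeholder:

```yaml
# Hypothetical excerpt from the Zigbee2MQTT StatefulSet; the hostname
# stands in for the Raspberry Pi with the controllers attached.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: rpi-zigbee
```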
The Home Assistant UI is exposed externally via an Ingress resource.
The MQTT broker is also exposed externally, using the TCP proxy feature
of *ingress-nginx*. Additionally, the Zigbee2MQTT and ZWaveJS2MQTT
control panels are exposed via Ingress resources, but these are
protected by Authelia.
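The TCP proxy is configured through the *ingress-nginx* `tcp-services`
ConfigMap; roughly like this (the namespace and Service names are
assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # external port 1883 -> the Mosquitto Service (names assumed)
  "1883": home-assistant/mosquitto:1883
```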
Hatch Learning Center has several domains; Tabitha couldn't decide which
she liked best :) At present, they all resolve to the same website, with
_hatchlearningcenter.org_ as the canonical name.
The *dch-webhooks* service is a generic tool I've written to handle
various automation flows. For now, it only has one feature: when a
transaction is created in Firefly-III, it searches Paperless-ngx for a
matching receipt, and if found, attaches it to the transaction.
If I remember to add the _acme-challenge CNAME record *before* applying
the Certificate resource, it takes a little under 5 minutes to issue a
new certificate.
Apparently, *Firefly III* thinks it is a good idea to send an email to
the administrator every time it encounters an error. This is
particularly annoying when doing database maintenance, as the Kubernetes
health checks trigger an error every minute, which *Firefly III*
helpfully notifies me about.
Fortunately, this behavior can be disabled.
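If I recall correctly, it's a single environment variable on the
*Firefly III* container (the variable name is from memory, so treat it
as an assumption):

```yaml
env:
- name: SEND_ERROR_MESSAGE   # believed to control the error emails
  value: "false"
```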
[Firefly III][0] is a free and open source, web-based personal finance
management application. It features a double-entry bookkeeping system
for tracking transactions, plus other classification options like
budgets, categories, and tags. It has a rule engine that can
automatically manipulate transactions, plus several other really useful
features.
The application itself is a mostly standard browser-based GUI written in
PHP. There is an official container image, though it is not
particularly well designed and must be run as root (it does drop
privileges before launching the actual application, thankfully). I may
decide to create a better image later.
Along with the main application, there is a separate tool for importing
transactions from a CSV file. Its design is rather interesting: though
it is a web-based application, it does not have any authentication or
user management, but uses a user API key to access the main Firefly III
application. This effectively requires us to have one instance of the
importer per user. While not ideal, it isn't particularly problematic
since there are only two of us (and Tabitha may not even end up using
it; she seems to like YNAB).
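Concretely, each importer instance just gets pointed at the main
application with a per-user token; something like the following, where
the Secret name and URL are made up and the variable names are from
memory:

```yaml
env:
- name: FIREFLY_III_URL
  value: http://firefly-iii:8080
- name: FIREFLY_III_ACCESS_TOKEN
  valueFrom:
    secretKeyRef:
      name: firefly-importer-dustin   # hypothetical per-user Secret
      key: access-token
```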
[0]: https://www.firefly-iii.org/
While I was preparing to deploy PostgreSQL for Firefly III, I was
thinking it would be a neat idea to write an operator that uses
custom resources to manage PostgreSQL roles and databases. Then I
thought, surely something like that must exist already. As it turns out,
the [Postgres Operator][0] does exactly that, and a whole lot more.
The *Postgres Operator* handles deploying PostgreSQL server instances,
including primary/standby replication with load balancers. It uses
custom resources to manage the databases and users (roles) in an
instance, and stores role passwords in Secret resources. It supports
backing up instances using `pg_basebackup` and WAL archives (i.e.
physical backups) via [WAL-E][1]/[WAL-G][2]. While various backup
storage targets are supported, *Postgres Operator* really only works
well with cloud storage services like S3, Azure, and Google Cloud
Platform. Fortunately, S3-compatible on-premises solutions like MinIO
are just fine.
I think for my use cases, a single PostgreSQL cluster with multiple
databases will be sufficient. I know *Firefly III* will need a
PostgreSQL database, and I will likely want to migrate *Paperless-ngx*
to PostgreSQL eventually too. Having a single instance will save on
memory resources, at the cost of per-application point-in-time recovery.
For now, just one server in the cluster is probably sufficient, but
luckily adding standby servers appears to be really easy should the need
arise.
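A rough sketch of what the cluster resource might look like (names,
version, and sizes are assumptions):

```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: dch-postgresql
spec:
  teamId: dch
  numberOfInstances: 1       # bump this later to add standby servers
  postgresql:
    version: "14"
  volume:
    size: 10Gi
  users:
    firefly: []              # role for Firefly III
    paperless: []            # role for Paperless-ngx, eventually
  databases:
    firefly: firefly         # database name: owner role
    paperless: paperless
```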
[0]: https://postgres-operator.readthedocs.io/en/latest/
[1]: https://github.com/wal-e/wal-e
[2]: https://github.com/wal-g/wal-g
This configuration is for the instance of MinIO running on the BURP
server, which will be used to store PostgreSQL backups created by the
Postgres Operator.
Using *acme-dns.io* is incredibly cumbersome. Since each unique
subdomain requires its own set of credentials, the `acme-dns.json` file
has to be updated every time a new certificate is added. This
effectively precludes creating certificates via Ingress annotations.
As Cloudflare's DNS service is free and anonymous as well, I thought I
would try it out as an alternative to *acme-dns.io*. It seems to work
well so far. One potential issue, though, is that Cloudflare seems to
have
several nameservers, with multiple IP addresses each. This may require
adding quite a few exceptions to the no-outbound-DNS rule on the
firewall. I tried using the "recursive servers only" mode of
*cert-manager*; however, as expected, the recursive servers all cache
too aggressively. Since the negative cache TTL value in the SOA record
for Cloudflare DNS zones is set to 1 hour and cannot be configured, ACME
challenges can take at least that long in this mode. Thus, querying the
authoritative servers directly is indeed the best option, even though it
violates the no-outbound-DNS rule.
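For reference, the Cloudflare solver portion of the issuer ends up
looking something like this (the Secret name is an assumption):

```yaml
solvers:
- dns01:
    cloudflare:
      apiTokenSecretRef:
        name: cloudflare-api-token
        key: api-token
```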
Using the local name server as the authoritative server for ACME
challenge records turned out to be quite problematic. For some reason,
both Google and Cloudflare kept returning SERVFAIL responses for the
*_acme-challenge* TXT queries. I suspect this may have had something to
do with how BIND was configured to be the authoritative server for the
*o-ak4p9kqlmt5uuc.com* zone while also being a recursive resolver for
clients on the local network.
Using *acme-dns.io* resolves these issues, but it does bring a few of
its own. Notably, each unique domain and subdomain must have its own
set of credentials (specified in the `acme-dns.json` file). This makes
adding new certificates rather cumbersome.
The `cert-exporter` tool fetches certificates from Kubernetes Secret
resources and commits them to a Git repository. This allows
certificates managed by *cert-manager* to be used outside the Kubernetes
cluster, e.g. for services running on other virtual machines.
The wildcard certificate for the *pyrocufflink.net* and
*pyrocufflink.blue* domains is now handled by *cert-manager* and saved
to *certs.git* by `cert-exporter`.
*cert-manager* manages certificates. More specifically, it is an ACME
client, which generates certificate-signing requests, submits them to a
certificate authority, and stores the signed certificate in Kubernetes
secrets. The certificates it manages are defined by Kubernetes
Custom Resources, either created manually or generated automatically for Ingress
resources with particular annotations.
The *cert-manager* deployment consists primarily of two services:
*cert-manager* itself, which monitors Kubernetes resources and manages
certificate requests, and the *cert-manager-webhook*, which validates
Kubernetes resources for *cert-manager*. There is also a third
component, *cainjector*, but we do not need it.
The primary configuration for *cert-manager* is done through Issuer and
ClusterIssuer resources. These define how certificates are issued: the
certificate authority to use and how to handle ACME challenges. For our
purposes, we will be using ZeroSSL to issue certificates, verified via
the DNS-01 challenge through BIND running on the gateway firewall.
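A sketch of the ClusterIssuer, with the ZeroSSL EAB credentials, TSIG
key name, and nameserver address as placeholders:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: zerossl
spec:
  acme:
    server: https://acme.zerossl.com/v2/DV90
    privateKeySecretRef:
      name: zerossl-account-key
    externalAccountBinding:
      keyID: <zerossl-eab-key-id>
      keySecretRef:
        name: zerossl-eab
        key: hmac-key
    solvers:
    - dns01:
        rfc2136:
          nameserver: "172.30.0.1:53"    # BIND on the gateway (placeholder)
          tsigKeyName: acme-update       # placeholder
          tsigAlgorithm: HMACSHA512
          tsigSecretSecretRef:
            name: tsig-key
            key: secret
```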
By default, Authelia requires the user to explicitly consent to allow
an application access to personal information *every time the user
authenticates*. This is rather annoying; luckily, it provides a
way to remember the consent for a period of time.
For convenience, clients on the internal network do not need to
authenticate in order to access *scanserv-js*. There isn't anything
particularly sensitive about this application, anyway.
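In Authelia terms, that's just a `bypass` rule scoped to the internal
subnet (the domain and CIDR here are assumptions):

```yaml
access_control:
  rules:
  - domain: scan.pyrocufflink.blue   # placeholder hostname
    policy: bypass
    networks:
    - 172.30.0.0/24                  # placeholder internal subnet
```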
Enabling OpenID Connect authentication for the Kubernetes API server
will allow clients, particularly `kubectl`, to log in without needing
TLS certificates and private keys.
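Roughly, the kubeadm configuration for this looks like the following;
the issuer URL, client ID, and claim names are assumptions:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    oidc-issuer-url: https://auth.pyrocufflink.blue
    oidc-client-id: kubernetes
    oidc-username-claim: preferred_username
    oidc-groups-claim: groups
```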
*scanserv-js* blocks the HTTP request while waiting for a scan to
complete. For large, multi-page documents, the scan can take several
minutes. To prevent the request from timing out and interrupting the
scan, we need to increase the proxy timeout configuration.
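That just means a couple of *ingress-nginx* annotations; the exact
values are a judgment call:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```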
The Canon PIXMA G7020 reports the supported dimensions of the flatbed,
but its automatic document feeder supports larger paper sizes.
Fortunately, *scanserv-js* provides a (somewhat kludgey) mechanism to
override the reported settings with more appropriate values.
We don't need to build our own container image anymore, since the new
*pyrocufflink.blue* domain controllers use LDAPS certificates signed by
Let's Encrypt.
*scanserv-js* is a web-based front-end for SANE. It allows scanning
documents from a browser.
Using the `config.local.js` file, we implement the `afterScan` hook to
automatically upload scanned files to *paperless-ngx* using its REST
API.
Authelia can act as an OpenID Connect identity provider. This allows
it to provide authentication/authorization for applications beyond
those inside the Kubernetes cluster that use it for Ingress
authentication.
To start with, we'll configure an OIDC client for Jenkins.
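The client definition looks roughly like this (field names vary a bit
between Authelia versions, and the hostname, secret, and redirect URI
are assumptions):

```yaml
identity_providers:
  oidc:
    clients:
    - id: jenkins
      description: Jenkins CI
      secret: <client-secret>              # placeholder
      authorization_policy: one_factor
      redirect_uris:
      - https://jenkins.pyrocufflink.blue/securityRealm/finishLogin
      scopes:
      - openid
      - profile
      - email
      - groups
```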
I am not entirely sure why, but it seems like the Kubelet *always*
misses the first check in the readiness probe. This causes a full
60-second delay before the Authelia pod is marked as "ready," even
though it was actually ready within a second of the container starting.
To avoid this very long delay, during which Authelia is unreachable,
even though it is working fine, we can add a startup probe with a much
shorter check interval. The kubelet will not start readiness probes
until the startup probe returns successfully, so it won't miss the first
one any more.
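The probe combination ends up something like this (the endpoint and
timings are assumptions based on Authelia's health check):

```yaml
startupProbe:
  httpGet:
    path: /api/health
    port: 9091
  periodSeconds: 1
  failureThreshold: 30
readinessProbe:
  httpGet:
    path: /api/health
    port: 9091
  periodSeconds: 30
```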
Instead of using a static username/password and HTTP Basic
authentication for the Longhorn UI, we can now use Authelia via the
*nginx* auth subrequest functionality.
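That boils down to a couple of annotations on the Longhorn Ingress; the
hostnames here are assumptions:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "http://authelia.authelia.svc.cluster.local/api/verify"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.pyrocufflink.blue"
```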
Authelia is a general authentication provider that works (primarily)
by integrating with *nginx* using its subrequest mechanism. It works
great with Kubernetes/*ingress-nginx* to provide authentication for
services running in the cluster, especially those that do not provide
their own authentication system.
Authelia needs a database to store session data. It supports various
engines, but since we're only running a very small instance with no real
need for HA, SQLite on a Longhorn persistent volume is sufficient.
Configuration is done mostly through a YAML document, although some
secret values are stored in separate files, which are pointed to by
environment variables.
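The storage part of the configuration is tiny, assuming the Longhorn
volume is mounted at `/config`:

```yaml
storage:
  local:
    path: /config/db.sqlite3
```

Secrets such as the storage encryption key are then provided via
environment variables like `AUTHELIA_STORAGE_ENCRYPTION_KEY_FILE`, which
point at files mounted from Kubernetes Secret resources.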
*ntfy* allows notifications to include arbitrary file attachments. For
images, it will even show them in the notification. In order to support
this, the server must be configured with a writable filesystem location
to cache the files.
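In the server configuration, this is a single setting (the path is an
assumption):

```yaml
attachment-cache-dir: /var/cache/ntfy/attachments
```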
Version 0.2.0 of the HUD Controller is stateful. It requires writable
storage for its configuration file, as it updates the file when display
settings and screen URLs are changed.
While we're making changes, let's move it to its own namespace.
Kubernetes 1.24 introduced a new taint for Control Plane nodes that must
be tolerated in addition to the original taint in order for pods to be
scheduled to run on such nodes.
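During the transition, both taints have to be tolerated:

```yaml
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
```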
When cloning/fetching a Git repository in a Jenkins pipeline, the Git
Client plugin uses the configured *Host Key Verification Strategy* to
verify the SSH host key of the remote Git server. Unfortunately, there
does not seem to be any way to use the configured strategy from the
`git` command line in a Pipeline job, so e.g. `git push` does not
respect it. This causes jobs to fail to push changes to the remote if
the container they're using does not already have the SSH host key for
the remote in its known hosts database.
This commit adds a ConfigMap to the *jenkins-jobs* namespace that can be
mounted in containers to populate the SSH host key database.
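The shape of the ConfigMap is roughly this (the name and host key are
placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ssh-known-hosts
  namespace: jenkins-jobs
data:
  ssh_known_hosts: |
    git.pyrocufflink.blue ssh-ed25519 AAAA...
```

Containers can then mount the `ssh_known_hosts` key at
`/etc/ssh/ssh_known_hosts`, or point `GIT_SSH_COMMAND`
(`-o UserKnownHostsFile=…`) at wherever it is mounted.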
I don't want Jenkins updating itself whenever the pod restarts, so I'm
going to pin it to a specific version. This way, I can be sure to take
a snapshot of the data volume before upgrading.
Setting a static SELinux level for the container allows CRI-O to skip
relabeling all the files in the persistent volume each time the
container starts. For this to work, the pod needs a special annotation,
and CRI-O itself has to be configured to respect it:
```toml
[crio.runtime.runtimes.runc]
allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]
```
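On the Pod side, the counterpart looks something like this (the level
value is just an example):

```yaml
metadata:
  annotations:
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c100,c200"
```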
This *dramatically* improves the start time of the Jenkins container.
Instead of taking 5+ minutes, it now starts instantly.
https://github.com/cri-o/cri-o/issues/6185#issuecomment-1334719982