[Sealed Secrets] will allow us to store secret values in the Git
repository, since the actual secrets are encrypted and can only be
decrypted using the private key stored in the Kubernetes cluster.
I have been looking for a better way to deal with secrets for some time
now. For one thing, having the secret files ignored by Git means they
only exist on my main desktop. If I need to make changes to an
application from another machine, I have to not only clone the
repository, but also manually copy the secret files. That sort of
makes my desktop a single point-of-failure. I tried moving all the
secret files to another (private) repository and adding it as a
submodule, but Kustomize did not like that; it will only load files from
the current working directory, or another Kustomize project. Having to
create two projects for each application, one for the secrets and one
for everything else, would be tedious and annoying. I also considered
encrypting all the secret files with e.g. GnuPG and creating Make
recipes for each project to decrypt them before running `kubectl
apply`. I eventually want to use Argo CD, though, so that prerequisite
step would add a lot of complexity. Then I discovered
[KSOPS] and *Sealed Secrets*. KSOPS operates entirely on the client
side, and thus requires a plugin for Kustomize and/or Argo CD in order
to work, so it's not significantly different than the GnuPG/Make idea.
I like that Sealed Secrets does not require anything on the client side,
except when initially creating the manifests for the SealedSecret
objects, so Argo CD will "just work" without any extra tools or
configuration.
[Sealed Secrets]: https://github.com/bitnami-labs/sealed-secrets
[KSOPS]: https://github.com/viaduct-ai/kustomize-sops
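
For reference, sealing a secret is a single client-side step; something
like this (the secret name and value here are just placeholders):

```sh
# Build the Secret locally, but never commit it as-is
kubectl create secret generic example-db-password \
    --from-literal=password=changeme \
    --dry-run=client -o yaml \
  | kubeseal --format yaml > example-db-password.sealedsecret.yaml

# The resulting SealedSecret manifest is safe to commit; only the
# controller in the cluster has the private key needed to decrypt it.
```
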
The other day, when I was dealing with the mess that I accidentally
created by letting the *phpipam* MySQL database automatically upgrade
itself, I attempted to restore from a Longhorn backup to try to get the
database working again. This did work, but as a side-effect, it changed
the storage class name of the *phpipam-pvc* persistent volume claim from
`longhorn` to `longhorn-static`. Now, when attempting to apply the
YAML manifest, `kubectl` complains because this field is immutable. As
such, the manifest needs to be updated to reflect the value set by
Longhorn when the backup was restored and the PVC was recreated.
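
Concretely, that is a one-line change to the PersistentVolumeClaim
manifest; roughly (the size shown is illustrative, the rest of the spec
is unchanged):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: phpipam-pvc
spec:
  # Longhorn recreated the PVC with this storage class during the
  # restore; the field is immutable, so the manifest has to match.
  storageClassName: longhorn-static
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
```
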
The *fuse-device-plugin* handles mapping the `/dev/fuse` device into
unprivileged containers, e.g. for `buildah`.
Although *fuse-device-plugin* was recommended by Red Hat in their
blog post [How to use Podman inside of Kubernetes][0], it's probably
not the best choice any more. It's working for now, giving me the
ability to build container images in Kubernetes without running
`buildah` in a privileged container, but I will probably investigate
replacing it with the [generic-device-plugin][1] eventually.
[0]: https://www.redhat.com/sysadmin/podman-inside-kubernetes
[1]: https://github.com/squat/generic-device-plugin
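
Builder Pods just request the device as an extended resource; if I
recall correctly, *fuse-device-plugin* advertises it as
`github.com/fuse`, so the container spec looks roughly like this:

```yaml
containers:
  - name: buildah
    image: quay.io/buildah/stable
    resources:
      limits:
        # Extended resources cannot be overcommitted, so a limit is enough
        github.com/fuse: "1"
```
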
The *dch-webhooks* tool now provides an operation for hosts to request a
signed SSH certificate from the SSH CA. It's primarily useful for
unattended deployments like CoreOS Ignition, where hosts do not have
any credentials to authenticate with the CA directly.
[Step CA] is an open-source online X.509 and SSH certificate authority
service. It supports issuing certificates via various protocols,
including ACME and its own HTTP API via the `step` command-line utility.
Clients can authenticate using a variety of methods, such as JWK, OpenID
Connect, or mTLS. This makes it very flexible and easy to introduce
to an existing ecosystem.
Although the CA service is mostly stateless, it does have an on-disk
database where it stores some information, notably the list of SSH hosts
for which it has signed certificates. Most other operations, though, do
not require any persistent state; the service does not keep track of
every single certificate it signed, for example. It can be configured
to store authentication information (referred to as "provisioners") in
the database instead of the configuration file, by enabling the "remote
provisioner management" feature. This has the advantage of being able
to modify authentication configuration without updating a Kubernetes
ConfigMap and restarting the service.
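
A minimal excerpt of the relevant parts of `ca.json` (paths are
illustrative):

```json
{
  "root": "/home/step/certs/root_ca.crt",
  "crt": "/home/step/certs/intermediate_ca.crt",
  "key": "/home/step/secrets/intermediate_ca_key",
  "authority": {
    "enableAdmin": true,
    "provisioners": []
  }
}
```
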
The official Step CA documentation recommends using the `step ca init`
command to initialize a new certificate authority. This command performs a
few steps:
* Generates an ECDSA key pair and uses it to create a self-signed root
certificate
* Generates a second ECDSA key pair and signs an intermediate CA
certificate using the root CA key
* Generates an ECDSA key pair and an SSH root certificate
* Creates a `ca.json` configuration file
These steps can be performed separately, and in fact, I created the
intermediate CA certificate and signed it with the (offline) *dch Root
CA* certificate.
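
Roughly, that looked something like this (file names are illustrative;
the signing step ran on the machine that holds the offline root key):

```sh
# On the CA host: generate the intermediate key and a CSR
step certificate create "dch Intermediate CA" intermediate.csr \
    intermediate_ca_key --csr

# On the offline root: sign the CSR with an intermediate CA profile
step certificate sign --profile intermediate-ca \
    intermediate.csr root_ca.crt root_ca_key > intermediate_ca.crt
```
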
When the service starts for the first time, because
`authority/enableAdmin` is `true` and `authority/provisioners` is empty,
a new "Admin JWK" provisioner will be created automatically. This key
will be encrypted with the same password used to encrypt the
intermediate CA certificate private key, and can be used to create other
provisioners.
[Step CA]: https://smallstep.com/docs/step-ca/
phpIPAM supports "Apache authentication" which effectively delegates
authentication to the web server and trusts the `PHP_AUTH_USER` server
variable. This variable is usually set by an Apache authentication
module, but it can be set manually in the config. Here, we're using
`SetEnvIf` to populate it from the value of the `Remote-User` header
set by Authelia.
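
The directive itself is a one-liner; something like this, assuming the
`Remote-User` header is the one Authelia sets:

```apache
# Copy the username asserted by Authelia into the variable phpIPAM trusts
SetEnvIf Remote-User "(.+)" PHP_AUTH_USER=$1
```
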
Using the *latest* tag for MariaDB is particularly problematic, as a
new version of the container may be pulled when the pod is scheduled on
a different host. MariaDB will not start in this case, as it recognizes
that the data on disk need to be upgraded.
To prevent database outages in situations like this, we need to pin to a
specific version of MariaDB, ensuring that every pod runs the same
version.
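
In the container spec, that just means replacing the floating tag with
an exact one (the version shown here is only an example):

```yaml
containers:
  - name: mariadb
    # Pin an exact release so every node runs the same image; bump this
    # deliberately, with a backup in hand, when upgrading.
    image: docker.io/library/mariadb:10.11.6
```
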
Having the Z-Wave and Zigbee admin interfaces exposed as sub-paths under
*homeassistant.pyrocufflink.blue* made it difficult to use Authelia.
Since I have a Firefox container tab specifically for Home Assistant,
the login redirect would open a new tab in a different container, since
Authelia is hosted at *auth.pyrocufflink.blue*. In order to log in, I
would have to temporarily disable "designated sites only" for the Home
Assistant tab container. Using subdomains for the admin interfaces
avoids this issue: I can use a different container for them, one that
does not have the "designated sites only" setting, since I am less
worried about accidentally leaking data from them to sites on the
Internet.
Piper is the new text-to-speech service for Home Assistant. Whisper is
a speech-to-text service. Together, these services, which communicate
with Home Assistant via the Wyoming protocol, provide the speech
interface to the new Home Assistant Voice Assistant feature.
This commit adds resources for deploying the Home Assistant ecosystem
inside Kubernetes. Home Assistant itself, as well as Mosquitto, are
just normal Pods, managed by StatefulSets, that can run anywhere.
ZWaveJS2MQTT and Zigbee2MQTT, on the other hand, have to run on a
special node (a Raspberry Pi), where the respective controllers are
attached.
The Home Assistant UI is exposed externally via an Ingress resource.
The MQTT broker is also exposed externally, using the TCP proxy feature
of *ingress-nginx*. Additionally, the Zigbee2MQTT and ZWaveJS2MQTT
control panels are exposed via Ingress resources, but these are
protected by Authelia.
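
The TCP proxying is configured through the controller's `tcp-services`
ConfigMap, which maps a listening port to a Service; roughly (namespace
and Service names here are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # listening port -> namespace/service:port
  "1883": home-assistant/mosquitto:1883
```
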
Hatch Learning Center has several domains; Tabitha couldn't decide which
she liked best :) At present, they all resolve to the same website, with
_hatchlearningcenter.org_ as the canonical name.
The *dch-webhooks* service is a generic tool I've written to handle
various automation flows. For now, it only has one feature: when a
transaction is created in Firefly-III, it searches Paperless-ngx for a
matching receipt, and if found, attaches it to the transaction.
If I remember to add the _acme-challenge CNAME record *before* applying
the Certificate resource, it takes a little under 5 minutes to issue a
new certificate.
Apparently, *Firefly III* thinks it is a good idea to send an email to
the administrator every time it encounters an error. This is
particularly annoying when doing database maintenance, as the Kubernetes
health checks trigger an error every minute, which *Firefly III*
helpfully notifies me about.
Fortunately, this behavior can be disabled.
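
If I recall the setting correctly, it is a single environment variable
on the Firefly III container:

```yaml
env:
  # Stop Firefly III from emailing the site owner about every exception
  - name: SEND_ERROR_MESSAGE
    value: "false"
```
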
[Firefly III][0] is a free and open source, web-based personal finance
management application. It features a double-entry bookkeeping system
for tracking transactions, plus other classification options like
budgets, categories, and tags. It has a rule engine that can
automatically manipulate transactions, plus several other really useful
features.
The application itself is a mostly standard browser-based GUI written in
PHP. There is an official container image, though it is not
particularly well designed and must be run as root (it does drop
privileges before launching the actual application, thankfully). I may
decide to create a better image later.
Along with the main application, there is a separate tool for importing
transactions from a CSV file. Its design is rather interesting: though
it is a web-based application, it does not have any authentication or
user management, but uses a user API key to access the main Firefly III
application. This effectively requires us to have one instance of the
importer per user. While not ideal, it isn't particularly problematic
since there are only two of us (and Tabitha may not even end up using
it; she seems to like YNAB).
[0]: https://www.firefly-iii.org/
While I was preparing to deploy PostgreSQL for Firefly III, I was
thinking it would be a neat idea to write an operator that uses
custom resources to manage PostgreSQL roles and databases. Then I
thought, surely something like that must exist already. As it turns out,
the [Postgres Operator][0] does exactly that, and a whole lot more.
The *Postgres Operator* handles deploying PostgreSQL server instances,
including primary/standby replication with load balancers. It uses
custom resources to manage the databases and users (roles) in an
instance, and stores role passwords in Secret resources. It supports
backing up instances using `pg_basebackup` and WAL archives (i.e.
physical backups) via [WAL-E][1]/[WAL-G][2]. While various backup
storage targets are supported, *Postgres Operator* really only works
well with cloud storage services like S3, Azure, and Google Cloud
Platform. Fortunately, S3-compatible on-premises solutions like MinIO
work just fine.
I think for my use cases, a single PostgreSQL cluster with multiple
databases will be sufficient. I know *Firefly III* will need a
PostgreSQL database, and I will likely want to migrate *Paperless-ngx*
to PostgreSQL eventually too. Having a single instance will save on
memory resources, at the cost of per-application point-in-time recovery.
For now, just one server in the cluster is probably sufficient, but
luckily adding standby servers appears to be really easy should the need
arise.
[0]: https://postgres-operator.readthedocs.io/en/latest/
[1]: https://github.com/wal-e/wal-e
[2]: https://github.com/wal-g/wal-g
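
The manifests involved are pleasantly small; a rough sketch of a
`postgresql` custom resource (names and sizes are illustrative):

```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: dch-postgres
spec:
  teamId: dch
  numberOfInstances: 1
  postgresql:
    version: "15"
  volume:
    size: 10Gi
  users:
    # role name -> list of role options
    firefly: []
  databases:
    # database name -> owner role
    firefly: firefly
```
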
This configuration is for the instance of MinIO running on the BURP
server, which will be used to store PostgreSQL backups created by the
Postgres Operator.
Using *acme-dns.io* is incredibly cumbersome. Since each unique
subdomain requires its own set of credentials, the `acme-dns.json` file
has to be updated every time a new certificate is added. This
effectively precludes creating certificates via Ingress annotations.
As Cloudflare's DNS service is free and anonymous as well, I thought I
would try it out as an alternative to *acme-dns.io*. It seems to work
well so far. One potential issue, though, is that Cloudflare seems to have
several nameservers, with multiple IP addresses each. This may require
adding quite a few exceptions to the no-outbound-DNS rule on the
firewall. I tried using the "recursive servers only" mode of
*cert-manager*, but, as expected, the recursive servers all cache
too aggressively. Since the negative cache TTL value in the SOA record
for Cloudflare DNS zones is set to 1 hour and cannot be configured, ACME
challenges can take at least that long in this mode. Thus, querying the
authoritative servers directly is indeed the best option, even though it
violates the no-outbound-DNS rule.
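
The solver section of the ClusterIssuer is now just the Cloudflare
DNS-01 provider, pointing at a Secret that holds an API token (names are
illustrative):

```yaml
solvers:
  - dns01:
      cloudflare:
        apiTokenSecretRef:
          name: cloudflare-api-token
          key: api-token
```
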
Using the local name server as the authoritative server for ACME
challenge records turned out to be quite problematic. For some reason,
both Google and Cloudflare kept returning SERVFAIL responses for the
*_acme-challenge* TXT queries. I suspect this may have had something to
do with how BIND was configured to be the authoritative server for the
*o-ak4p9kqlmt5uuc.com* zone while also being a recursive resolver for
clients on the local network.
Using *acme-dns.io* resolves these issues, but it does bring a few of
its own. Notably, each unique domain and subdomain must have its own
set of credentials (specified in the `acme-dns.json` file). This makes
adding new certificates rather cumbersome.
The `cert-exporter` tool fetches certificates from Kubernetes Secret
resources and commits them to a Git repository. This allows
certificates managed by *cert-manager* to be used outside the Kubernetes
cluster, e.g. for services running on other virtual machines.
The wildcard certificate for the *pyrocufflink.net* and
*pyrocufflink.blue* domains is now handled by *cert-manager* and saved
to *certs.git* by `cert-exporter`.
*cert-manager* manages certificates. More specifically, it is an ACME
client, which generates certificate-signing requests, submits them to a
certificate authority, and stores the signed certificate in Kubernetes
secrets. The certificates it manages are defined by Kubernetes
Custom Resources, created either manually or automatically for Ingress
resources with particular annotations.
The *cert-manager* deployment consists primarily of two services:
*cert-manager* itself, which monitors Kubernetes resources and manages
certificate requests, and the *cert-manager-webhook*, which validates
Kubernetes resources for *cert-manager*. There is also a third
component, *cainjector*, but we do not need it.
The primary configuration for *cert-manager* is done through Issuer and
ClusterIssuer resources. These define how certificates are issued: the
certificate authority to use and how to handle ACME challenges. For our
purposes, we will be using ZeroSSL to issue certificates, verified via
the DNS-01 challenge through BIND running on the gateway firewall.
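
A rough sketch of what that ClusterIssuer looks like (the EAB
credentials and TSIG key live in Secrets; names, keys, and the
nameserver address are illustrative):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: zerossl
spec:
  acme:
    server: https://acme.zerossl.com/v2/DV90
    privateKeySecretRef:
      name: zerossl-account-key
    # ZeroSSL requires External Account Binding credentials
    externalAccountBinding:
      keyID: <EAB key ID>
      keySecretRef:
        name: zerossl-eab
        key: secret
    solvers:
      - dns01:
          rfc2136:
            nameserver: 172.30.0.1
            tsigKeyName: acme-update
            tsigAlgorithm: HMACSHA256
            tsigSecretSecretRef:
              name: bind-tsig-key
              key: key
```
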
By default, Authelia requires the user to explicitly consent to allow
an application access to personal information *every time the user
authenticates*. This is rather annoying; luckily, it provides a way to
remember the consent for a period of time.
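
If I am reading the Authelia documentation correctly, this is the
per-client consent mode in the OIDC configuration; something along
these lines (client ID and duration are illustrative, and the exact
keys may vary between Authelia versions):

```yaml
identity_providers:
  oidc:
    clients:
      - id: example-app
        # Remember a granted consent instead of prompting on every login
        consent_mode: pre-configured
        pre_configured_consent_duration: 1w
```
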