# Home Assistant
Originally, I tried to keep the Home Assistant ecosystem completely
self-contained. Every component ran on one Raspberry Pi. The thought
was that this would make it more resilient, so that network or infrastructure
problems would be less likely to affect smart home operations. Ultimately, it
turns out this actually made it noticeably *less* resilient, as the Raspberry
Pi became a single point of failure for the whole system.
When we moved to the new house, Home Assistant was unavailable for several
days, as I did not have a way to power and run the Raspberry Pi. Since none of
the smart home devices were installed yet, we initially did not think this was
an issue. We had forgotten to think about the shopping list and the chore
tracker, though, and how much we have come to rely on them.
Given how quickly and seamlessly the applications deployed in Kubernetes came
back online after the move, it suddenly made sense to move Home Assistant there
as well.
## Ecosystem
The Home Assistant ecosystem consists of these components:
* Home Assistant Core (API and Front-end)
* PostgreSQL (State history database)
* Mosquitto (MQTT server)
* Zigbee2MQTT (Zigbee integration)
* ZWaveJS2MQTT (ZWave integration)
* Piper (Text-to-speech)
* Whisper (Speech-to-text)
Each of these components runs in its own container in a separate pod within the
*home-assistant* namespace.
![Component Diagram](hass-k8s.svg)
### Home Assistant Core
The core component of the Home Assistant ecosystem is the [Home Assistant]
server itself. Only a single instance of the server can run within a given
ecosystem, as Home Assistant is not cluster-aware. Home Assistant state is
stored on the filesystem, so the server runs in a pod managed by a StatefulSet
with a PersistentVolumeClaim.
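As a sketch, a StatefulSet of this shape might look like the following (the
names, image tag, and storage size are illustrative, not the actual manifests
in this repository):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: home-assistant
  namespace: home-assistant
spec:
  serviceName: home-assistant
  replicas: 1  # Home Assistant is not cluster-aware; never scale this up
  selector:
    matchLabels:
      app.kubernetes.io/name: home-assistant
  template:
    metadata:
      labels:
        app.kubernetes.io/name: home-assistant
    spec:
      containers:
        - name: home-assistant
          image: ghcr.io/home-assistant/home-assistant:stable
          ports:
            - name: http
              containerPort: 8123
          volumeMounts:
            - name: config
              mountPath: /config
  volumeClaimTemplates:  # gives the pod a stable PersistentVolumeClaim
    - metadata:
        name: config
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
```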
The Home Assistant HTTP server, which hosts the UI, WebSocket, and REST API, is
exposed by a Service resource, which in turn is proxied by an Ingress resource.
[Home Assistant]: https://www.home-assistant.io/
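The Service/Ingress pair could be sketched like this (the hostname is a
placeholder, and any TLS or ingress-class annotations are omitted):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: home-assistant
  namespace: home-assistant
spec:
  selector:
    app.kubernetes.io/name: home-assistant
  ports:
    - name: http
      port: 8123
      targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: home-assistant
  namespace: home-assistant
spec:
  rules:
    - host: hass.example.com  # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: home-assistant
                port:
                  name: http
```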
#### ConfigMaps
Although most Home Assistant configuration is managed by its web UI, some
settings and integrations are read from manually-managed YAML files. Some
notable examples include the [Shell Command] and [Group] integrations. To make
it easier to edit these files, they are stored in a ConfigMap that is mounted
into the Home Assistant container. Since the kubelet will not automatically
update ConfigMap files that are mounted individually (e.g. via `subPath`), the
entire ConfigMap has to be mounted as a directory. Files that must exist within
the configuration directory (i.e. `/config`) then need symbolic links pointing
to the corresponding files in the ConfigMap mount point.
[Shell Command]: https://www.home-assistant.io/integrations/shell_command
[Group]: https://www.home-assistant.io/integrations/group
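As a sketch, assuming a hypothetical ConfigMap named `home-assistant-extra`,
the volume layout might look like this pod-spec fragment:

```yaml
# Mount the whole ConfigMap as a directory so the kubelet keeps its
# contents up to date (names and paths here are illustrative)
volumes:
  - name: extra-config
    configMap:
      name: home-assistant-extra
containers:
  - name: home-assistant
    volumeMounts:
      - name: extra-config
        mountPath: /config/extra  # entire ConfigMap as a directory
```

A startup command or init container would then create the links, e.g.
`ln -sf /config/extra/groups.yaml /config/groups.yaml`.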
### PostgreSQL
Although Home Assistant stores all of its internal state in JSON files on the
filesystem, it uses a relational SQL database for state history. This gives it
the ability to chart historical values for e.g. sensors, as well as provide the
Logbook view. By default, Home Assistant uses a SQLite database file, stored
on the filesystem alongside the other state files, but it also supports other
RDBMS engines, including PostgreSQL. Using PostgreSQL instead of SQLite has
a few advantages:
* More historical values can be retained without introducing performance issues
* Events can be recorded immediately instead of batched
* Backups and recovery are managed externally
PostgreSQL is _not_ managed directly in this deployment; rather, the
Kustomization file patches the Home Assistant StatefulSet to provide
environment variables pointing at an externally-managed PostgreSQL database.
My Kubernetes cluster has a single PostgreSQL cluster, managed by the [postgres
operator], that hosts databases for several applications.
[postgres operator]: https://github.com/zalando/postgres-operator/
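A hedged sketch of such a patch, as a `kustomization.yaml` fragment (the
environment variable, Secret, and key names are hypothetical):

```yaml
patches:
  - target:
      kind: StatefulSet
      name: home-assistant
    patch: |-
      # Inject the database connection string from a Secret managed
      # alongside the external PostgreSQL cluster
      - op: add
        path: /spec/template/spec/containers/0/env
        value:
          - name: HASS_DB_URL  # hypothetical variable name
            valueFrom:
              secretKeyRef:
                name: home-assistant-db-credentials
                key: url
```

Home Assistant's YAML loader supports the `!env_var` tag, so
`configuration.yaml` could then reference the variable with
`db_url: !env_var HASS_DB_URL` under the `recorder:` integration.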
### Mosquitto
Most of my custom integrations, including remote control of the heads-up
displays, the chore list, and the Board Board™, are implemented using MQTT, as
is Frigate. Thus, the Home Assistant ecosystem needs an MQTT message broker.
[Mosquitto] is a lightweight but complete implementation that works well with
Home Assistant. It is highly configurable, supporting various
authentication, authorization, and access control mechanisms.
Home Assistant MQTT discovery relies heavily on retained MQTT messages, so
enabling persistence for Mosquitto is essential. Without it, retained
messages would be lost when the broker restarts, and every Home Assistant
entity configured via MQTT discovery would disappear with them.
Since Mosquitto is not clustered and persists data to the filesystem, it is
deployed as a StatefulSet with a PersistentVolumeClaim.
[Mosquitto]: https://mosquitto.org/
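For illustration, a Mosquitto configuration enabling persistence could be
delivered via a ConfigMap along these lines (the listener, paths, and save
interval are examples, not this deployment's actual settings):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mosquitto-config
  namespace: home-assistant
data:
  mosquitto.conf: |
    listener 1883
    persistence true
    # Must point at the PersistentVolume so retained messages
    # survive broker restarts
    persistence_location /mosquitto/data/
    autosave_interval 60
```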
### Zigbee2MQTT
[Zigbee2MQTT] provides a bridge between a Zigbee network and Home Assistant via
MQTT. Zigbee devices communicate with the controller, which is attached to a
server via USB. Messages received from devices are published to the message
queue, and vice versa. Zigbee2MQTT stores its state on the filesystem, so the
StatefulSet needs a PersistentVolumeClaim.
Zigbee2MQTT also exposes a web UI for configuration and administration of the
Zigbee network. This UI is exposed by a Service and an Ingress, and protected
by [Authelia].
[Zigbee2MQTT]: https://zigbee2mqtt.io/
[Authelia]: https://authelia.com/
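For the container to reach the Zigbee radio, the device node has to be passed
through from the host. One simple (if blunt) approach is a `hostPath` volume
plus a privileged security context, sketched below; the device path is a
placeholder:

```yaml
# Pod-spec fragment (illustrative): expose the USB radio to the container
containers:
  - name: zigbee2mqtt
    image: koenkk/zigbee2mqtt
    securityContext:
      privileged: true  # blunt; a device plugin would be more fine-grained
    volumeMounts:
      - name: zigbee-radio
        mountPath: /dev/ttyACM0
volumes:
  - name: zigbee-radio
    hostPath:
      path: /dev/ttyACM0  # placeholder device path
      type: CharDevice
```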
### ZWaveJS2MQTT
Similar to Zigbee2MQTT, [ZWaveJS2MQTT] provides a bridge between a Z-Wave
network and Home Assistant. While its name suggests it uses MQTT, this can
actually be bypassed and Home Assistant can communicate directly with the
ZWaveJS2MQTT server via a WebSocket connection.
ZWaveJS2MQTT has a web UI, which is exposed by a Service and an Ingress,
protected by Authelia. It stores state on the filesystem, and thus requires a
StatefulSet with a PersistentVolumeClaim.
[ZWaveJS2MQTT]: https://github.com/zwave-js/zwavejs2mqtt/
### Piper/Whisper
[Piper] and [Whisper] provide the text-to-speech and speech-to-text
capabilities, respectively, for Home Assistant [Voice Control]. These
processes are designed to run as Add-Ons for Home Assistant OS, but work just
fine as Kubernetes containers as well.
Piper and Whisper need mutable storage in order to download their machine
learning models. Since the model data are downloaded automatically when the
container starts, using ephemeral volumes is sufficient.
[Piper]: https://github.com/rhasspy/piper
[Whisper]: https://github.com/guillaumekln/faster-whisper/
[Voice Control]: https://www.home-assistant.io/voice_control/
## Raspberry Pi Node
While Home Assistant Core and Mosquitto can run on any node in the Kubernetes
cluster, Zigbee2MQTT and ZWaveJS2MQTT obviously have to run on the node where
their respective devices are attached. Originally, I had intended to run them
as containers on a Raspberry Pi, managed by Podman. While I was setting this
up, though, it occurred to me that this was not even necessary; Kubernetes has
all the functionality needed to run containers on a specific node and enable
them to communicate with local hardware.
To that end, I have added a Raspberry Pi running [Fedora CoreOS] to the k8s
cluster and attached the Zigbee and Z-Wave radios to it. This node has two
special labels: `node-role.kubernetes.io/zigbee-ctrl` and
`node-role.kubernetes.io/zwave-ctrl`, indicating that it has the Zigbee and
Z-Wave controllers, respectively, attached to it. The Zigbee2MQTT and
ZWaveJS2MQTT pods have node selectors that match these labels, ensuring that
they are only scheduled on the correct node.
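The node selector is a one-line pod-template fragment (assuming the labels
carry empty values, as `node-role.kubernetes.io/*` labels conventionally do):

```yaml
# Pin Zigbee2MQTT to the node carrying the Zigbee radio
nodeSelector:
  node-role.kubernetes.io/zigbee-ctrl: ""
```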
Since my Kubernetes cluster uses Longhorn for storage management, which exposes
volumes to pods via iSCSI, no state is actually stored on the Raspberry Pi.
To prevent pods besides Zigbee2MQTT and ZWaveJS2MQTT from being scheduled on
the Raspberry Pi, it has a `du5t1n.me/machine=raspberrypi:NoExecute` [taint].
The Zigbee2MQTT and ZWaveJS2MQTT pods, as well as critical services that are
deployed on every node in the cluster via DaemonSet resources, such as [Calico]
and [Longhorn], are configured with a toleration for this taint. All other
pods, which do not have such a toleration, will never be scheduled on this
node.
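The corresponding toleration on the Zigbee2MQTT and ZWaveJS2MQTT pod templates
would look something like:

```yaml
# Matches the du5t1n.me/machine=raspberrypi:NoExecute taint on the Pi node
tolerations:
  - key: du5t1n.me/machine
    operator: Equal
    value: raspberrypi
    effect: NoExecute
```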
[Fedora CoreOS]: https://www.fedoraproject.org/coreos/
[taint]: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
[Calico]: https://www.tigera.io/project-calico/
[Longhorn]: https://longhorn.io