# Home Assistant
Originally, I tried to keep the Home Assistant ecosystem completely self-contained. Every component ran on one Raspberry Pi. The thought was that this would make it more resilient, so that network or infrastructure problems would be less likely to affect smart home operations. Ultimately, it turns out this actually made it noticeably less resilient, as the Raspberry Pi became a single point of failure for the whole system.
When we moved to the new house, Home Assistant was unavailable for several days, as I did not have a way to power and run the Raspberry Pi. Since none of the smart home devices were installed yet, we initially did not think this was an issue. We had forgotten to think about the shopping list and the chore tracker, though, and how much we have come to rely on them.
Given how quickly and seamlessly the applications deployed in Kubernetes came back online after the move, it suddenly made sense to move Home Assistant there as well.
## Ecosystem
The Home Assistant ecosystem consists of these components:
- Home Assistant Core (API and Front-end)
- PostgreSQL (State history database)
- Mosquitto (MQTT server)
- Zigbee2MQTT (Zigbee integration)
- ZWaveJS2MQTT (ZWave integration)
- Piper (Text-to-speech)
- Whisper (Speech-to-text)
Each of these components runs in a container in its own pod within the `home-assistant` namespace.
## Home Assistant Core
The core component of the Home Assistant ecosystem is the Home Assistant server itself. Only a single instance of the server can run within a given ecosystem, as Home Assistant is not cluster-aware. Home Assistant state is stored on the filesystem, so the server runs in a pod managed by a StatefulSet with a PersistentVolumeClaim.
The Home Assistant HTTP server, which hosts the UI, WebSocket, and REST API, is exposed by a Service resource, which in turn is proxied by an Ingress resource.
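A minimal sketch of this wiring (resource names, the hostname, and the ingress class are illustrative; Home Assistant listens on port 8123 by default):

```yaml
# Service exposing the Home Assistant HTTP server (default port 8123).
apiVersion: v1
kind: Service
metadata:
  name: home-assistant
  namespace: home-assistant
spec:
  selector:
    app: home-assistant
  ports:
    - name: http
      port: 8123
      targetPort: 8123
---
# Ingress proxying the Service; the hostname is a placeholder.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: home-assistant
  namespace: home-assistant
spec:
  rules:
    - host: hass.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: home-assistant
                port:
                  number: 8123
```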
### ConfigMaps

Although most Home Assistant configuration is managed by its web UI, some settings and integrations are read from manually-managed YAML files; notable examples include the Shell Command and Group integrations. To make these files easier to edit, they are stored in a ConfigMap that is mounted into the Home Assistant container. Since the kubelet will not automatically update mounted ConfigMaps when files are mounted individually, the entire ConfigMap has to be mounted as a directory. Files that must exist within the configuration directory (i.e. `/config`) need symbolic links pointing to the respective files in the ConfigMap mount point.
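The pattern looks roughly like this (mount path, volume name, and file name are illustrative, not the actual manifest):

```yaml
# Pod template fragment: mount the whole ConfigMap as a directory,
# then symlink individual files into /config. An init container is
# one way to create the links on the persistent volume.
spec:
  initContainers:
    - name: link-config
      image: busybox
      command:
        - sh
        - -c
        # Link a ConfigMap-managed file into the configuration directory.
        - ln -sf /hass-configmap/shell-command.yaml /config/shell-command.yaml
      volumeMounts:
        - name: config
          mountPath: /config
  containers:
    - name: home-assistant
      volumeMounts:
        - name: config
          mountPath: /config          # PersistentVolumeClaim
        - name: extra-config
          mountPath: /hass-configmap  # entire ConfigMap as a directory
  volumes:
    - name: extra-config
      configMap:
        name: home-assistant-extra-config
```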
## PostgreSQL
Although Home Assistant stores all of its internal state in JSON files on the filesystem, it uses a relational SQL database for state history. This gives it the ability to chart historical values for e.g. sensors, as well as provide the Logbook view. By default, Home Assistant uses a SQLite database file, stored on the filesystem alongside the other state files, but it also supports other RDBMS engines, including PostgreSQL. Using PostgreSQL instead of SQLite has a few advantages:
- More historical values can be retained without introducing performance issues
- Events can be recorded immediately instead of batched
- Backups and recovery are managed externally
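Pointing the recorder at PostgreSQL is done in `configuration.yaml`; a sketch (credentials and retention values here are placeholders, not the real settings):

```yaml
# configuration.yaml fragment: use PostgreSQL for state history.
recorder:
  db_url: postgresql://hass:CHANGEME@postgres.example.svc:5432/hass
  purge_keep_days: 90   # keep more history than the SQLite default
  commit_interval: 0    # record events immediately instead of batching
```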
PostgreSQL is not managed directly in this deployment; rather, the Kustomization file patches the Home Assistant StatefulSet to provide environment variables pointing at an externally-managed PostgreSQL database. My Kubernetes cluster has a single PostgreSQL cluster, managed by the postgres operator, that hosts databases for several applications.
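Such a patch might look like this (target name, variable name, and host are illustrative):

```yaml
# kustomization.yaml fragment: inject database connection details
# into the Home Assistant StatefulSet via a JSON patch.
patches:
  - target:
      kind: StatefulSet
      name: home-assistant
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: POSTGRES_HOST
          value: postgres.database.svc.cluster.local
```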
## Mosquitto
Most of my custom integrations, including remote control of the heads-up displays, the chore list, and the Board Board™, are implemented using MQTT, as is Frigate. Thus, the Home Assistant ecosystem needs an MQTT message broker. Mosquitto is a lightweight but complete implementation that works well with Home Assistant. It is extremely configurable, supporting various authentication, authorization, and access control mechanisms.
Home Assistant MQTT discovery relies heavily on retained MQTT messages, so enabling persistence for Mosquitto is very important. Without it, retained messages would be lost when the broker restarts, and with them every Home Assistant entity configured via MQTT discovery.
Since Mosquitto is not clustered and persists data to the filesystem, it is deployed as a StatefulSet with a PersistentVolumeClaim.
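Persistence is enabled in `mosquitto.conf`; a sketch, assuming the persistence directory is where the PersistentVolumeClaim is mounted:

```
# mosquitto.conf fragment: persist the message store (including
# retained messages) so it survives broker restarts.
persistence true
persistence_location /mosquitto/data/
autosave_interval 1800
```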
## Zigbee2MQTT
Zigbee2MQTT provides a bridge between a Zigbee network and Home Assistant via MQTT. Zigbee devices communicate with the controller, which is attached to a server via USB. Messages received from devices are published to the message queue, and vice versa. Zigbee2MQTT stores its state on the filesystem, so the StatefulSet needs a PersistentVolumeClaim.
Zigbee2MQTT also exposes a web UI for configuration and administration of the Zigbee network. This UI is exposed by a Service and an Ingress, and protected by Authelia.
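Because the Zigbee controller is a USB character device, the pod needs access to it on the host. One way to grant this (the device path and the use of a privileged container are assumptions; the path depends on the adapter):

```yaml
# Pod spec fragment: expose the host's Zigbee radio to the container.
containers:
  - name: zigbee2mqtt
    securityContext:
      privileged: true   # simplest way to access the character device
    volumeMounts:
      - name: zigbee-radio
        mountPath: /dev/ttyACM0
volumes:
  - name: zigbee-radio
    hostPath:
      path: /dev/ttyACM0
      type: CharDevice
```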
## ZWaveJS2MQTT
Similar to Zigbee2MQTT, ZWaveJS2MQTT provides a bridge between a Z-Wave network and Home Assistant. While its name suggests it uses MQTT, that layer can actually be bypassed: Home Assistant can communicate directly with the ZWaveJS2MQTT server via a WebSocket connection.
ZWaveJS2MQTT has a web UI, which is exposed by a Service and an Ingress, protected by Authelia. It stores state on the filesystem, and thus requires a StatefulSet with a PersistentVolumeClaim.
## Piper/Whisper
Piper and Whisper provide the text-to-speech and speech-to-text capabilities, respectively, for Home Assistant Voice Control. These processes are designed to run as Add-Ons for Home Assistant OS, but work just fine as Kubernetes containers as well.
Piper and Whisper need mutable storage in order to download their machine learning models. Since the model data are downloaded automatically when the container starts, using ephemeral volumes is sufficient.
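An `emptyDir` volume covers this case (container name and mount path are illustrative):

```yaml
# Pod spec fragment: ephemeral storage for downloaded models, which
# Piper/Whisper re-fetch automatically when the container starts.
volumes:
  - name: models
    emptyDir: {}
containers:
  - name: whisper
    volumeMounts:
      - name: models
        mountPath: /data
```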
## Raspberry Pi Node
While Home Assistant Core and Mosquitto can run on any node in the Kubernetes cluster, Zigbee2MQTT and ZWaveJS2MQTT obviously have to run on the node where their respective devices are attached. Originally, I had intended to run them as containers on a Raspberry Pi, managed by Podman. While I was setting this up, though, it occurred to me that that was not even necessary; Kubernetes has all the necessary functionality to run containers on a specific node and enable them to communicate with local hardware.
To that end, I have added a Raspberry Pi running Fedora CoreOS to the k8s cluster and attached the Zigbee and Z-Wave radios to it. This node has two special labels: `node-role.kubernetes.io/zigbee-ctrl` and `node-role.kubernetes.io/zwave-ctrl`, indicating that it has the Zigbee and Z-Wave controllers, respectively, attached to it. The Zigbee2MQTT and ZWaveJS2MQTT pods have node selectors that match these labels, ensuring that they are only scheduled on the correct node.
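For example (the empty label value is an assumption; only the key has to match):

```yaml
# Zigbee2MQTT pod spec fragment: pin the pod to the node that has
# the Zigbee controller attached.
nodeSelector:
  node-role.kubernetes.io/zigbee-ctrl: ""
```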
Since my Kubernetes cluster uses Longhorn for storage management, which exposes volumes to pods via iSCSI, no state is actually stored on the Raspberry Pi.
To prevent pods besides Zigbee2MQTT and ZWaveJS2MQTT from being scheduled on the Raspberry Pi, it has a `du5t1n.me/machine=raspberrypi:NoExecute` taint.
The Zigbee2MQTT and ZWaveJS2MQTT pods, as well as critical services that are
deployed on every node in the cluster via DaemonSet resources, such as Calico
and Longhorn, are configured with a toleration for this taint. All other
pods, which do not have such a toleration, will never be scheduled on this
node.
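The taint can be applied with `kubectl taint nodes <node> du5t1n.me/machine=raspberrypi:NoExecute`, and the matching toleration in the pod specs looks like this:

```yaml
# Toleration matching the node's taint, added to the Zigbee2MQTT and
# ZWaveJS2MQTT pod specs (and carried by cluster-wide DaemonSets).
tolerations:
  - key: du5t1n.me/machine
    operator: Equal
    value: raspberrypi
    effect: NoExecute
```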