kubernetes

infra

Author	SHA1	Message	Date
Dustin	7dffb5195a	v-m: alertmanager: Group disk usage alerts Some machines have the same volume mounted multiple times (e.g. container hosts, BURP). Alerts will fire for all of these simultaneously when the filesystem usage passes the threshold. To avoid getting spammed with a bunch of messages about the same filesystem, we'll group alerts from the same machine.	2024-08-17 10:59:05 -05:00
Dustin	8113e5a47f	v-m: Fix syntax in Alertmanager config The `group_by` field takes a list of label names, rather than a single string.	2024-07-06 07:13:27 -05:00
Dustin	952ab9f264	v-m: alertmanager: Group camera notifications When Frigate is down, multiple alerts are generated for each camera, as Home Assistant creates camera entities for each tracked object. This is extremely annoying, not to mention unnecessary. To address this, we'll configure Alertmanager to send a single notification for alerts in the group.	2024-07-05 07:30:30 -05:00
Dustin	d74e26d527	victoria-metrics: Send alerts via ntfy I don't like having alerts sent by e-mail. Since I don't get e-mail notifications on my watch, I often do not see alerts for quite some time. They are also much harder to read in an e-mail client (Fastmail web and K-9 Mail both display them poorly). I would much rather have them delivered via _ntfy_, just like all the rest of the ephemeral notifications I receive. Fortunately, it is easy enough to integrate Alertmanager and _ntfy_ using the webhook notifier in Alertmanager. Since _ntfy_ does not natively support the Alertmanager webhook API, though, a bridge is necessary to translate from one data format to the other. There are a few options for this bridge, but I chose [alexbakker/alertmanager-ntfy][0] because it looked the most complete while also having the simplest configuration format. Sadly, it does not expose any Prometheus metrics itself, and since it's deployed in the _victoria-metrics_ namespace, it needs to be explicitly excluded from the VMAgent scrape configuration. [0]: https://github.com/alexbakker/alertmanager-ntfy	2024-05-10 10:32:52 -05:00
Dustin	8f088fb6ae	v-m: Deploy (clustered) Victoria Metrics Since mtrcs0.pyrocufflink.blue (the Metrics Pi) seems to be dying, I decided to move monitoring and alerting into Kubernetes. I was originally planning to have a single, dedicated virtual machine for Victoria Metrics and Grafana, similar to how the Metrics Pi was set up, but running Fedora CoreOS instead of a custom Buildroot-based OS. While I was working on the Ignition configuration for the VM, it occurred to me that monitoring would be interrupted frequently, since FCOS updates weekly and all updates require a reboot. I would rather not have that many gaps in the data. Ultimately I decided that deploying a cluster with Kubernetes would probably be more robust and reliable, as updates can be performed without any downtime at all. I chose not to use the Victoria Metrics Operator, but rather handle the resource definitions myself. Victoria Metrics components are not particularly difficult to deploy, so the overhead of running the operator and using its custom resources would not be worth the minor convenience it provides.	2024-01-01 17:48:10 -06:00

5 Commits (7dffb5195a5e82aa4839a26cc906cf96e2174765)
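
The grouping behaviour that the alerting commits above rely on can be sketched in a few lines of Python. This is only a simplified model, not Alertmanager's actual implementation: the function name `group_alerts`, the sample alerts, and the label names (`alertname`, `instance`, `mountpoint`) are all hypothetical and do not come from the repository. It shows why `group_by` has to be a list of label names, and how several alerts for the same filesystem mounted at different paths on one machine collapse into a single notification group.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the label names listed in group_by.

    Simplified model (an assumption, not Alertmanager's real code): each
    distinct tuple of label values becomes one notification group.
    """
    # A bare string would be iterated character by character, so reject it,
    # mirroring the rule that group_by takes a list of label names.
    if isinstance(group_by, str):
        raise TypeError("group_by must be a list of label names, not a string")

    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # Missing labels are treated as empty strings in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts: one volume mounted twice on host1, plus one on host2.
alerts = [
    {"labels": {"alertname": "DiskUsage", "instance": "host1", "mountpoint": "/var"}},
    {"labels": {"alertname": "DiskUsage", "instance": "host1", "mountpoint": "/var/lib/containers"}},
    {"labels": {"alertname": "DiskUsage", "instance": "host2", "mountpoint": "/"}},
]

groups = group_alerts(alerts, ["alertname", "instance"])
for key, members in groups.items():
    print(key, len(members))
```

Because `mountpoint` is left out of the grouping labels, the two `host1` alerts land in the same group and would produce one notification instead of two.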