1
0
Fork 0
Commit Graph

5 Commits (7dffb5195a5e82aa4839a26cc906cf96e2174765)

Author SHA1 Message Date
Dustin 7dffb5195a v-m: alertmanager: Group disk usage alerts
Some machines have the same volume mounted multiple times (e.g.
container hosts, BURP).  Alerts will fire for all of these
simultaneously when the filesystem usage passes the threshold.  To avoid
getting spammed with a bunch of messages about the same filesystem,
we'll group alerts from the same machine.
2024-08-17 10:59:05 -05:00
Dustin 8113e5a47f v-m: Fix syntax in AlertManager config
The `group_by` field takes a list of label names, rather than a single
string.
2024-07-06 07:13:27 -05:00
Dustin 952ab9f264 v-m: alertmanager: Group camera notifications
When Frigate is down, multiple alerts are generated for each camera, as
Home Assistant creates camera entities for each tracked object.  This is
extremely annoying, not to mention unnecessary.  To address this, we'll
configure AlertManager to send a single notification for alerts in the
group.
2024-07-05 07:30:30 -05:00
Dustin d74e26d527 victoria-metrics: Send alerts via ntfy
I don't like having alerts sent by e-mail.  Since I don't get e-mail
notifications on my watch, I often do not see alerts for quite some
time.  They are also much harder to read in an e-mail client (Fastmail
web an K-9 Mail both display them poorly).  I would much rather have
them delivered via _ntfy_, just like all the rest of the ephemeral
notifications I receive.

Fortunately, it is easy enough to integrate Alertmanager and _ntfy_
using the webhook notifier in Alertmanager.  Since _ntfy_ does not
natively support the Alertmanager webhook API, though, a bridge is
necessary to translate from one data format to the other.  There are a
few options for this bridge, but I chose
[alexbakker/alertmanager-ntfy][0] because it looked the most complete
while also having the simplest configuration format.  Sadly, it does not
expose any Prometheus metrics itself, and since it's deployed in the
_victoria-metrics_ namespace, it needs to be explicitly excluded from
the VMAgent scrape configuration.

[0]: https://github.com/alexbakker/alertmanager-ntfy
2024-05-10 10:32:52 -05:00
Dustin 8f088fb6ae v-m: Deploy (clustered) Victoria Metrics
Since *mtrcs0.pyrocufflink.blue* (the Metrics Pi) seems to be dying,
I decided to move monitoring and alerting into Kubernetes.

I was originally planning to have a single, dedicated virtual machine
for Victoria Metrics and Grafana, similar to how the Metrics Pi was set
up, but running Fedora CoreOS instead of a custom Buildroot-based OS.
While I was working on the Ignition configuration for the VM, it
occurred to me that monitoring would be interrupted frequently, since
FCOS updates weekly and all updates require a reboot.  I would rather
not have that many gaps in the data.  Ultimately I decided that
deploying a cluster with Kubernetes would probably be more robust and
reliable, as updates can be performed without any downtime at all.

I chose not to use the Victoria Metrics Operator, but rather handle
the resource definitions myself.  Victoria Metrics components are not
particularly difficult to deploy, so the overhead of running the
operator and using its custom resources would not be worth the minor
convenience it provides.
2024-01-01 17:48:10 -06:00