kubernetes/victoria-metrics/alertmanager.config.yml
Dustin C. Hatch 8ecee4133f v-m/alerts: Rework free disk space alert
Fedora CoreOS fills `/boot` beyond the 75% alert threshold under normal
circumstances on aarch64 machines.  This is not a problem, because it
cleans up old files on its own, so we do not need to alert on it.
Unfortunately, the _DiskUsage_ alert is already quite complex, and
adding in exclusions for these devices would make it even worse.

To simplify the logic, we can use a recording rule to precompute the
used/free space ratio.  By using `sum(...) without (type)` instead of
`sum(...) on (df, instance)`, we keep the other labels, which we can
then use to identify the metrics coming from machines we don't care to
monitor.
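A minimal sketch of such a recording rule is below, written in the Prometheus-compatible rule format that vmalert also reads. The metric name `collectd_df_df_complex` (the collectd df plugin's metric, which would account for the `df` and `type` labels) and the rule name `collectd_df:used:ratio` are assumptions for illustration, not taken from this repository.

```yaml
# Hypothetical recording rule: the metric name and the rule name are
# assumptions, not copied from this repository.
groups:
- name: disk-usage
  rules:
  # Aggregating "without (type)" drops only the type label, so instance,
  # df, and any other labels survive, and the two sides of the division
  # match one-to-one on identical label sets.
  - record: collectd_df:used:ratio
    expr: >-
      sum(collectd_df_df_complex{type="used"}) without (type)
      /
      sum(collectd_df_df_complex{type=~"used|free"}) without (type)
```

Because the extra labels are preserved, alert expressions built on the recorded series can add label selectors to exclude specific machines or volumes without repeating the arithmetic.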

Instead of encoding different thresholds for different volumes in the
same expression, we can define separate alerts for the "low" and "very
low" thresholds.  Since this will of course cause duplicate alerts for
most volumes, we can use Alertmanager inhibition rules to suppress the
"low" alert once the metric crosses the "very low" threshold.
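A sketch of the two alerting rules follows, assuming a recording rule named `collectd_df:used:ratio` (a hypothetical name) that yields the used fraction of each volume. The alert names match the `inhibit_rules` entry in this file; the 0.75 threshold mirrors the 75% figure mentioned above, while the 0.9 threshold and the `for` durations are assumptions.

```yaml
# Hypothetical alerting rules: the recording-rule name, the 0.9
# threshold, and the "for" durations are assumptions.
groups:
- name: disk-alerts
  rules:
  - alert: Free disk space is low
    expr: collectd_df:used:ratio > 0.75
    for: 15m
  - alert: Free disk space is very low
    expr: collectd_df:used:ratio > 0.9
    for: 15m
```

A volume above 90% fires both alerts with identical `instance` and `df` labels, which is what allows an inhibition rule with `equal: [instance, df]` to suppress the "low" alert while the "very low" one is active.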
2024-11-02 09:38:02 -05:00

43 lines
771 B
YAML

global:
  smtp_from: prometheus@pyrocufflink.blue
  smtp_require_tls: false
  smtp_smarthost: mail.pyrocufflink.blue:25
receivers:
- email_configs:
  - send_resolved: true
    to: gyrfalcon@ebonfire.com
  name: default-email
- name: ntfy
  webhook_configs:
  - url: http://alertmanager-ntfy:8000/hook
- name: none
route:
  group_by:
  - '...'
  receiver: ntfy
  routes:
  - receiver: none
    matchers:
    - alertname=Battery Low
  - receiver: ntfy
    matchers:
    - alertname=DiskUsage
    group_by:
    - instance
  - receiver: ntfy
    matchers:
    - alertgroup=Frigate
    group_by:
    - alertname
inhibit_rules:
- source_matchers:
  - alertname=Free disk space is very low
  target_matchers:
  - alertname=Free disk space is low
  equal:
  - instance
  - df