alerts: Add alert for BURP RAID array swap

This alert tracks how long it has been since the number of "active"
disks in the RAID array on the BURP server last changed.  The assumption
is that the number will typically be `1`, but it will be `2` while the
second disk is synchronized before the swap occurs.
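The counting logic can be modeled in a few lines of Python. This is a minimal sketch, assuming the metric arrives as a list of (timestamp, active-disk count) samples. The names `Sample`, `seconds_since_last_change` and `needs_swap` are hypothetical, and the model only approximates MetricsQL's `tlast_change_over_time`: the "last change" is taken to be the first sample of the current run of equal values, and the 30-day threshold mirrors the rule's `86400 * 30`.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical type: (unix timestamp in seconds, number of active disks).
Sample = Tuple[float, float]

DAY = 86400
THRESHOLD_SECONDS = DAY * 30  # same threshold as the rule: 30 days


def seconds_since_last_change(samples: Sequence[Sample], now: float) -> Optional[float]:
    """Return seconds elapsed since the active-disk count last changed.

    Simplified model: if the value never changes within `samples`, the first
    sample's timestamp is treated as the last change.  Returns None when there
    are no samples at all (the real query would return no value).
    """
    if not samples:
        return None
    _, last_value = samples[-1]
    change_ts = samples[0][0]
    # Walk backwards to find the first sample of the current run of equal values.
    for i in range(len(samples) - 1, 0, -1):
        if samples[i - 1][1] != last_value:
            change_ts = samples[i][0]
            break
    return now - change_ts


def needs_swap(samples: Sequence[Sample], now: float) -> bool:
    """True when the count has been unchanged for longer than the threshold."""
    elapsed = seconds_since_last_change(samples, now)
    return elapsed is not None and elapsed > THRESHOLD_SECONDS


# One disk active, two while the second disk syncs, then back to one.
history = [(0, 1.0), (10 * DAY, 2.0), (11 * DAY, 1.0), (40 * DAY, 1.0)]
print(seconds_since_last_change(history, now=40 * DAY) / DAY)  # 29.0
print(needs_swap(history, now=40 * DAY))  # False
print(needs_swap(history, now=42 * DAY))  # True (31 days unchanged)
```

The swap itself resets the clock: the count goes from `1` to `2` and back, so the "last change" moves to the moment the synchronized disk's partner is removed.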
btop
Dustin 2023-04-11 22:23:17 -05:00
parent 2394bf7436
commit dc2a05dc8f
1 changed file with 20 additions and 0 deletions

@@ -46,3 +46,23 @@ vmalert_rules:
     expr: collectd_md_md_disks{type="missing", instance!~"burp.*"} != 0
   - alert: mdraid failed disk
     expr: collectd_md_md_disks{type="failed"} != 0
+
+- name: BURP RAID
+  rules:
+  - alert: disks need swapped
+    expr:
+      time() - tlast_change_over_time(
+        (
+          collectd_md_md_disks{instance="burp1.pyrocufflink.blue", type="active"}
+          or last_over_time(collectd_md_md_disks{instance="burp1.pyrocufflink.blue", type="active"})[1d]
+        )[1d]
+      ) > 86400 * 30
+    annotations:
+      summary: The disks in the BURP array need swapped
+      description: >-
+        The disks in the BURP RAID-1 (mirror) array should be swapped
+        periodically. One disk should be online and mounted while the other
+        is stored in the fireproof safe. Switching them ensures that even if
+        something happens to the active disk, such as hardware failure, power
+        surge, fire, or accidental `rm -rf`, the offline disk is only out of
+        date by a few weeks.
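One reading of the `... or last_over_time(...)[1d]` term is that it carries the last known disk count across gaps in the data (for example, while the BURP server is offline), so the series stays continuous for up to a day. The sketch below is a simplified Python model of that behavior only, not VictoriaMetrics' evaluation logic; `fill_gaps` and its dict-of-timestamps input are hypothetical, and the lookbehind window is assumed to be left-open, `(t - 1d, t]`.

```python
from typing import Dict, List

DAY = 86400  # lookbehind of the fallback term in the rule


def fill_gaps(raw: Dict[int, float], steps: List[int], lookbehind: int = DAY) -> Dict[int, float]:
    """Model `m or last_over_time(m[1d])` evaluated at each step.

    `raw` maps scrape timestamps to values.  At each step the live sample is
    used if present; otherwise the most recent sample in (t - lookbehind, t]
    is carried forward.  Steps with nothing in range are omitted, since the
    query would return no value there.
    """
    filled: Dict[int, float] = {}
    for t in steps:
        if t in raw:
            filled[t] = raw[t]
            continue
        recent = [ts for ts in raw if t - lookbehind < ts <= t]
        if recent:
            filled[t] = raw[max(recent)]
    return filled


HOUR = 3600
# Hourly scrapes with no data for hours 3 through 5 (e.g. the host is down).
raw = {0: 1.0, HOUR: 1.0, 2 * HOUR: 2.0, 6 * HOUR: 1.0}
steps = [h * HOUR for h in range(7)]
print(fill_gaps(raw, steps))
# {0: 1.0, 3600: 1.0, 7200: 2.0, 10800: 2.0, 14400: 2.0, 18000: 2.0, 21600: 1.0}
```

A gap longer than the lookbehind still leaves holes, which is why the window length matters when the server may be down for an extended period.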