v-m/alerts: Fix PostgreSQL WAL archive failed alert
The `pg_stat_archiver_failed_count` metric is a counter, so once a WAL archival has failed it increases and never returns to `0`. To ensure the alert resolves once the WAL archival process recovers, we use the `increase` function to turn it into a gauge. Finally, we aggregate that gauge with `max_over_time` to keep the alert from flapping if WAL archival occurs less frequently than the scrape interval.
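For reference, PromQL's `increase()` takes a range vector, and a range over a function result has to be written as a subquery (`[20m:]`), so the expression in this commit may not parse as shown. A minimal sketch of one valid form follows. The `5m` window for `increase` and the group name are assumptions, not taken from this commit:

```yaml
groups:
  - name: postgresql  # hypothetical group name, not from the commit
    rules:
      - alert: WAL archive process failed
        # increase() needs a range vector; the 5m window is an assumed value.
        # [20m:] is a subquery: it re-evaluates the inner expression over the
        # last 20 minutes at the default resolution.
        expr: >-
          max_over_time(
            increase(pg_stat_archiver_failed_count[5m])[20m:]
          ) > 0
        annotations:
          summary: The archiver process failed for one or more WAL segments
```

With these assumed windows, the alert would stop firing roughly 25 minutes after the last failed archive attempt. A rule file like this can be checked with `promtool check rules <file>` before deploying.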
parent f637feba16
commit dc835ddc9d
@@ -185,7 +185,9 @@ groups:
         for: 10m
       - alert: WAL archive process failed
         expr: >-
-          pg_stat_archiver_failed_count > 0
+          max_over_time(
+            increase(pg_stat_archiver_failed_count)[20m]
+          )> 0
         annotations:
           summary: The archiver process failed for one or more WAL segments
           description: >-