
v-m/alerts: Improve Paperless-ngx Celery task alert

The `flower_events_total` metric is a counter, so its value only ever
increases (discounting restarts of the server process).  As such,
nonzero values do not necessarily indicate a _current_ problem, but
rather that there was one at some point in the past.  To identify
current issues, we need to use the `increase` function, and then wrap
it in `max_over_time` (here over 24 hours) so that the alert doesn't
resolve as soon as the increase drops back to zero.
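
The reasoning above can be illustrated with a small simulation.  This is
only a sketch under stated assumptions: the sample values, the window
sizes, and the helper functions are hypothetical stand-ins written for
this example, not the query engine's actual implementation (which also
handles counter resets and works on timestamps rather than list indexes).

```python
# Hypothetical counter samples, one per scrape; one task fails at index 3.
counter = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]


def increase(samples, i, window):
    """Growth of the counter over the `window` samples ending at index i.

    Simplified stand-in for `increase`: no counter-reset handling.
    """
    start = max(0, i - window)
    return samples[i] - samples[start]


def max_over_time(values, i, window):
    """Largest value among the `window` points ending at index i."""
    start = max(0, i - window + 1)
    return max(values[start:i + 1])


n = len(counter)

# Old rule: raw counter > 0.  Fires at the failure and never clears.
raw_alert = [v > 0 for v in counter]

# increase() alone: fires only while the failure is inside the short window.
inc = [increase(counter, i, 2) for i in range(n)]
inc_alert = [v > 0 for v in inc]

# max_over_time(increase(...)): keeps firing for a longer, bounded period.
held = [max_over_time(inc, i, 5) for i in range(n)]
held_alert = [v > 0 for v in held]

print(raw_alert)
print(inc_alert)
print(held_alert)
```

With these made-up samples, the raw comparison stays true from the
failure onward, the bare increase is true for only two samples, and the
`max_over_time` wrapper holds the alert for six samples before it clears.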
pull/32/head
Dustin 2024-11-01 18:00:50 -05:00
parent d12e66f58a
commit 3f9601dc94
1 changed file with 6 additions and 2 deletions


@@ -172,9 +172,13 @@ groups:
   rules:
   - alert: Celery tasks failed
     expr: >-
-      flower_events_total{job="paperless-ngx", type="task-failed"} > 0
+      max_over_time(
+        increase(
+          flower_events_total{job="paperless-ngx", type="task-failed"}
+        )[24h]
+      ) > 0
     annotations:
-      summary: One or more Celery tasks have failed
+      summary: Paperless-ngx Celery task failed
       description: >-
         Failing Celery tasks may indicate a problem with the Paperless-ngx
         deployment and can result in data loss. Check the Paperless-ngx logs