Compare commits

9 Commits

Author SHA1 Message Date
bot ee9af76929 zwavejs2mqtt: Update to 9.26.0 2024-11-02 11:32:11 +00:00
bot 76c182f758 zigbee2mqtt: Update to 1.41.0 2024-11-02 11:32:11 +00:00
bot 0ee2952095 home-assistant: Update to 2024.10.4 2024-11-02 11:32:11 +00:00
Dustin 4cef41688f v-m/alerts: Add Zigbee+ZWave network alerts 2024-11-01 18:14:56 -05:00
Dustin 6cf11f9f61 v-m: Scrape HAProxy 2024-11-01 18:14:37 -05:00
Dustin 7a768cbb76 v-m: Update jobs for new Loki server
*loki1.pyrocufflink.blue* is a regular Fedora machine, a member of the
AD domain, and managed by Ansible.  Thus, it does not need to be
explicitly listed as a scrape target.

For scraping metrics from Loki itself, I've changed the job to use
DNS-SD because it seems like `vmagent` does _not_ re-resolve host names
from static configuration.
2024-11-01 18:07:34 -05:00
Dustin 0101040634 v-m/alerts: Add Paperless-ngx email task alert
This alert should fire if the background task that fetches e-mail messages and
imports them into Paperless-ngx has not run for a while.
2024-11-01 18:04:06 -05:00
Dustin 3f9601dc94 v-m/alerts: Improve Paperless-ngx Celery task alert
The `flower_events_total` metric is a counter, so its value only ever
increases (discounting restarts of the server process).  As such,
nonzero values do not necessarily indicate a _current_ problem, but
rather that there was one at some point in the past.  To identify
current issues, we need to use the `increase` function, and then apply
the `max_over_time` function so that the alert doesn't immediately reset
itself.
2024-11-01 18:00:50 -05:00
Dustin d12e66f58a v-m: Scrape Frigate exporter 2024-11-01 17:47:51 -05:00
3 changed files with 88 additions and 11 deletions
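
Commit 7a768cbb76 switches the Loki job to DNS-SD because `vmagent` appears not to re-resolve host names from static configuration. The difference can be sketched in Python as follows. This is only an illustration of the two behaviours, not vmagent's implementation: the classes `StaticTarget` and `DiscoveredTarget`, the fake resolver, and the 192.0.2.x addresses are all hypothetical.

```python
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]


class StaticTarget:
    """Resolves the name once at start-up and keeps that answer."""

    def __init__(self, name: str, resolve: Resolver) -> None:
        self.addresses = resolve(name)

    def current(self) -> List[str]:
        return self.addresses


class DiscoveredTarget:
    """Looks the name up again on every refresh, as DNS-based discovery would."""

    def __init__(self, name: str, resolve: Resolver) -> None:
        self.name = name
        self.resolve = resolve

    def current(self) -> List[str]:
        return self.resolve(self.name)


# Hypothetical DNS data: the name initially points at the old server.
records: Dict[str, List[str]] = {"loki.pyrocufflink.blue": ["192.0.2.10"]}


def fake_resolve(name: str) -> List[str]:
    # Return a copy so callers cannot observe later edits by accident.
    return list(records[name])


static = StaticTarget("loki.pyrocufflink.blue", fake_resolve)
discovered = DiscoveredTarget("loki.pyrocufflink.blue", fake_resolve)

# The record is updated when the service moves to a new machine.
records["loki.pyrocufflink.blue"] = ["192.0.2.20"]

stale = static.current()       # still the address from start-up
fresh = discovered.current()   # the address after the move
```

In this sketch the statically configured target keeps scraping the old address until the process restarts, while the discovered target follows the DNS change on its next refresh.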

View File

@@ -123,8 +123,8 @@ images:
 - name: docker.io/rhasspy/wyoming-piper
   newTag: 1.5.0
 - name: docker.io/koenkk/zigbee2mqtt
-  newTag: 1.40.2
+  newTag: 1.41.0
 - name: docker.io/zwavejs/zwave-js-ui
-  newTag: 9.25.0
+  newTag: 9.26.0
 - name: docker.io/library/eclipse-mosquitto
   newTag: 2.0.20

View File

@@ -68,18 +68,48 @@ groups:
   rules:
   - alert: Frigate is Unavailable
     expr:
-      homeassistant_entity_available{entity=~".*frigate_(server|status)"} != 1
+      absent(frigate_service_info)
+      or irate(frigate_service_last_updated_timestamp) < 1
+      or irate(frigate_service_uptime_seconds) < 1
     for: 10m
   - alert: Camera unavailable
     expr:
       homeassistant_entity_available{domain="camera"} != 1
     for: 10m

-- name: Sensors
+- name: Home Assistant
   rules:
   - alert: Battery Low
     expr:
       homeassistant_sensor_battery_percent{entity!~"sensor\\.(pixel_|sm_p610).*"} < 10
+    annotations:
+      summary: >-
+        Low battery: {{ $labels.friendly_name }}
+      severity: minor
+  - alert: Z-Wave Network is Offline
+    expr:
+      sum(
+        homeassistant_entity_available{entity="sensor.usb_controller_status"}
+      ) without (
+        friendly_name
+      ) < 1
+    annotations:
+      summary: The Z-Wave network controller is offline
+      description: >-
+        Home Assistant is not able to communicate with ZWaveJS, or ZWaveJS is
+        not able to connect to the Z-Wave USB controller. Z-Wave devices like
+        light switches, door sensors, and smart plugs will not work until the
+        Z-Wave network is operational again.
+  - alert: Zigbee Network is Offline
+    expr:
+      homeassistant_binary_sensor_state{entity="binary_sensor.zigbee2mqtt_bridge_connection_state"} == 0
+    annotations:
+      summary: The Zigbee network bridge is offline
+      description: >-
+        Home Assistant is not able to communicate with Zigbee2MQTT, or
+        Zigbee2MQTT is not able to connect to the Z-Wave USB controller.
+        Zigbee devices like smart bulbs and buttons will not work until the
+        Zigbee network is operational again.

 - name: PostgreSQL
   rules:
@@ -170,10 +200,28 @@ groups:
   rules:
   - alert: Celery tasks failed
     expr: >-
-      flower_events_total{job="paperless-ngx", type="task-failed"} > 0
+      max_over_time(
+        increase(
+          flower_events_total{job="paperless-ngx", type="task-failed"}
+        )[24h]
+      ) > 0
     annotations:
-      summary: One or more Celery tasks have failed
+      summary: Paperless-ngx Celery task failed
       description: >-
         Failing Celery tasks may indicate a problem with the Paperless-ngx
         deployment and can result in data loss. Check the Paperless-ngx logs
         for details about the task failures.
+  - alert: Paperless email task not running
+    expr: >-
+      absent(
+        flower_events_total{
+          type="task-started",
+          task="paperless_mail.tasks.process_mail_accounts"
+        }
+      )
+    annotations:
+      summary: Paperless task to process mail accounts has not run recently
+      description: >-
+        Paperless-ngx uses a scheduled Celery task to periodically poll email
+        mailboxes for new messages. If this task does not start, new email
+        messages will not be downloaded and imported into the document library.
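
The reasoning behind the revised "Celery tasks failed" expression (a raw counter stays nonzero forever, `increase` shows only recent growth, and `max_over_time` holds the alert open for a while) can be checked with a small simulation. The Python sketch below is not MetricsQL and ignores details such as extrapolation and scrape timing. The sample values, window sizes, and helper functions are hypothetical stand-ins that only mimic the general idea of the query functions with the same names.

```python
from typing import List, Sequence


def increase(samples: Sequence[float]) -> float:
    """Total growth of a counter across a window, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A counter only decreases when the process restarts; after a reset,
        # the new value itself is the growth since the restart.
        total += cur - prev if cur >= prev else cur
    return total


def windowed_increase(samples: Sequence[float], window: int) -> List[float]:
    """increase() evaluated at every step over the trailing `window` samples."""
    return [
        increase(samples[max(0, i - window + 1): i + 1])
        for i in range(len(samples))
    ]


def max_over_time(values: Sequence[float], lookback: int) -> List[float]:
    """Largest value among the trailing `lookback` evaluations at each step."""
    return [
        max(values[max(0, i - lookback + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical scrapes of a task-failed counter: one failure at step 2, none after.
counter = [0, 0, 1, 1, 1, 1, 1, 1]

raw_alert = [v > 0 for v in counter]            # old expression: counter > 0
recent = windowed_increase(counter, window=2)   # growth since the previous scrape
held = max_over_time(recent, lookback=3)        # keep the spike visible for 3 steps
```

With these made-up samples, `raw_alert` stays true at every step after the single failure, `recent` is nonzero for exactly one evaluation, and `held` keeps the alert active for three evaluations before clearing on its own.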

View File

@@ -76,7 +76,6 @@ scrape_configs:
   static_configs:
   - targets:
     - gw1.pyrocufflink.blue
-    - loki0.pyrocufflink.blue
     - nut0.pyrocufflink.blue
     - nvr2.pyrocufflink.blue
     - unifi3.pyrocufflink.blue
@@ -251,7 +250,6 @@ scrape_configs:
   metrics_path: /bridge?selector=zincati
   static_configs:
   - targets:
-    - loki0.pyrocufflink.blue
     - nut0.pyrocufflink.blue
     - unifi3.pyrocufflink.blue
   kubernetes_sd_configs:
@@ -279,14 +277,21 @@ scrape_configs:
   scheme: https
   tls_config:
     ca_file: /run/dch-ca/dch-root-ca.crt
-  static_configs:
-  - targets:
+  dns_sd_configs:
+  - names:
     - loki.pyrocufflink.blue
+    type: A
+    port: 443
+  relabel_configs:
+  - source_labels: [__meta_dns_name, __meta_dns_srv_record_port]
+    separator: ':'
+    target_label: __address__
+  - source_labels: [__address__]
+    target_label: instance

 - job_name: promtail
   static_configs:
   - targets:
-    - loki0.pyrocufflink.blue
     - nut0.pyrocufflink.blue
     - nvr2.pyrocufflink.blue
     - unifi3.pyrocufflink.blue
@@ -456,3 +461,27 @@ scrape_configs:
   - source_labels:
     - __meta_kubernetes_pod_name
     target_label: instance
+
+- job_name: frigate
+  dns_sd_configs:
+  - names:
+    - frigate.pyrocufflink.blue
+    type: A
+    port: 9100
+  relabel_configs:
+  - source_labels: [__meta_dns_name, __meta_dns_srv_record_port]
+    separator: ':'
+    target_label: __address__
+  - source_labels: [__address__]
+    target_label: instance
+- 
+- job_name: haproxy
+  static_configs:
+  - targets:
+    - haproxy0.pyrocufflink.blue
+  relabel_configs:
+  - source_labels: [__address__]
+    target_label: instance
+  - source_labels: [__address__]
+    target_label: __address__
+    replacement: '$1:8118'
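
The two relabeling rules in the `haproxy` job depend on rule order: `instance` is copied from `__address__` first, and only then is the port appended to `__address__`. The Python sketch below imitates a single "replace" rule to show that effect. It is a simplified model, not the scraper's actual code: the `relabel` helper is hypothetical, it assumes the default regex `(.*)` and the default separator `;`, and it assumes `__address__` holds the bare host name from `static_configs` when the rules run.

```python
import re
from typing import Dict, List


def relabel(
    labels: Dict[str, str],
    source_labels: List[str],
    target_label: str,
    separator: str = ";",
    regex: str = "(.*)",
    replacement: str = "$1",
) -> Dict[str, str]:
    """Apply one 'replace' relabeling rule and return a new label set."""
    # Source label values are joined with the separator; missing labels are empty.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex must match the whole value
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Translate $N references into Python's \g<N> template syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


target = {"__address__": "haproxy0.pyrocufflink.blue"}
# Rule 1: copy the bare host name into the instance label.
target = relabel(target, ["__address__"], "instance")
# Rule 2: append the exporter port to the scrape address.
target = relabel(target, ["__address__"], "__address__", replacement="$1:8118")
```

Under these assumptions the target is scraped at `haproxy0.pyrocufflink.blue:8118`, while its `instance` label stays `haproxy0.pyrocufflink.blue` without the port. Swapping the two rules would put the port into `instance` as well.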