Commit Graph

65 Commits

Author SHA1 Message Date
834d0f804f v-m: Scrape Grafana
Grafana exports Prometheus metrics about its own performance.
2024-02-01 09:02:01 -06:00
8ae8bad112 v-m: Scrape serial1.p.b
2024-01-25 20:42:07 -06:00
ad37948fe2 v-m: Scrape all metrics components
We are now getting metrics from *vmstorage*, *vminsert*, *vmselect*,
*vmalert*, *alertmanager*, and *blackbox-exporter*, in addition to
*vmagent*.
2024-01-23 11:51:50 -06:00
bcb588407d v-m: Correct vmalert remote read/write URLs
*vmalert* has been generating alerts and triggering notifications, but
not writing any `ALERTS`/`ALERTS_FOR_STATE` metrics.  It turns out this
is because I had not correctly configured the remote read/write
URLs.
2024-01-23 10:45:40 -06:00
119a8a74ae v-m: alerts: Enhance Frigate unavailable alert
If Frigate is running but not connected to the MQTT broker, the
`sensor.frigate_status` entity will be available, but the
`update.frigate_server` entity will not.
2024-01-22 18:27:30 -06:00
54e7a25f93 v-m: vmstorage: Remove startup/ready probes
Kubernetes will not start additional Pods in a StatefulSet until the
existing ones are Ready.  This means that if there is a problem bringing
up, e.g. `vmstorage-0`, it will never start `vmstorage-1` or
`vmstorage-2`.  Since this pretty much defeats the purpose of having a
multi-node `vmstorage` cluster, we have to remove the readiness probe,
so the Pods will be Ready as soon as they start.  If there is a problem
with one of them, it will matter less, as the others can still run.
2024-01-22 16:43:46 -06:00
ca02dfec62 v-m: Add host labels to collectd-virt metrics
The *virt* plugin for *collectd* sets `instance` to the name of the
libvirt domain the metric refers to.  As a result, no label identifies
which host the VM is running on.  Thus, if we want to classify metrics
by VM host, we need to add that label explicitly.

Since the `__address__` label is not available during metric relabeling,
we need to store it in a temporary label, which gets dropped at the end
of the relabeling phase.  We copy the value of that label into a new
label, but only for metrics that match the desired metric name.
2024-01-22 11:12:19 -06:00
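The two-phase relabeling described in ca02dfec62 can be modeled roughly as follows.  This is a minimal Python sketch of the idea, not the actual vmagent configuration: the temporary label name `tmp_address`, the `host` label name, the `collectd_virt_.*` name pattern, and the example host name are all assumptions made for illustration.

```python
import re

# Hypothetical name for the temporary label; the real config may differ.
TMP_LABEL = "tmp_address"


def relabel_target(target_labels):
    """Target relabeling: stash the scrape address in a temporary label."""
    labels = dict(target_labels)
    labels[TMP_LABEL] = labels["__address__"]
    # Labels beginning with "__" are discarded once target relabeling
    # ends, which is why __address__ is unavailable later on.
    return {k: v for k, v in labels.items() if not k.startswith("__")}


def relabel_metric(name, sample_labels, target_labels,
                   name_pattern=r"collectd_virt_.*"):
    """Metric relabeling: add a host label to matching metrics only."""
    # Simplification: scraped labels are merged over target labels.
    labels = {**target_labels, **sample_labels}
    if re.fullmatch(name_pattern, name):
        # Copy the stashed address (without its port) into the new label.
        labels["host"] = labels[TMP_LABEL].rsplit(":", 1)[0]
    # The temporary label is dropped at the end for every metric.
    labels.pop(TMP_LABEL, None)
    return labels
```

Calling `relabel_target` on a target and passing its result to `relabel_metric` for each scraped sample leaves a `host` label only on metrics whose name matches the pattern, and no sample retains the temporary label.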
51775ede81 v-m/vmagent: Scrape nut0
*nut0.pyrocufflink.blue* is the new UPS monitor server.  It runs Fedora
CoreOS, with NUT in a container.
2024-01-15 18:46:46 -06:00
90b293d5c8 v-m/vmagent: Scrape k8s-amd64-n3
2024-01-15 18:45:52 -06:00
278be05121 v-m/blackbox: Switch to upstream container image
I found the official container image for Prometheus Blackbox exporter.
It is hosted on Quay, which is why I didn't see it on Docker Hub when I
looked initially.
2024-01-15 18:45:25 -06:00
539e25d9bd v-m/vmagent: Scrape public clouds to test Internet
Scraping the public DNS servers doesn't work anymore since the firewall
routes traffic through Mullvad.  Pinging public cloud providers should
give a pretty decent indication of Internet connectivity.  It will also
serve as a benchmark for the local DNS performance, since the names will
have to be resolved.
2024-01-15 18:44:46 -06:00
98cdcdfe30 v-m/scrape: Stable instance label for Longhorn
By default, the `instance` label for discovered metrics targets is set
to the scrape address.  For Kubernetes pods, that is the IP address and
port of the pod, which naturally changes every time the pod is recreated
or moved.  This will cause a high churn rate for Longhorn manager pods.
To avoid this, we set the `instance` label to the name of the node the
pod is running on, which will not change because the Longhorn manager
pods are managed by a DaemonSet.
2024-01-04 09:16:20 -06:00
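The effect of the change in 98cdcdfe30 can be illustrated with a small Python sketch.  It assumes the node name is taken from the `__meta_kubernetes_pod_node_name` discovery label that Prometheus-style Kubernetes service discovery attaches to pod targets; the function name, IP addresses, port, and node name below are made up for the example and do not come from the actual configuration.

```python
def longhorn_instance_label(discovered):
    """Pick the instance label for a discovered Longhorn manager pod."""
    # Default behavior: instance is the scrape address (pod IP and port),
    # which changes whenever the pod is recreated or moved.
    instance = discovered["__address__"]
    # Override with the node name, which stays the same across pod
    # restarts because a DaemonSet runs one manager pod per node.
    node = discovered.get("__meta_kubernetes_pod_node_name")
    if node:
        instance = node
    return instance


# Two incarnations of the manager pod on the same node (illustrative values).
before_restart = {
    "__address__": "10.0.0.17:9500",
    "__meta_kubernetes_pod_node_name": "k8s-amd64-n3",
}
after_restart = {
    "__address__": "10.0.0.42:9500",
    "__meta_kubernetes_pod_node_name": "k8s-amd64-n3",
}
```

Both dictionaries map to the same `instance` value, so recreating the pod does not start a new set of time series.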
bac7de72f2 v-m: Scrape Longhorn manager metrics
Each Longhorn manager pod exports metrics about the node on which it is
running.  Thus, we have to scrape every pod to get the metrics about the
whole ecosystem.
2024-01-02 11:27:31 -06:00
225fd8469c v-m/vmagent: Allow listing all pods in cluster
The original RBAC configuration allowed `vmagent` only to list the pods
in the `victoria-metrics` namespace.  In order to allow it to monitor
other applications' pods, it needs to be assigned permission to list
pods in all namespaces.
2024-01-02 11:25:54 -06:00
8f088fb6ae v-m: Deploy (clustered) Victoria Metrics
Since *mtrcs0.pyrocufflink.blue* (the Metrics Pi) seems to be dying,
I decided to move monitoring and alerting into Kubernetes.

I was originally planning to have a single, dedicated virtual machine
for Victoria Metrics and Grafana, similar to how the Metrics Pi was set
up, but running Fedora CoreOS instead of a custom Buildroot-based OS.
While I was working on the Ignition configuration for the VM, it
occurred to me that monitoring would be interrupted frequently, since
FCOS updates weekly and all updates require a reboot.  I would rather
not have that many gaps in the data.  Ultimately I decided that
deploying a cluster with Kubernetes would probably be more robust and
reliable, as updates can be performed without any downtime at all.

I chose not to use the Victoria Metrics Operator, but rather handle
the resource definitions myself.  Victoria Metrics components are not
particularly difficult to deploy, so the overhead of running the
operator and using its custom resources would not be worth the minor
convenience it provides.
2024-01-01 17:48:10 -06:00