v-m: Deploy (clustered) Victoria Metrics

Since *mtrcs0.pyrocufflink.blue* (the Metrics Pi) seems to be dying,
I decided to move monitoring and alerting into Kubernetes.

I was originally planning to have a single, dedicated virtual machine
for Victoria Metrics and Grafana, similar to how the Metrics Pi was set
up, but running Fedora CoreOS instead of a custom Buildroot-based OS.
While I was working on the Ignition configuration for the VM, it
occurred to me that monitoring would be interrupted frequently, since
FCOS updates weekly and all updates require a reboot.  I would rather
not have that many gaps in the data.  Ultimately I decided that
deploying a cluster with Kubernetes would probably be more robust and
reliable, as updates can be performed without any downtime at all.

I chose not to use the Victoria Metrics Operator, but rather handle
the resource definitions myself.  Victoria Metrics components are not
particularly difficult to deploy, so the overhead of running the
operator and using its custom resources would not be worth the minor
convenience it provides.
2024-01-01 15:23:14 -06:00
parent 8c605d0f9f
commit 8f088fb6ae
17 changed files with 1474 additions and 0 deletions

@@ -0,0 +1,191 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: victoria-metrics
labels:
- pairs:
    app.kubernetes.io/instance: victoria-metrics
  includeSelectors: true
- pairs:
    app.kubernetes.io/part-of: victoria-metrics
  includeSelectors: false
resources:
- namespace.yaml
- secrets.yaml
- vmstorage.yaml
- vmselect.yaml
- vminsert.yaml
- vmagent.yaml
- vmalert.yaml
- alertmanager.yaml
- blackbox-exporter.yaml
- ingress.yaml
configMapGenerator:
- name: vmagent
  files:
  - scrape.yml
  options:
    disableNameSuffixHash: true
- name: vmalert-rules
  files:
  - alerts.yml
  options:
    disableNameSuffixHash: true
- name: alertmanager
  files:
  - alertmanager.yml=alertmanager.config.yml
  options:
    disableNameSuffixHash: true
- name: blackbox
  files:
  - blackbox.yml
  options:
    disableNameSuffixHash: true
replicas:
# When changing the number of vmstorage replicas, be sure to update
# the storageNode value for vmselect and vminsert.  Also, the
# replicationFactor setting may need to be adjusted.
- name: vmstorage
  count: 3
- name: vmselect
  count: 2
- name: vminsert
  count: 2
- name: vmagent
  count: 2
- name: vmalert
  count: 2
# When changing the number of alertmanager replicas, be sure to update
# the notifier URL value for vmalert and the peer addresses provided to
# Alertmanager itself.
- name: alertmanager
  count: 2
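# Illustrative sketch only (not part of this commit): assuming the pod
# naming used in the patches below stays the same, scaling vmstorage
# from 3 to 4 replicas would presumably need these coordinated changes:
#
#   replicas:
#   - name: vmstorage
#     count: 4
#
#   # in both the vmselect and the vminsert patch
#   - name: vmselect_storageNode    # likewise vminsert_storageNode
#     value: vmstorage-0.vmstorage,vmstorage-1.vmstorage,vmstorage-2.vmstorage,vmstorage-3.vmstorage
#
# Similarly, a hypothetical third alertmanager replica would need
# http://alertmanager-2.alertmanager:9093 appended to
# vmalert_notifier_url and an additional
# --cluster.peer=alertmanager-2.alertmanager:9094 argument.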
patches:
- patch: |
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: vmstorage
    spec:
      template:
        spec:
          containers:
          - name: vmstorage
            env:
            - name: vmstorage_dedup_minScrapeInterval
              value: 1m
            - name: vmstorage_retentionPeriod
              value: 5y
- patch: |
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vmselect
    spec:
      template:
        spec:
          containers:
          - name: vmselect
            env:
            - name: vmselect_storageNode
              value: vmstorage-0.vmstorage,vmstorage-1.vmstorage,vmstorage-2.vmstorage
            - name: vmselect_replicationFactor
              value: '2'
- patch: |
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vminsert
    spec:
      template:
        spec:
          containers:
          - name: vminsert
            env:
            - name: vminsert_storageNode
              value: vmstorage-0.vmstorage,vmstorage-1.vmstorage,vmstorage-2.vmstorage
            - name: vminsert_dedup_minScrapeInterval
              value: 1m
            - name: vminsert_replicationFactor
              value: '2'
- patch: |
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: vmagent
    spec:
      template:
        spec:
          containers:
          - name: vmagent
            env:
            - name: SCRAPE_GRAYLOG_TOKEN
              valueFrom:
                secretKeyRef:
                  name: vmagent
                  key: graylog.token
                  optional: true
            volumeMounts:
            - mountPath: /run/secrets/vmagent
              name: secrets
              readOnly: true
            - mountPath: /scrape/collectd
              name: scrape-collectd
              readOnly: true
          volumes:
          - name: scrape-collectd
            configMap:
              name: scrape-collectd
              optional: true
          - name: secrets
            secret:
              secretName: vmagent
- patch: |
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vmalert
    spec:
      template:
        spec:
          containers:
          - name: vmalert
            env:
            - name: vmalert_http_pathPrefix
              value: /vmalert
            - name: vmalert_notifier_url
              value: http://alertmanager-0.alertmanager:9093,http://alertmanager-1.alertmanager:9093
            startupProbe:
              httpGet:
                path: /vmalert/health
            readinessProbe:
              httpGet:
                path: /vmalert/health
- patch: |
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: alertmanager
    spec:
      template:
        spec:
          containers:
          - name: alertmanager
            args:
            - --config.file=/etc/alertmanager/alertmanager.yml
            - --storage.path=/alertmanager
            - --cluster.peer=alertmanager-0.alertmanager:9094
            - --cluster.peer=alertmanager-1.alertmanager:9094