kubernetes

Author	SHA1	Message	Date
Dustin C. Hatch	ac62a77c96	Merge branch '20125'	2024-11-05 07:05:19 -06:00
Dustin C. Hatch	e1d9833e83	cert-manager: Add cert for apps.du5t1n.xyz	2024-11-05 07:04:27 -06:00
Dustin C. Hatch	4ad5518f18	cert-manager: Migrate config to configMapGenerator	2024-11-05 07:04:09 -06:00
Dustin C. Hatch	9f287d0f71	v-m/alerts: Add alerts for backup RAID array Just like I did with the RAID-1 array in the old BURP server, I will keep one member active and one in the fireproof safe, swapping them each month. We can use the same metrics queries to alert on when the swap should happen that we used with the BURP server.	2024-11-04 20:46:03 -06:00
Dustin C. Hatch	2380468658	v-m/scrape: Collect Jellyfin metrics	2024-11-04 20:38:25 -06:00
Dustin C. Hatch	db7c07ee55	v-m/scrape: Ignore cloud Kubernetes nodes The ephemeral Jenkins worker nodes that run in AWS don't have colletcd, promtail, or Zincati. We don't needto get three alerts every time a worker starts up to handle am ARM build job, so we drop these discovered targets for these scrape jobs.	2024-11-04 20:35:17 -06:00
Dustin C. Hatch	d76a1360c8	v-m/alerts: Ignore Paperless consume_file task Paperless-ngx uses a Celery task to process uploaded files, converting them to PDF, running OCR, etc. This task can be marked as "failed" for various reasons, most of which are more about the document itself than the health of the application. The GUI displays the results of failed tasks when they occur. It doesn't really make sense to have an alert about this scenario, especially since there's nothing to do to directly clear the alert anyway.	2024-11-04 20:28:11 -06:00
Dustin C. Hatch	71b52e4c6f	20125: Deploy Status server https://20125.home/ is the URL the Status Android application loads in its main WebView. This site is powered by a server that generates a custom page showing the status of our self-hosted applications, based on alerts retrieved from the AlertManager API. Android WebView does not allow cleartext HTTP connections. It does, however, allow connecting an HTTPS server and ignoring the certificate it presents, which is effectively the same thing. Thus, we generate a self-signed certificate for the Ingress for this site.	2024-11-02 19:51:53 -05:00
Dustin C. Hatch	8ecee4133f	v-m/alerts: Rework free disk space alert Fedora CoreOS fills `/boot` beyond the 75% alert threshold under normal circumstances on aarch64 machines. This is not a problem, because it cleans up old files on its own, so we do not need to alert on it. Unfortunately, the _DiskUsage_ alert is already quite complex, and adding in exclusions for these devices would make it even worse. To simplify the logic, we can use a recording rule to precomupte the used/free space ratio. By using `sum(...) without (type)` instead of `sum(...) on (df, instance)`, we keep the other labels, which we can then use to identify the metrics coming from machines we don't care to monitor. Instead of having different thresholds for different volumes encoded in the same expression, we can use multiple alerts to alert on "low" vs "very low" thresholds. Since this will of course cause duplicate alerts for most volumes, we can use AlertManager inhibition rules to disable the "low" alert once the metric crosses the "very low" threshold.	2024-11-02 09:38:02 -05:00
Dustin C. Hatch	4cef41688f	v-m/alerts: Add Zigbee+ZWave network alerts	2024-11-01 18:14:56 -05:00
Dustin C. Hatch	6cf11f9f61	v-m: Scrape HAProxy	2024-11-01 18:14:37 -05:00
Dustin C. Hatch	7a768cbb76	v-m: Update jobs for new Loki server loki1.pyrocufflink.blue is a regular Fedora machine, a member of the AD domain, and managed by Ansible. Thus, it does not need to be explicitly listed as a scrape target. For scraping metrics from Loki itself, I've changed the job to use DNS-SD because it seems like `vmagent` does _not_ re-resolve host names from static configuration.	2024-11-01 18:07:34 -05:00
Dustin C. Hatch	0101040634	v-m/alerts: Add Paperless-ngx email task alert This alert should fire if the background task to fetch e-mail and import them into Paperless-ngx has not run for a while.	2024-11-01 18:04:06 -05:00
Dustin C. Hatch	3f9601dc94	v-m/alerts: Improve Paperless-ngx Celery task alert The `flower_events_total` metric is a counter, so its value only ever increases (discounting restarts of the server process). As such, nonzero values do not necessarily indicate a _current_ problem, but rather that there was one at some point in the past. To identify current issues, we need to use the `increase` function, and then apply the `max_over_time` function so that the alert doesn't immediately reset itself.	2024-11-01 18:00:50 -05:00
Dustin C. Hatch	d12e66f58a	v-m: Scrape Frigate exporter	2024-11-01 17:47:51 -05:00
Dustin C. Hatch	045eea89a9	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-10-19 09:49:59 -05:00
Dustin C. Hatch	8ff45a8c01	paperless-ngx/gotenberg: Run as correct user The Gotenberg container image uses UID 1001 for the _gotenberg_ user. Using any other UID number, even when the home directory is set and owned by that UID, results in random issues, especially when using LibreOffice conversions.	2024-10-19 09:46:15 -05:00
giteadmin	d3e00680c0	Merge pull request 'home-assistant: Update to 2024.10.3' (#29 ) from updatebot/home-assistant into master Reviewed-on: #29	2024-10-19 13:13:12 +00:00
bot	c5daf23f71	mosquitto: Update to 2.0.20	2024-10-19 11:32:16 +00:00
bot	6e2c8d1a25	zwavejs2mqtt: Update to 9.24.0	2024-10-19 11:32:16 +00:00
bot	0e3f719e32	whisper: Update to 2.2.0	2024-10-19 11:32:16 +00:00
bot	94e10207d2	home-assistant: Update to 2024.10.3	2024-10-19 11:32:15 +00:00
Dustin C. Hatch	99c8f7694c	paperless-ngx: Split resources into separate files The Paperless-ngx ecosystem consists of several services. Defining the resources for each service in separate manifest files will make maintenance a little bit easier.	2024-10-17 07:27:33 -05:00
Dustin C. Hatch	e19e8f50ab	v-m/alerts: Add alerts for Paperless-ngx	2024-10-17 07:18:23 -05:00
Dustin C. Hatch	78651eb5f8	v-m/alerts: Add alerts for PostgreSQL WAL archiver	2024-10-17 07:18:09 -05:00
Dustin C. Hatch	ee3e078b20	v-m/alerts: Add alerts for Restic backups	2024-10-17 06:58:48 -05:00
Dustin C. Hatch	ea89e0cde4	v-m/scrape: Remove synapse job The Synapse server is now completely decommissioned.	2024-10-17 06:50:27 -05:00
Dustin C. Hatch	e581957f9d	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-10-15 07:59:42 -05:00
Dustin	b01300f8cc	Merge pull request 'zwavejs2mqtt: Update to 9.20.0' (#26 ) from updatebot/home-assistant into master Reviewed-on: #26	2024-10-15 12:43:28 +00:00
bot	55ae979a1d	mosquitto: Update to 2.0.19	2024-10-15 12:42:36 +00:00
bot	1de05f2ccc	zwavejs2mqtt: Update to 9.23.0	2024-10-15 12:42:36 +00:00
bot	58f7f9e2cc	zigbee2mqtt: Update to 1.40.2	2024-10-15 12:42:35 +00:00
bot	390eacf209	home-assistant: Update to 2024.10.2	2024-10-15 12:42:35 +00:00
Dustin C. Hatch	145fa6286e	storage: Add Longhorn backup target secret Longhorn uses a special Secret resource to configure the backup target. This secret includes the credentials and CA certificate for accessing the MinIO S3 service. Longhorn must be configured to use this Secret by setting the `backup-target-credential-secret` setting to `minio-backups-credentials`.	2024-10-13 14:03:49 -05:00
Dustin	1b4bb234c8	Merge pull request 'gotenberg: Update to 8.10.0' (#25 ) from updatebot/paperless-ngx into master Reviewed-on: #25	2024-10-12 20:44:58 +00:00
Dustin	7e2512c261	Merge pull request 'authelia: Update to 4.38.12' (#28 ) from updatebot/authelia into master Reviewed-on: #28	2024-10-12 20:44:41 +00:00
bot	281ec623c4	authelia: Update to 4.38.16	2024-10-12 11:33:03 +00:00
bot	51fe6f39af	gotenberg: Update to 8.12.0	2024-10-12 11:33:00 +00:00
Dustin C. Hatch	2ccbcd494c	firefly-iii: Update to 6.1.21 Notably, this version fixes the ~4s delay when creating/editing transactions.	2024-10-02 09:08:58 -05:00
Dustin C. Hatch	e9bfc63a74	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-10-02 09:08:31 -05:00
Dustin	32171cc76e	Merge pull request 'firefly-iii: Update to 6.1.20' (#27 ) from updatebot/firefly-iii into master Reviewed-on: #27	2024-09-29 21:09:41 +00:00
bot	71f091fa05	firefly-iii: Update to 6.1.20	2024-09-28 11:32:18 +00:00
Dustin C. Hatch	df50decba1	argocd: apps/authelia: Enable auto-sync This way, merging PRs from updatebot will automatically trigger updating Paperless-ngx et al.	2024-09-24 07:16:45 -05:00
Dustin C. Hatch	0022171616	argocd: apps/ntfy: Enable auto-sync This way, merging PRs from updatebot will automatically trigger updating Paperless-ngx et al.	2024-09-24 07:16:34 -05:00
Dustin C. Hatch	a149bc8761	updatebot: Manage Authelia	2024-09-24 07:15:41 -05:00
Dustin C. Hatch	76588c3e20	updatebot: Manage Mosquitto	2024-09-24 07:08:56 -05:00
Dustin C. Hatch	bdc24e1778	updatebot: Manage ntfy	2024-09-24 07:05:37 -05:00
Dustin C. Hatch	982cd88255	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-09-22 12:13:58 -05:00
Dustin C. Hatch	ffa47b9fba	v-m: Scrape ntfy _ntfy_ has supported Prometheus metrics for a while now, so let's collect them.	2024-09-22 12:13:01 -05:00
Dustin C. Hatch	9ec6b651c1	v-m: Scrape wal-g via statsd_exporter The database server now runs _statsd_exporter_, which receives metrics from WAL-G whenever it saves WAL segments or creates backups.	2024-09-22 12:11:59 -05:00

1 2 3 4 5 ...

404 Commits