kubernetes

Author	SHA1	Message	Date
Dustin C. Hatch	4243823ba5	invoice-ninja: Fix network policy for ingress Since the IP address assigned to the ingress controller is now managed by keepalived and known to Kubernetes, the network policy needs to allow access to it by pod namespace rather than IP address. It seems that the former takes precedence over the latter, so even though the IP address was explicitly allowed, traffic was not permitted because it was destined for a Kubernetes service that was not.	2024-12-07 09:28:44 -06:00
Dustin C. Hatch	b269fa5812	home-assistant: Add service to shut down desk panel Home Assistant can now SSH into the desk panel and shut it down.	2024-12-02 23:06:30 +00:00
Dustin C. Hatch	107852ad54	home-assistant: Eable auto-login for desk panel Home Assistant supports unauthenticated access for certain clients using its _trusted_network_ auth provider. With this configuration, we allow the desk panel to automatically sign in as the _kiosk_ user, but all other clients must authenticate normally.	2024-11-27 22:03:40 -06:00
Dustin C. Hatch	72d3f222c5	jenkins: Trust SSHCA for pyrocufflink.black The new machines have names in the _pyrocufflink.black_ zone. We need to trust the SSHCA certificate to sign keys for these names in order to connect to them and manage them with Ansible.	2024-11-26 03:35:21 +00:00
Dustin C. Hatch	2a90ffc7a9	invoice-ninja: Update trusted proxies addresses Since _ingress-nginx_ no longer runs in the host network namespace, traffic will appear to come from pods' internal IP addresses now. Similarly, the network policy for Invoice Ninja needs to be updated to allow traffic _to_ the ingress controllers' new addresses.	2024-11-22 22:43:16 -06:00
Dustin C. Hatch	1f7631d6b7	home-assistant: Update trusted proxies addresses Since _ingress-nginx_ no longer runs in the host network namespace, traffic will appear to come from pods' internal IP addresses now.	2024-11-22 22:42:43 -06:00
Dustin C. Hatch	607fa050f3	firefly-iii: Update trusted proxies addresses Since _ingress-nginx_ no longer runs in the host network namespace, traffic will appear to come from pods' internal IP addresses now.	2024-11-22 22:41:49 -06:00
Dustin C. Hatch	0a5af84778	rabbitmq: Configure Service externalIPs Clients outside the cluster can now communicate with RabbitMQ directly on port 5671 by using its dedicated external IP address. This address is automatically assigned to the node where RabbitMQ is running by `keepalived`.	2024-11-22 22:39:30 -06:00
Dustin C. Hatch	1a39a8869a	h-a/mosquitto: Configure Service externalIPs Clients outside the cluster can now communicate with Mosquitto directly on port 8883 by using its dedicated external IP address. This address is automatically assigned to the node where Mosquitto is running by `keepalived`.	2024-11-22 22:37:01 -06:00
Dustin C. Hatch	fefbaa9991	ingress: Use Deployment+Service with externalIPs Now that we have `keepalived` managing the "virtual" IP address for the ingress controller, we can change _ingress-nginx_ to run as a Deployment rather than a DaemonSet. It no longer needs to use the host network namespace, as `kube-proxy` will route all traffic sent to the configured external IP address to the controller pods. Using the _Local_ external traffic policy disables NAT, so incoming traffic is seen by the nginx unmodified.	2024-11-22 22:35:37 -06:00
Dustin C. Hatch	e7ea2b0659	keepalived: Initial commit Running `keepalived` as a DaemonSet will allow managing floating "virtual" IP addresses for Kubernetes services with configured external IP addresses. The main services we want to expose outside the cluster are _ingress-nginx_, Mosquitto, and RabbitMQ. The `keepalived` cluster will negotiate using the VRRF protocol to determine which node should have each external address. Using the process tracking feature of `keepalived`, we can steer traffic directly to the node where the target service is running.	2024-11-22 22:26:48 -06:00
Dustin C. Hatch	5c78bb89b5	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-11-22 19:38:00 -06:00
Dustin C. Hatch	0a6086eb2a	longhorn: Run on dedicated nodes I've created new worker nodes that are dedicated to running Longhorn replicas. These nodes are tainted with the `node-role.kubernetes.io/longhorn` taint, so no regular pods will be scheduled there by default. Longhorn pods thus needs to be configured to tolerate that taint, and to be scheduled on nodes with the similarly-named label.	2024-11-21 22:59:14 -06:00
Dustin C. Hatch	d6c83565ec	rabbitmq: Update to 4.0 RabbitMQ Server 3.13 is out of support now.	2024-11-21 22:59:14 -06:00
Dustin C. Hatch	121e6e7111	rabbitmq: Switch to using volume claim templates This will make it easier to "blow away" the RabbitMQ data volume on the occasions when it gets into a weird state. Simply scale the StatefulSet down to 0 replicas, delete the PVC, then scale back up. Kubernetes will handle creating a new PVC automatically.	2024-11-21 22:59:14 -06:00
Dustin C. Hatch	3d5dd52eb9	ingress: Use upstream resources w/ patches This will make it easier to upgrade, since we keep track of _exactly_ what we changed from the upstream resources with Kustomize patches.	2024-11-21 19:42:35 -06:00
Dustin C. Hatch	3b3d4c38ed	dynk8s: Move Wireguard config to SealedSecret	2024-11-21 19:41:55 -06:00
Dustin C. Hatch	da81a336e1	dynk8s-provisioner: Migrate to Kustomize	2024-11-19 10:43:42 -06:00
Dustin C. Hatch	e0c633c21e	v-m: scrape: Fix Nextcloud URL Nextcloud uses a _client-side_ (Javascript) redirect to navigate the browser to its `index.php`. The page it serves with this redirect is static and will often load successfully, even if there is a problem with the application. This causes the Blackbox exporter to record the site as "up," even when it it definitely is not. To avoid this, we can scrape the `index.php` page explicitly, ensuring that the application is loaded.	2024-11-17 18:43:00 +00:00
Dustin	14492d827a	Merge pull request 'home-assistant: Update to 2024.11.2' (#34 ) from updatebot/home-assistant into master Reviewed-on: #34	2024-11-16 18:04:43 +00:00
Dustin	444686cb1e	Merge pull request 'paperless-ngx: Update to 2.13.0' (#31 ) from updatebot/paperless-ngx into master Reviewed-on: #31	2024-11-16 17:55:04 +00:00
Dustin	ceea84d7f9	Merge pull request 'firefly-iii: Update to 6.1.22' (#33 ) from updatebot/firefly-iii into master Reviewed-on: #33	2024-11-16 17:45:08 +00:00
bot	4d2cc40b5e	tika: Update to 3.0.0.0	2024-11-16 12:32:14 +00:00
bot	c31db5fde2	gotenberg: Update to 8.13.0	2024-11-16 12:32:14 +00:00
bot	74ce0e1b0a	paperless-ngx: Update to 2.13.5	2024-11-16 12:32:14 +00:00
bot	f0b16fd53c	firefly-iii: Update to 6.1.22	2024-11-16 12:32:12 +00:00
bot	acd9a0fa92	zwavejs2mqtt: Update to 9.27.2	2024-11-16 12:32:08 +00:00
bot	115b4ade39	home-assistant: Update to 2024.11.2	2024-11-16 12:32:08 +00:00
Dustin	c1927eecfc	Merge pull request 'home-assistant: Update to 2024.10.4' (#30 ) from updatebot/home-assistant into master Reviewed-on: #30	2024-11-12 15:56:50 +00:00
Dustin	04ef1faf75	Merge pull request 'authelia: Update to 4.38.17' (#32 ) from updatebot/authelia into master Reviewed-on: #32	2024-11-12 15:14:50 +00:00
Dustin C. Hatch	0209f921c3	v-m: Remove nut0 from scrape targets _nut0.pyrocufflink.blue_ is decommissioned.	2024-11-12 08:02:00 -06:00
Dustin C. Hatch	62b19e942b	sshca: Add machine ID for nut1.p.b	2024-11-10 11:19:53 -06:00
bot	b956e9ac05	authelia: Update to 4.38.17	2024-11-09 12:32:16 +00:00
bot	f7eb3b49e7	zwavejs2mqtt: Update to 9.26.0	2024-11-09 12:32:08 +00:00
bot	0db830a670	zigbee2mqtt: Update to 1.41.0	2024-11-09 12:32:08 +00:00
bot	6d137af6dc	home-assistant: Update to 2024.11.1	2024-11-09 12:32:08 +00:00
Dustin C. Hatch	3d40424cf7	fleetlock: Use patched server from Github PR The _fleetlock_ server drains all pods from a node before allocating the reboot lock to that node. Unfortunately, it doesn't actually wait for those pods to be completely evicted. If some pods take too long to shut down, they may get stuck in `Terminating` state once the machine starts rebooting. This makes it so those pods cannot be replaced on another node with the original one is offline, which pretty much defeats the purpose of using Fleetlock in the first place. It seems upstream has abandoned this project, as there is an open [Pull Request][0] to fix this issue that has so far been ignored. Fortunately, building a new container image containing the patch is easy enough, so we can run our own patched build. [0]: https://github.com/poseidon/fleetlock/pull/271	2024-11-05 07:05:55 -06:00
Dustin C. Hatch	ac62a77c96	Merge branch '20125'	2024-11-05 07:05:19 -06:00
Dustin C. Hatch	e1d9833e83	cert-manager: Add cert for apps.du5t1n.xyz	2024-11-05 07:04:27 -06:00
Dustin C. Hatch	4ad5518f18	cert-manager: Migrate config to configMapGenerator	2024-11-05 07:04:09 -06:00
Dustin C. Hatch	9f287d0f71	v-m/alerts: Add alerts for backup RAID array Just like I did with the RAID-1 array in the old BURP server, I will keep one member active and one in the fireproof safe, swapping them each month. We can use the same metrics queries to alert on when the swap should happen that we used with the BURP server.	2024-11-04 20:46:03 -06:00
Dustin C. Hatch	2380468658	v-m/scrape: Collect Jellyfin metrics	2024-11-04 20:38:25 -06:00
Dustin C. Hatch	db7c07ee55	v-m/scrape: Ignore cloud Kubernetes nodes The ephemeral Jenkins worker nodes that run in AWS don't have colletcd, promtail, or Zincati. We don't needto get three alerts every time a worker starts up to handle am ARM build job, so we drop these discovered targets for these scrape jobs.	2024-11-04 20:35:17 -06:00
Dustin C. Hatch	d76a1360c8	v-m/alerts: Ignore Paperless consume_file task Paperless-ngx uses a Celery task to process uploaded files, converting them to PDF, running OCR, etc. This task can be marked as "failed" for various reasons, most of which are more about the document itself than the health of the application. The GUI displays the results of failed tasks when they occur. It doesn't really make sense to have an alert about this scenario, especially since there's nothing to do to directly clear the alert anyway.	2024-11-04 20:28:11 -06:00
Dustin C. Hatch	71b52e4c6f	20125: Deploy Status server https://20125.home/ is the URL the Status Android application loads in its main WebView. This site is powered by a server that generates a custom page showing the status of our self-hosted applications, based on alerts retrieved from the AlertManager API. Android WebView does not allow cleartext HTTP connections. It does, however, allow connecting an HTTPS server and ignoring the certificate it presents, which is effectively the same thing. Thus, we generate a self-signed certificate for the Ingress for this site.	2024-11-02 19:51:53 -05:00
Dustin C. Hatch	8ecee4133f	v-m/alerts: Rework free disk space alert Fedora CoreOS fills `/boot` beyond the 75% alert threshold under normal circumstances on aarch64 machines. This is not a problem, because it cleans up old files on its own, so we do not need to alert on it. Unfortunately, the _DiskUsage_ alert is already quite complex, and adding in exclusions for these devices would make it even worse. To simplify the logic, we can use a recording rule to precomupte the used/free space ratio. By using `sum(...) without (type)` instead of `sum(...) on (df, instance)`, we keep the other labels, which we can then use to identify the metrics coming from machines we don't care to monitor. Instead of having different thresholds for different volumes encoded in the same expression, we can use multiple alerts to alert on "low" vs "very low" thresholds. Since this will of course cause duplicate alerts for most volumes, we can use AlertManager inhibition rules to disable the "low" alert once the metric crosses the "very low" threshold.	2024-11-02 09:38:02 -05:00
Dustin C. Hatch	4cef41688f	v-m/alerts: Add Zigbee+ZWave network alerts	2024-11-01 18:14:56 -05:00
Dustin C. Hatch	6cf11f9f61	v-m: Scrape HAProxy	2024-11-01 18:14:37 -05:00
Dustin C. Hatch	7a768cbb76	v-m: Update jobs for new Loki server loki1.pyrocufflink.blue is a regular Fedora machine, a member of the AD domain, and managed by Ansible. Thus, it does not need to be explicitly listed as a scrape target. For scraping metrics from Loki itself, I've changed the job to use DNS-SD because it seems like `vmagent` does _not_ re-resolve host names from static configuration.	2024-11-01 18:07:34 -05:00
Dustin C. Hatch	0101040634	v-m/alerts: Add Paperless-ngx email task alert This alert should fire if the background task to fetch e-mail and import them into Paperless-ngx has not run for a while.	2024-11-01 18:04:06 -05:00

1 2 3 4 5 ...

491 Commits