I did not realize the batteries on the garage door tilt sensors had
died. Adding alerts for various sensor batteries should help keep me
better informed.
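A rule along these lines should do it; the metric name and threshold
here are assumptions, not necessarily what the exporter actually
publishes:

```yaml
groups:
  - name: batteries
    rules:
      - alert: SensorBatteryLow
        # Hypothetical metric name; substitute whatever the exporter
        # actually publishes for battery level sensors.
        expr: homeassistant_sensor_battery_percent < 20
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: 'Low battery: {{ $labels.entity }}'
```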
Sometimes, I want to be able to look at active alerts without logging
in. This rule allows read-only access to the AlertManager UI and API.
Unfortunately, the user experience when attempting to create a new
Silence using the UI without first logging in is suboptimal, but I
think that's an acceptable trade-off.
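Assuming the rule is expressed as an Authelia access-control entry
(the hostname is a placeholder), it might look something like this:
anonymous `GET` requests are allowed through, while anything that
modifies state still requires a login.

```yaml
access_control:
  rules:
    # Read-only (GET) access to the Alertmanager UI/API without login.
    - domain: alertmanager.example.com
      policy: bypass
      methods:
        - GET
    # Everything else (e.g. POSTing a new Silence) requires login.
    - domain: alertmanager.example.com
      policy: one_factor
```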
The Longhorn volume for the *invoice-ninja* PVC got into a strange state
following an unexpected shutdown this morning. One of its replicas
seemed to have disappeared, and it also thought that the size had
changed. As such, it got stuck in "expanding" state, but it was not
actually being expanded. This issue is described in detail in the
Longhorn documentation: [Troubleshooting: Unexpected expansion leads to
degradation or attach failure][0]. Unfortunately, there is no way to
recover a volume from that state, and it must be deleted and recreated
from backup. This changes some of the properties of the PVC, so they
need to be updated in the manifest.
[0]: https://longhorn.io/kb/troubleshooting-unexpected-expansion-leads-to-degradation-or-attach-failure/
Jenkins jobs that build container images need access to `/dev/fuse`.
Thus, we have to allow Pods managed by the *fuse-device-plugin*
DaemonSet to be scheduled on nodes that are tainted for use exclusively
by Jenkins jobs.
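The toleration itself is a few lines on the DaemonSet's pod template;
the taint key below is an assumption:

```yaml
tolerations:
  # Allow scheduling onto nodes tainted for Jenkins-only use.
  - key: jenkins
    operator: Exists
    effect: NoSchedule
```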
Members of the *Server Admins* group need to be able to log in to
machines using their respective privileged accounts, e.g. for
provisioning or emergencies.
Graylog is down because Elasticsearch corrupted itself again, and this
time, I'm just not going to bother fixing it. I practically never use
it anymore anyway, and I want to migrate to Grafana Loki, so now seems
like a good time to just get rid of it.
The configuration file for the kitchen HUD server has credentials
embedded in it. Until I get around to refactoring it to read these from
separate locations, we'll make use of the template feature of
SealedSecrets. With this feature, fields can refer to the (decrypted)
value of other fields using Go template syntax. This makes it possible
to have most of the `config.yaml` document unencrypted and easily
modifiable, while still protecting the secrets.
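A sketch of what that looks like, with hypothetical field names and a
placeholder ciphertext:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: kitchen-hud-config   # name is an assumption
spec:
  encryptedData:
    mqtt-password: AgBy...   # placeholder ciphertext
  template:
    data:
      # Everything except the password stays readable in Git; the
      # template is rendered with the decrypted items substituted in.
      config.yaml: |
        mqtt:
          host: mqtt.example.com
          password: {{ index . "mqtt-password" }}
```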
Now that Victoria Metrics is hosted in Kubernetes, it only makes sense
to host Grafana there as well. I chose to use a single-instance
deployment for simplicity; I don't really need high availability for
Grafana. Its configuration does not change enough to worry about the
downtime associated with restarting it. Migrating the existing data
from SQLite to PostgreSQL, while possible, is just not worth the hassle.
Invoice Ninja is a small business management tool. Tabitha wants to
use it for HLC.
I am a bit concerned about the code quality of this application, and
definitely alarmed by the data it sends upstream, so I have tried to be
extra careful with it. All privileges are revoked, including access to
the Internet.
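Cutting off Internet access comes down to a NetworkPolicy roughly like
this one; the namespace and cluster CIDR are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: invoice-ninja-no-internet
  namespace: invoice-ninja   # namespace is an assumption
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Permit egress to in-cluster (private) destinations only;
    # everything else, including the Internet, is denied.
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
```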
The `update-machine-ids.sh` shell script helps update the `sshca-data`
SealedSecret with the current contents of the `machine-ids.json` file
(stored locally, not tracked in Git).
*vmalert* has been generating alerts and triggering notifications, but
not writing any `ALERTS`/`ALERTS_FOR_STATE` metrics. It turns out this
is because I had not correctly configured the remote read/write
URLs.
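For the record, the flags in question look roughly like this; the
service names and tenant paths are assumptions based on a typical
VictoriaMetrics cluster layout:

```yaml
args:
  - -datasource.url=http://vmselect:8481/select/0/prometheus
  # Without these two, vmalert evaluates and notifies but never
  # persists ALERTS/ALERTS_FOR_STATE or restores state on restart.
  - -remoteWrite.url=http://vminsert:8480/insert/0/prometheus
  - -remoteRead.url=http://vmselect:8481/select/0/prometheus
```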
If Frigate is running but not connected to the MQTT broker, the
`sensor.frigate_status` entity will be available, but the
`update.frigate_server` entity will not.
Since (almost) all managed hosts have SSH certificates signed by SSHCA
now, the need to maintain a pseudo-dynamic SSH key list is winding down.
If we include the SSH CA key in the global known hosts file, and
explicitly list the couple of hosts that do not have a certificate, we
can let Ansible use that instead of fetching the host keys on each run.
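The global known hosts file then boils down to a couple of lines like
these (the keys and hostnames are placeholders):

```
# Trust any host certificate signed by the SSH CA:
@cert-authority * ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample sshca
# Static entries for the few hosts without certificates:
legacy1.example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample
```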
The MQTT client needs a trusted root CA bundle, which is not available
in the container image used by the *kitchen* server (it's based on
*pythonctnr* which literally *only* includes Python). Fortunately, as
it uses OpenSSL under the hood, we can configure it to use the bundle
included with the *certifi* Python package via an environment variable.
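Concretely, that amounts to something like this in the pod spec; the
certifi path varies with the Python version, so treat it as a
placeholder:

```yaml
env:
  # OpenSSL honors SSL_CERT_FILE; point it at certifi's CA bundle.
  - name: SSL_CERT_FILE
    value: /usr/lib/python3.11/site-packages/certifi/cacert.pem
```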
Kubernetes will not start additional Pods in a StatefulSet until the
existing ones are Ready. This means that if there is a problem bringing
up `vmstorage-0`, for example, it will never start `vmstorage-1` or
`vmstorage-2`. Since this pretty much defeats the purpose of having a
multi-node `vmstorage` cluster, we have to remove the readiness probe,
so the Pods will be Ready as soon as they start. If there is a problem
with one of them, it will matter less, as the others can still run.
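If the manifests go through Kustomize, one way to strip the probe is a
patch like this (the resource name and container index are
assumptions):

```yaml
# kustomization.yaml fragment
patches:
  - target:
      kind: StatefulSet
      name: vmstorage
    patch: |-
      - op: remove
        path: /spec/template/spec/containers/0/readinessProbe
```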
The *virt* plugin for *collectd* sets `instance` to the name of the
libvirt domain the metric refers to. This makes it so there is no label
identifying which host the VM is running on. Thus, if we want to
classify metrics by VM host, we need to add that label explicitly.
Since the `__address__` label is not available during metric relabeling,
we need to store it in a temporary label, which gets dropped at the end
of the relabeling phase. We copy the value of that label into a new
label, but only for metrics that match the desired metric name.
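A sketch of the two-phase relabeling; the job, label, and metric names
are assumptions:

```yaml
scrape_configs:
  - job_name: collectd
    relabel_configs:
      # __address__ exists only during target relabeling, so stash it
      # in a temporary (non-__) label that survives onto the metrics.
      - source_labels: [__address__]
        target_label: tmp_vm_host
    metric_relabel_configs:
      # Copy the stashed value into the real label, but only for
      # metrics whose name matches.
      - source_labels: [__name__, tmp_vm_host]
        regex: virt_.*;(.+)
        target_label: vm_host
      # Drop the temporary label at the end of the relabeling phase.
      - regex: tmp_vm_host
        action: labeldrop
```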
When Home Assistant starts, if PostgreSQL is unavailable, it will come
up successfully, but without the history component. It never retries
the connection to enable the component, which makes the problem
difficult to detect and the missing functionality easy to overlook.
To avoid having extended periods of missing state history, we'll force
Home Assistant to wait for PostgreSQL to come up before starting.
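A simple way to do that is an init container that blocks until the
database answers; the service name and image tag below are assumptions:

```yaml
initContainers:
  - name: wait-for-postgresql
    image: postgres:15-alpine
    command:
      - sh
      - -c
      # Block startup until PostgreSQL accepts connections.
      - until pg_isready -h postgresql -p 5432; do sleep 2; done
```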
`keyserv` is a little utility I wrote to dispense *age* keys to clients.
It uses SSH certificates for authentication. If the client presents an
SSH certificate signed by a trusted key, the server will return all the
keys the principal(s) listed in the certificate are allowed to use. The
response is encrypted with the public key from the certificate, so the
client must have access to the corresponding private key in order to
read the response.
I am currently using this server to provide keys for the new
configuration policy. The keys herein are used to encrypt NUT monitor
passwords.
I found the official container image for Prometheus Blackbox exporter.
It is hosted on Quay, which is why I didn't see it on Docker Hub when I
looked initially.
Scraping the public DNS servers doesn't work anymore since the firewall
routes traffic through Mullvad. Pinging public cloud providers should
give a pretty decent indication of Internet connectivity. It will also
serve as a benchmark for the local DNS performance, since the names will
have to be resolved.
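Roughly what that looks like, split across the Blackbox exporter module
config and the scrape config; the target hostnames and exporter address
are placeholders:

```yaml
# blackbox.yml: define an ICMP probe module.
modules:
  icmp:
    prober: icmp
    timeout: 5s
---
# Scrape config: probe each target through the exporter. The target
# names must be resolved, so this doubles as a local DNS check.
scrape_configs:
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - aws.amazon.com
          - azure.microsoft.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
```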
The Cluster Autoscaler version is supposed to match the Kubernetes
version. This update also specifically addresses ASG tags for node
resources ([issue 5164]).
[issue 5164]: https://github.com/kubernetes/autoscaler/issues/5164
The *cert-exporter* script really only needs the SSH host key for Gitea,
so the dynamic host key fetch is overkill. Since it frequently breaks
for various reasons, it's probably better to just have a static list of
trusted keys.
By default, the `instance` label for discovered metrics targets is set
to the scrape address. For Kubernetes pods, that is the IP address and
port of the pod, which naturally changes every time the pod is recreated
or moved. This will cause a high churn rate for Longhorn manager pods.
To avoid this, we set the `instance` label to the name of the node the
pod is running on, which will not change because the Longhorn manager
pods are managed by a DaemonSet.
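A minimal sketch of that relabeling, assuming Kubernetes service
discovery with the `pod` role:

```yaml
relabel_configs:
  # longhorn-manager runs as a DaemonSet, so the node name is a
  # stable identity for the pod's metrics across restarts.
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: instance
```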
After considering the implications of Authelia's pre-configured consent
feature, I decided I did not like the fact that a malicious program
could potentially take over my entire Kubernetes cluster without my
knowledge, since `kubectl` may not require any interaction and could
therefore run unnoticed. I stopped ticking the "Remember Consent"
checkbox out of paranoia, but that's gotten kind of annoying. I figure
a good compromise is to only prompt for consent a couple of times per
day.
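In the OIDC client configuration, that compromise is a couple of lines;
the client ID is a placeholder:

```yaml
identity_providers:
  oidc:
    clients:
      - id: kubernetes   # client ID is an assumption
        consent_mode: pre-configured
        # Re-prompt for consent roughly twice a day.
        pre_configured_duration: 12h
```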