kubernetes

infra

Author	SHA1	Message	Date
Dustin	f468977d91	grafana: Enable send_user_header option I discovered today that if anonymous Grafana users have Viewer permission, they can use the Datasource API to make arbitrary queries to any backend, even if they cannot access the Explore page directly. This is documented ([issue #48313][0]) as expected behavior. I don't really mind giving anonymous access to the Victoria Metrics datasource, but I definitely don't want anonymous users to be able to make Loki queries and view log data. Since Grafana Datasource Permissions is limited to Grafana Enterprise and not available in the open source version of Grafana, the official recommendation from upstream is to use a separate Organization for the Loki datasource. Unfortunately, this would preclude having dashboards that have graphs from both data sources. Although I don't have any of those right now, I like the idea and may build some eventually. Fortunately, I discovered the `send_user_header` Grafana configuration option. With this enabled, Grafana will send an `X-Grafana-User` header with the username of the user on whose behalf it is making a request to the backend. If the user is not logged in, it does not send the header. Thus, we can detect the presence of this header on the backend and refuse to serve query requests if it is missing. [0]: https://github.com/grafana/grafana/issues/48313	2024-02-22 07:10:01 -06:00
Dustin	35ff500812	grafana: Configure Loki datastore Usually, Grafana datastores are configured using its web GUI. When setting up a datastore that requires TLS client authentication, the client certificate and private key have to be pasted into the form. For certificates that renew frequently, this method would require a frequent manual effort. Fortunately, Grafana supports defining datastores via its "provisioning" mechanism, reading the configuration from YAML files on the filesystem.	2024-02-22 07:10:01 -06:00
Dustin	d4efb735bf	loki-ca: Add cert-manager issuer for Loki CA The Loki CA is used to issue client certificates for Grafana Loki. This _cert-manager_ ClusterIssuer will allow applications running in Kubernetes (e.g. Grafana) to request a Certificate that they can use to access the Loki HTTP API.	2024-02-22 07:10:01 -06:00
Dustin	d08cc6fb0f	step-ca: Redeploy with DCH CA R3 I never ended up using _Step CA_ for anything, since I was initially focused on the SSH CA feature and I was unhappy with how it worked (which led me to write _SSHCA_). I didn't think about it much until I was working on deploying Grafana Loki. For that project, I wanted to use a certificate signed by a private CA instead of the wildcard certificate for _pyrocufflink.blue_. So, I created DCH CA R3 for that purpose. Then, for some reason, I used the exact same procedure to fetch the certificate from Kubernetes as I had set up for the _pyrocufflink.blue_ wildcard certificate, as used by Frigate. This of course defeated the purpose, since I could have just as easily used the wildcard certificate in that case. When I discovered that Grafana Loki expects to be deployed behind a reverse proxy in order to implement access control, I took the opportunity to reevaluate the certificate issuance process. Since a reverse proxy is required to implement the access control I want (anyone can push logs but only authenticated users can query them), it made sense to choose one with native support for requesting certificates via ACME. This would eliminate the need for `fetchcert` and the corresponding Kubernetes API token. Thus, I ended up deciding to redeploy _Step CA_ with the new _DCH CA R3_ for this purpose.	2024-02-22 07:10:01 -06:00
Dustin	4c238a69aa	v-m: Scrape Grafana Loki Grafana Loki is hosted on a VM named loki0.pyrocufflink.blue. It runs Fedora CoreOS, so in addition to scraping Loki itself, we need to scrape _collectd_ and _Zincati_ as well.	2024-02-21 09:16:26 -06:00
Dustin	1777262c15	dch-root-ca: Update to DCH Root CA R3 Since I shut down _step-ca_, nothing uses _DCH Root CA R2_ anymore. I've created a new CA using ED25519 key pairs, named _DCH Root CA R3_.	2024-02-21 09:16:26 -06:00
Dustin	1d2b5260bb	keyserv: Add age key for loki0 This key is used to encrypt the Kubernetes access token for `fetchcert`, which downloads the certificate for Grafana Loki HTTPS.	2024-02-21 09:16:26 -06:00
Dustin	96928a2611	kitchen: Fix weather metrics API URI Apparently, I never bothered to check that the Kitchen HUD server was actually fetching data from Victoria Metrics when I updated it before; I only verified that the Unauthorized errors in the `vmselect` log went away. They did, but only because now the Kitchen server was failing to contact `vmselect` at all.	2024-02-21 08:01:35 -06:00
Dustin	2acefd9a72	v-m: Add alert for sensor battery levels I did not realize the batteries on the garage door tilt sensors had died. Adding alerts for various sensor batteries should help keep me better informed.	2024-02-16 20:56:38 -06:00
Dustin	9784b90743	cert-manager: Remove unused secrets These secrets were used by previous issuers/solvers and are no longer needed.	2024-02-16 20:56:08 -06:00
Dustin	0ad63e0613	authelia: Allow anonymous access to AlertManager Sometimes, I want to be able to look at active alerts without logging in. This rule allows read-only access to the AlertManager UI and API. Unfortunately, the user experience when attempting to create a new Silence using the UI without first logging in is suboptimal, but I think that's worth the trade-off.	2024-02-16 20:41:47 -06:00
Dustin	2f6c358860	invoice-ninja: Update PVC for restored backup The Longhorn volume for the invoice-ninja PVC got into a strange state following an unexpected shutdown this morning. One of its replicas seemed to have disappeared, and it also thought that the size had changed. As such, it got stuck in "expanding" state, but it was not actually being expanded. This issue is described in detail in the Longhorn documentation: [Troubleshooting: Unexpected expansion leads to degradation or attach failure][0]. Unfortunately, there is no way to recover a volume from that state, and it must be deleted and recreated from backup. This changes some of the properties of the PVC, so they need to be updated in the manifest. [0]: https://longhorn.io/kb/troubleshooting-unexpected-expansion-leads-to-degradation-or-attach-failure/	2024-02-15 09:45:57 -06:00
Dustin	80df160ceb	device-plugins: Allow FUSE plugin on Jenkins nodes Jenkins jobs that build container images need access to `/dev/fuse`. Thus, we have to allow Pods managed by the fuse-device-plugin DaemonSet to be scheduled on nodes that are tainted for use exclusively by Jenkins jobs.	2024-02-13 07:56:35 -06:00
Dustin	33fa951c68	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-02-03 09:52:39 -06:00
Dustin	a395d176bc	sshca: Set group principals for Server Admins Members of the Server Admins group need to be able to log in to machines using their respective privileged accounts for e.g. provisioning or emergencies.	2024-02-02 21:02:40 -06:00
Dustin	1f28a623ae	v-m: Do not scrape/alert on Graylog Graylog is down because Elasticsearch corrupted itself again, and this time, I'm just not going to bother fixing it. I practically never use it anymore anyway, and I want to migrate to Grafana Loki, so now seems like a good time to just get rid of it.	2024-02-01 21:45:43 -06:00
Dustin	380af211ec	authelia: Reduce log level	2024-02-01 21:36:27 -06:00
Dustin	94300ac502	kitchen: Use SealedSecret template for config The configuration file for the kitchen HUD server has credentials embedded in it. Until I get around to refactoring it to read these from separate locations, we'll make use of the template feature of SealedSecrets. With this feature, fields can refer to the (decrypted) value of other fields using Go template syntax. This makes it possible to have most of the `config.yaml` document unencrypted and easily modifiable, while still protecting the secrets.	2024-02-01 21:18:46 -06:00
Dustin	baab02217e	authelia: Remove rule for Paperless-ngx API I don't like the [Paperless Mobile][0] app well enough to remove the MFA restriction for the Paperless-ngx API. [0]: https://github.com/astubenbord/paperless-mobile	2024-02-01 21:17:46 -06:00
Dustin	2cd4a8b097	sshca: Configure user CA SSHCA now supports issuing user certificates. It uses OpenID Connect to authenticate requests, and issues certificates based on the user's ID token.	2024-02-01 09:02:11 -06:00
Dustin	834d0f804f	v-m: Scrape Grafana Grafana exports Prometheus metrics about its own performance.	2024-02-01 09:02:01 -06:00
Dustin	3439ce1f13	grafana: Deploy Grafana Now that Victoria Metrics is hosted in Kubernetes, it only makes sense to host Grafana there as well. I chose to use a single-instance deployment for simplicity; I don't really need high availability for Grafana. Its configuration does not change enough to worry about the downtime associated with restarting it. Migrating the existing data from SQLite to PostgreSQL, while possible, is just not worth the hassle.	2024-01-27 22:01:08 -06:00
Dustin	4e15a9d71d	invoice-ninja: Deploy Invoice Ninja Invoice Ninja is a small business management tool. Tabitha wants to use it for HLC. I am a bit concerned about the code quality of this application, and definitely alarmed at the data it sends upstream, so I have tried to be extra careful with it. All privileges are revoked, including access to the Internet.	2024-01-27 21:11:26 -06:00
Dustin	a5d186b461	sshca: Add update-machine-ids script The `update-machine-ids.sh` shell script helps update the `sshca-data` SealedSecret with the current contents of the `machine-ids.json` file (stored locally, not tracked in Git).	2024-01-25 20:42:47 -06:00
Dustin	8ae8bad112	v-m: Scrape serial1.p.b	2024-01-25 20:42:07 -06:00
Dustin	7eae328a2c	sshca: Add machine ID for serial1.p.b	2024-01-25 20:41:54 -06:00
Dustin	9fff21aae1	h-a: Remove roomba_is_downstairs template sensor This sensor is now provided by a [Threshold][0] helper. [0]: https://www.home-assistant.io/integrations/threshold/	2024-01-25 17:31:36 -06:00
Dustin	8bb8ed4402	xactfetch: Additional mounts for rbw sync In order to sync the Bitwarden vault, `rbw` needs its configuration file in `/etc/rbw` and access to writable ephemeral storage at `/tmp`.	2024-01-24 12:00:13 -06:00
Dustin	ad37948fe2	v-m: Scrape all metrics components We are now getting metrics from vmstorage, vminsert, vmselect, vmalert, alertmanager, and blackbox-exporter, in addition to vmagent.	2024-01-23 11:51:50 -06:00
Dustin	bcb588407d	v-m: Correct vmalert remote read/write URLs vmalert has been generating alerts and triggering notifications, but not writing any `ALERTS`/`ALERTS_FOR_STATE` metrics. It turns out this is because I had not correctly configured the remote read/write URLs.	2024-01-23 10:45:40 -06:00
Dustin	9a76a548ec	argocd/app: jenkins: Enable auto sync We're going to try out automatically synchronizing the Jenkins resources when changes are pushed to Git.	2024-01-22 18:50:41 -06:00
Dustin	119a8a74ae	v-m: alerts: Enhance Frigate unavailable alert If Frigate is running but not connected to the MQTT broker, the `sensor.frigate_status` entity will be available, but the `update.frigate_server` entity will not.	2024-01-22 18:27:30 -06:00
Dustin	20ef2a287b	jenkins: Update to 2.426.2	2024-01-22 18:01:03 -06:00
Dustin	fb9ac66ad3	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-01-22 17:55:53 -06:00
Dustin	0e20952740	xactfetch: Sync vault before running The Bitwarden vault needs to be synced before xactfetch runs, in case the password for a bank website has changed since it was first fetched.	2024-01-22 17:52:35 -06:00
Dustin	2f9d8ad618	jenkins: Add CA key to ssh_known_hosts Since (almost) all managed hosts have SSH certificates signed by SSHCA now, the need to maintain a pseudo-dynamic SSH key list is winding down. If we include the SSH CA key in the global known hosts file, and explicitly list the couple of hosts that do not have a certificate, we can let Ansible use that instead of fetching the host keys on each run.	2024-01-22 17:52:35 -06:00
Dustin	3d55d7aafa	keyserv: Add age key for NUT/dustin This key is used to encrypt the password for the NUT user dustin, which I use to manually control the UPS.	2024-01-22 17:52:35 -06:00
Dustin	a7450a8af2	kitchen: Fix Jenkins deployment role Since Jenkins jobs run in Kubernetes now, they can authenticate to the Kubernetes API using a ServiceAccount and do not need a dedicated User.	2024-01-22 17:00:50 -06:00
Dustin	990204b2cf	kitchen: Use Certifi TLS CA bundle for OpenSSL The MQTT client needs a trusted root CA bundle, which is not available in the container image used by the kitchen server (it's based on pythonctnr which literally only includes Python). Fortunately, as it uses OpenSSL under the hood, we can configure it to use the bundle included with the certifi Python package via an environment variable.	2024-01-22 16:57:38 -06:00
Dustin	9b441738d4	dch-webhooks: Disable HTTPS redirect The [Generic Event][0] plugin for Jenkins does not support HTTPS webhooks, only plain HTTP. [0]: https://plugins.jenkins.io/generic-event/	2024-01-22 16:55:03 -06:00
Dustin	54e7a25f93	v-m: vmstorage: Remove startup/ready probes Kubernetes will not start additional Pods in a StatefulSet until the existing ones are Ready. This means that if there is a problem bringing up, e.g. `vmstorage-0`, it will never start `vmstorage-1` or `vmstorage-2`. Since this pretty much defeats the purpose of having a multi-node `vmstorage` cluster, we have to remove the readiness probe, so the Pods will be Ready as soon as they start. If there is a problem with one of them, it will matter less, as the others can still run.	2024-01-22 16:43:46 -06:00
Dustin	ca02dfec62	v-m: Add host labels to collectd-virt metrics The virt plugin for collectd sets `instance` to the name of the libvirt domain the metric refers to. This makes it so there is no label identifying which host the VM is running on. Thus, if we want to classify metrics by VM host, we need to add that label explicitly. Since the `__address__` label is not available during metric relabeling, we need to store it in a temporary label, which gets dropped at the end of the relabeling phase. We copy the value of that label into a new label, but only for metrics that match the desired metric name.	2024-01-22 11:12:19 -06:00
Dustin	832dea2c7d	h-a: Add init container to wait for PostgreSQL When Home Assistant starts, if PostgreSQL is unavailable, it will come up successfully, but without the history component. It never tries again to connect and enable the component. This makes it difficult to detect the problem and thus easy to overlook the missing functionality. To avoid having extended periods of missing state history, we'll force Home Assistant to wait for PostgreSQL to come up before starting.	2024-01-21 19:50:54 -06:00
Dustin	50beecf0a9	h-a: Increase startup probe failure threshold Home Assistant can sometimes take an unexpectedly long time to start up, but it eventually does.	2024-01-21 19:32:35 -06:00
Dustin	cb39b5a547	h-a: Update mobile apps notification group Updating the notification group for the family's new mobile devices.	2024-01-21 19:30:50 -06:00
Dustin	534c4bfca0	keyserv: Deploy keyserv `keyserv` is a little utility I wrote to dispense age keys to clients. It uses SSH certificates for authentication. If the client presents an SSH certificate signed by a trusted key, the server will return all the keys the principal(s) listed in the certificate are allowed to use. The response is encrypted with the public key from the certificate, so the client must have access to the corresponding private key in order to read the response. I am currently using this server to provide keys for the new configuration policy. The keys herein are used to encrypt NUT monitor passwords.	2024-01-19 22:08:25 -06:00
Dustin	897923a172	authelia: Bypass Authelia for Paperless-ngx API The [Paperless Mobile][0] app for Android uses the Paperless-ngx API. [0]: https://github.com/astubenbord/paperless-mobile/	2024-01-19 13:42:03 -06:00
Dustin	5f24ca0ad2	Merge branch 'rosalina/master'	2024-01-15 19:19:43 -06:00
Dustin	51775ede81	v-m/vmagent: Scrape nut0 nut0.pyrocufflink.blue is the new UPS monitor server. It runs Fedora CoreOS, with NUT in a container.	2024-01-15 18:46:46 -06:00
Dustin	90b293d5c8	v-m/vmagent: Scrape k8s-amd64-n3	2024-01-15 18:45:52 -06:00
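
The access rule described in the `send_user_header` and _Step CA_ commits above (anyone can push logs, but queries are refused unless Grafana sent an `X-Grafana-User` header) can be sketched roughly as follows. This is an illustration in Python, not the reverse proxy configuration actually used in this repository; the function name `allow_request` is hypothetical, and the sketch assumes that only the push endpoint is open to everyone.

```python
# Minimal sketch of the header check; NOT the actual proxy configuration.
# Assumption: every path other than the push endpoint counts as a query.

PUSH_PATH = "/loki/api/v1/push"  # Loki's log ingestion endpoint
USER_HEADER = "x-grafana-user"   # sent by Grafana when send_user_header is on


def allow_request(path: str, headers: dict[str, str]) -> bool:
    """Return True if the request should be forwarded to the backend."""
    # Anyone may push logs.
    if path == PUSH_PATH:
        return True
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Grafana omits the header for anonymous users, so a missing (or empty)
    # value means the request was not made on behalf of a logged-in user.
    return bool(normalized.get(USER_HEADER, "").strip())
```

A check like this is only meaningful if clients other than Grafana cannot reach the query endpoints and set the header themselves, for example because queries also require the client certificate issued by the Loki CA.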


261 Commits (31345bee7becc1d69c7f56677759527e6b29c26d)
