
220 Commits (680709e670c0f1905badbb39b6ace5af6aac156c)

Author SHA1 Message Date
Dustin 680709e670 authelia: Add auth rule for HLC forms submit
The *hlcforms* application handles form submissions for the Hatch
Learning Center website.  It has various features for Tabitha that are
only accessible internally, but the form submission handler itself of
course needs to be accessible anonymously.
2024-03-25 08:43:55 -05:00
Dustin c7223ff4fd authelia: Enable dark theme
A recent version of *Authelia* added a dark theme.  Setting the `theme`
option to `auto` enables it when the user agent has the "prefers dark
mode" hint enabled.
2024-02-27 06:51:14 -06:00
Dustin de72776e73 v-m: Scrape metrics from Authelia
Authelia exposes Prometheus metrics from a different server socket,
which is not enabled by default.
2024-02-27 06:41:52 -06:00
Dustin e0b2b3f5ae v-m: Scrape metrics from Patroni
Patroni, a component of the *postgres operator*, exports metrics about
the PostgreSQL database servers it manages.  Notably, it provides
information about the current transaction log location for each server.
This allows us to monitor and alert on the health of database replicas.
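
As a rough illustration of the kind of check this enables, the sketch below derives replication lag from two transaction log positions. The function names, the byte-offset representation, and the 16 MiB threshold are hypothetical and are not taken from the actual alerting rules.

```python
def replication_lag_bytes(primary_location: int, replica_location: int) -> int:
    """Return how far a replica trails the primary, in bytes of transaction log."""
    # Samples are taken at slightly different times, so a replica can appear
    # to be ahead of a stale primary sample; clamp the result at zero.
    return max(primary_location - replica_location, 0)


def is_lagging(primary_location: int, replica_location: int,
               threshold_bytes: int = 16 * 1024 * 1024) -> bool:
    """Report whether the replica is behind by more than the (assumed) threshold."""
    return replication_lag_bytes(primary_location, replica_location) > threshold_bytes
```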
2024-02-24 08:33:52 -06:00
Dustin 2442835edd autoscaler: Add SealedSecret for AWS key 2024-02-22 09:59:16 -06:00
Dustin 83eeb46c93 v-m: Scrape Argo CD
*Argo CD* exposes metrics about itself and the applications it manages.
Notably, this can be useful for monitoring application health.
2024-02-22 07:10:01 -06:00
Dustin 465f121e61 v-m: Scrape Promtail
The *promtail* job scrapes metrics from all the hosts running Promtail.
The static targets are Fedora CoreOS nodes that are not part of the
Kubernetes cluster.

The relabeling rules ensure that both the static targets and the
targets discovered via the Kubernetes Node API use the FQDN of the host
as the value of the *instance* label.
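
The effect of such relabeling can be sketched in Python. This is only an illustration of the idea, not the actual scrape configuration; the function name and the `example.com` default domain are assumptions.

```python
import re

# Matches "host:port"; the capture group keeps only the host part.
_ADDRESS_RE = re.compile(r"(.+):\d+")


def instance_label(address: str, domain: str = "example.com") -> str:
    """Derive an FQDN `instance` value from a scrape address or node name."""
    match = _ADDRESS_RE.fullmatch(address)
    host = match.group(1) if match else address
    # Short node names get the (assumed) domain appended; FQDNs pass through.
    return host if "." in host else f"{host}.{domain}"
```

With this, a static target given as `host:port` and a bare node name from service discovery both end up with the same style of label.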
2024-02-22 07:10:01 -06:00
Dustin 815eefdcf9 promtail: Deploy as DaemonSet
Running Promtail in a pod controlled by a DaemonSet allows it to access
the Kubernetes API via a ServiceAccount token.  Since it needs the API
to discover the Pods running on the current node and find their log
files, this makes the authentication process a lot simpler.
2024-02-22 07:10:01 -06:00
Dustin 5e4ab1d988 v-m: Update Loki scrape target
Now that Loki uses Caddy as a reverse proxy, we need to update the
scrape target to point to the correct port (443).
2024-02-22 07:10:01 -06:00
Dustin f468977d91 grafana: Enable send_user_header option
I discovered today that if anonymous Grafana users have Viewer
permission, they can use the Datasource API to make arbitrary queries
to any backend, even if they cannot access the Explore page directly.
This is documented ([issue #48313][0]) as expected behavior.

I don't really mind giving anonymous access to the Victoria Metrics
datasource, but I definitely don't want anonymous users to be able to
make Loki queries and view log data.  Since Grafana Datasource
Permissions is limited to Grafana Enterprise and not available in
the open source version of Grafana, the official recommendation from
upstream is to use a separate Organization for the Loki datasource.
Unfortunately, this would preclude having dashboards that have graphs
from both data sources.  Although I don't have any of those right now, I
like the idea and may build some eventually.

Fortunately, I discovered the `send_user_header` Grafana configuration
option.  With this enabled, Grafana will send an `X-Grafana-User` header
with the username of the user on whose behalf it is making a request to
the backend.  If the user is not logged in, it does not send the header.
Thus, we can detect the presence of this header on the backend and
refuse to serve query requests if it is missing.

[0]: https://github.com/grafana/grafana/issues/48313
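
The backend-side check described above could look roughly like the following. This is a minimal sketch of the decision logic only, not the real proxy configuration; the function name is hypothetical, and treating every path other than Loki's push endpoint as a query is a simplifying assumption.

```python
from typing import Mapping

# Loki's HTTP push endpoint; everything else is treated as a query here.
PUSH_PATH = "/loki/api/v1/push"


def is_request_allowed(path: str, headers: Mapping[str, str]) -> bool:
    """Allow pushes unconditionally, but queries only when Grafana names a user."""
    if path == PUSH_PATH:
        return True
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return bool(normalized.get("x-grafana-user", "").strip())
```

The check is only meaningful if Grafana is the sole client able to send query requests; otherwise any client could set the header itself.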
2024-02-22 07:10:01 -06:00
Dustin 35ff500812 grafana: Configure Loki datastore
Usually, Grafana datastores are configured using its web GUI.  When
setting up a datastore that requires TLS client authentication, the
client certificate and private key have to be pasted into the form.
For certificates that renew frequently, this method would require
frequent manual effort.  Fortunately, Grafana supports defining
datastores via its "provisioning" mechanism, reading the configuration
from YAML files on the filesystem.
2024-02-22 07:10:01 -06:00
Dustin d4efb735bf loki-ca: Add cert-manager issuer for Loki CA
The Loki CA is used to issue client certificates for Grafana Loki.  This
_cert-manager_ ClusterIssuer will allow applications running in
Kubernetes (e.g. Grafana) to request a Certificate that they can use to
access the Loki HTTP API.
2024-02-22 07:10:01 -06:00
Dustin d08cc6fb0f step-ca: Redeploy with DCH CA R3
I never ended up using _Step CA_ for anything, since I was initially
focused on the SSH CA feature and I was unhappy with how it worked
(which led me to write _SSHCA_).  I didn't think about it much until I
was working on deploying Grafana Loki.  For that project, I wanted to
use a certificate signed by a private CA instead of the wildcard
certificate for _pyrocufflink.blue_.  So, I created *DCH CA R3* for that
purpose.  Then, for some reason, I used the exact same procedure to
fetch the certificate from Kubernetes as I had set up for the
_pyrocufflink.blue_ wildcard certificate, as used by Frigate.  This of
course defeated the purpose, since I could have just as easily used
the wildcard certificate in that case.

When I discovered that Grafana Loki expects to be deployed behind a
reverse proxy in order to implement access control, I took the
opportunity to reevaluate the certificate issuance process.  Since a
reverse proxy is required to implement the access control I want (anyone
can push logs but only authenticated users can query them), it made
sense to choose one with native support for requesting certificates via
ACME.  This would eliminate the need for `fetchcert` and the
corresponding Kubernetes API token.  Thus, I ended up deciding to
redeploy _Step CA_ with the new _DCH CA R3_ for this purpose.
2024-02-22 07:10:01 -06:00
Dustin 4c238a69aa v-m: Scrape Grafana Loki
Grafana Loki is hosted on a VM named *loki0.pyrocufflink.blue*.  It runs
Fedora CoreOS, so in addition to scraping Loki itself, we need to scrape
_collectd_ and _Zincati_ as well.
2024-02-21 09:16:26 -06:00
Dustin 1777262c15 dch-root-ca: Update to DCH Root CA R3
Since I shut down _step-ca_, nothing uses _DCH Root CA R2_ anymore.
I've created a new CA using ED25519 key pairs, named _DCH Root CA R3_.
2024-02-21 09:16:26 -06:00
Dustin 1d2b5260bb keyserv: Add age key for loki0
This key is used to encrypt the Kubernetes access token for `fetchcert`,
which downloads the certificate for Grafana Loki HTTPS.
2024-02-21 09:16:26 -06:00
Dustin 96928a2611 kitchen: Fix weather metrics API URI
Apparently, I never bothered to check that the Kitchen HUD server was
actually fetching data from Victoria Metrics when I updated it before; I
only verified that the Unauthorized errors in the `vmselect` log
went away.  They did, but only because now the Kitchen server was
failing to contact `vmselect` at all.
2024-02-21 08:01:35 -06:00
Dustin 2acefd9a72 v-m: Add alert for sensor battery levels
I did not realize the batteries on the garage door tilt sensors had
died.  Adding alerts for various sensor batteries should help keep me
better informed.
2024-02-16 20:56:38 -06:00
Dustin 9784b90743 cert-manager: Remove unused secrets
These secrets were used by previous issuers/solvers and are no longer
needed.
2024-02-16 20:56:08 -06:00
Dustin 0ad63e0613 authelia: Allow anonymous access to AlertManager
Sometimes, I want to be able to look at active alerts without logging
in.  This rule allows read-only access to the AlertManager UI and API.
Unfortunately, the user experience when attempting to create a new
Silence using the UI without first logging in is suboptimal, but I think
that's worth the trade-off.
2024-02-16 20:41:47 -06:00
Dustin 2f6c358860 invoice-ninja: Update PVC for restored backup
The Longhorn volume for the *invoice-ninja* PVC got into a strange state
following an unexpected shutdown this morning.  One of its replicas
seemed to have disappeared, and it also thought that the size had
changed.  As such, it got stuck in "expanding" state, but it was not
actually being expanded.  This issue is described in detail in the
Longhorn documentation: [Troubleshooting: Unexpected expansion leads to
degradation or attach failure][0].  Unfortunately, there is no way to
recover a volume from that state, and it must be deleted and recreated
from backup.  This changes some of the properties of the PVC, so they
need to be updated in the manifest.

[0]: https://longhorn.io/kb/troubleshooting-unexpected-expansion-leads-to-degradation-or-attach-failure/
2024-02-15 09:45:57 -06:00
Dustin 80df160ceb device-plugins: Allow FUSE plugin on Jenkins nodes
Jenkins jobs that build container images need access to `/dev/fuse`.
Thus, we have to allow Pods managed by the *fuse-device-plugin*
DaemonSet to be scheduled on nodes that are tainted for use exclusively
by Jenkins jobs.
2024-02-13 07:56:35 -06:00
Dustin 33fa951c68 Merge remote-tracking branch 'refs/remotes/origin/master' 2024-02-03 09:52:39 -06:00
Dustin a395d176bc sshca: Set group principals for Server Admins
Members of the *Server Admins* group need to be able to log in to
machines using their respective privileged accounts for e.g.
provisioning or emergencies.
2024-02-02 21:02:40 -06:00
Dustin 1f28a623ae v-m: Do not scrape/alert on Graylog
Graylog is down because Elasticsearch corrupted itself again, and this
time, I'm just not going to bother fixing it.  I practically never use
it anymore anyway, and I want to migrate to Grafana Loki, so now seems
like a good time to just get rid of it.
2024-02-01 21:45:43 -06:00
Dustin 380af211ec authelia: Reduce log level 2024-02-01 21:36:27 -06:00
Dustin 94300ac502 kitchen: Use SealedSecret template for config
The configuration file for the kitchen HUD server has credentials
embedded in it.  Until I get around to refactoring it to read these from
separate locations, we'll make use of the template feature of
SealedSecrets.  With this feature, fields can refer to the (decrypted)
value of other fields using Go template syntax.  This makes it possible
to have most of the `config.yaml` document unencrypted and easily
modifiable, while still protecting the secrets.
2024-02-01 21:18:46 -06:00
Dustin baab02217e authelia: Remove rule for Paperless-ngx API
I don't like the [Paperless Mobile][0] app well enough to remove the MFA
restriction for the Paperless-ngx API.

[0]: https://github.com/astubenbord/paperless-mobile
2024-02-01 21:17:46 -06:00
Dustin 2cd4a8b097 sshca: Configure user CA
SSHCA now supports issuing user certificates.  It uses OpenID Connect to
authenticate requests, and issues certificates based on the user's ID
token.
2024-02-01 09:02:11 -06:00
Dustin 834d0f804f v-m: Scrape Grafana
Grafana exports Prometheus metrics about its own performance.
2024-02-01 09:02:01 -06:00
Dustin 3439ce1f13 grafana: Deploy Grafana
Now that Victoria Metrics is hosted in Kubernetes, it only makes sense
to host Grafana there as well.  I chose to use a single-instance
deployment for simplicity; I don't really need high availability for
Grafana.  Its configuration does not change enough to worry about the
downtime associated with restarting it.  Migrating the existing data
from SQLite to PostgreSQL, while possible, is just not worth the hassle.
2024-01-27 22:01:08 -06:00
Dustin 4e15a9d71d invoice-ninja: Deploy Invoice Ninja
Invoice Ninja is a small business management tool.  Tabitha wants to
use it for HLC.

I am a bit concerned about the code quality of this application, and
definitely alarmed at the data it sends upstream, so I have tried to be
extra careful with it.  All privileges are revoked, including access to
the Internet.
2024-01-27 21:11:26 -06:00
Dustin a5d186b461 sshca: Add update-machine-ids script
The `update-machine-ids.sh` shell script helps update the `sshca-data`
SealedSecret with the current contents of the `machine-ids.json` file
(stored locally, not tracked in Git).
2024-01-25 20:42:47 -06:00
Dustin 8ae8bad112 v-m: Scrape serial1.p.b 2024-01-25 20:42:07 -06:00
Dustin 7eae328a2c sshca: Add machine ID for serial1.p.b 2024-01-25 20:41:54 -06:00
Dustin 9fff21aae1 h-a: Remove roomba_is_downstairs template sensor
This sensor is now provided by a [Threshold][0] helper.

[0]: https://www.home-assistant.io/integrations/threshold/
2024-01-25 17:31:36 -06:00
Dustin 8bb8ed4402 xactfetch: Additional mounts for rbw sync
In order to sync the Bitwarden vault, `rbw` needs its configuration file
in `/etc/rbw` and access to writable ephemeral storage at `/tmp`.
2024-01-24 12:00:13 -06:00
Dustin ad37948fe2 v-m: Scrape all metrics components
We are now getting metrics from *vmstorage*, *vminsert*, *vmselect*,
*vmalert*, *alertmanager*, and *blackbox-exporter*, in addition to
*vmagent*.
2024-01-23 11:51:50 -06:00
Dustin bcb588407d v-m: Correct vmalert remote read/write URLs
*vmalert* has been generating alerts and triggering notifications, but
not writing any `ALERTS`/`ALERTS_FOR_STATE` metrics.  It turns out this
is because I had not correctly configured the remote read/write
URLs.
2024-01-23 10:45:40 -06:00
Dustin 9a76a548ec argocd/app: jenkins: Enable auto sync
We're going to try out automatically synchronizing the Jenkins resources
when changes are pushed to Git.
2024-01-22 18:50:41 -06:00
Dustin 119a8a74ae v-m: alerts: Enhance Frigate unavailable alert
If Frigate is running but not connected to the MQTT broker, the
`sensor.frigate_status` entity will be available, but the
`update.frigate_server` entity will not.
2024-01-22 18:27:30 -06:00
Dustin 20ef2a287b jenkins: Update to 2.426.2 2024-01-22 18:01:03 -06:00
Dustin fb9ac66ad3 Merge remote-tracking branch 'refs/remotes/origin/master' 2024-01-22 17:55:53 -06:00
Dustin 0e20952740 xactfetch: Sync vault before running
The Bitwarden vault needs to be synced before *xactfetch* runs, in case
the password for a bank website has changed since it was first fetched.
2024-01-22 17:52:35 -06:00
Dustin 2f9d8ad618 jenkins: Add CA key to ssh_known_hosts
Since (almost) all managed hosts have SSH certificates signed by SSHCA
now, the need to maintain a pseudo-dynamic SSH key list is winding down.
If we include the SSH CA key in the global known hosts file, and
explicitly list the couple of hosts that do not have a certificate, we
can let Ansible use that instead of fetching the host keys on each run.
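
A file of this shape could be generated along these lines. The sketch relies on the `@cert-authority` marker from the OpenSSH known hosts format; the helper names, the host pattern, and the placeholder keys in the usage note are hypothetical.

```python
def cert_authority_line(host_pattern: str, ca_public_key: str) -> str:
    """Build a known_hosts entry that trusts host certificates signed by a CA."""
    return f"@cert-authority {host_pattern} {ca_public_key.strip()}"


def known_hosts(ca_line: str, extra_host_keys: dict[str, str]) -> str:
    """Combine the CA entry with explicit keys for hosts lacking a certificate."""
    lines = [ca_line]
    # Sort so the generated file is stable between runs.
    lines += [f"{host} {key.strip()}" for host, key in sorted(extra_host_keys.items())]
    return "\n".join(lines) + "\n"
```

For example, `known_hosts(cert_authority_line("*.example.com", "ssh-ed25519 PLACEHOLDER"), {"legacy.example.com": "ssh-rsa PLACEHOLDER"})` yields a two-line file with the CA entry first.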
2024-01-22 17:52:35 -06:00
Dustin 3d55d7aafa keyserv: Add age key for NUT/dustin
This key is used to encrypt the password for the NUT user *dustin*,
which I use to manually control the UPS.
2024-01-22 17:52:35 -06:00
Dustin a7450a8af2 kitchen: Fix Jenkins deployment role
Since Jenkins jobs run in Kubernetes now, they can authenticate to the
Kubernetes API using a ServiceAccount and do not need a dedicated
User.
2024-01-22 17:00:50 -06:00
Dustin 990204b2cf kitchen: Use Certifi TLS CA bundle for OpenSSL
The MQTT client needs a trusted root CA bundle, which is not available
in the container image used by the *kitchen* server (it's based on
*pythonctnr* which literally *only* includes Python).  Fortunately, as
it uses OpenSSL under the hood, we can configure it to use the bundle
included with the *certifi* Python package via an environment variable.
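
The mechanism can be sketched as follows, assuming the variable in question is `SSL_CERT_FILE`, which OpenSSL consults for its default CA file (the commit does not name it). The helper names are hypothetical; in the real deployment the variable would be set in the container environment, not from Python code.

```python
import os
import ssl


def use_ca_bundle(bundle_path: str) -> None:
    """Point OpenSSL's default trust store at the given CA bundle file."""
    # Assumption: SSL_CERT_FILE is the environment variable being used.
    os.environ["SSL_CERT_FILE"] = bundle_path


def default_ca_file():
    """Return the CA file Python's ssl module would load by default, if any."""
    return ssl.get_default_verify_paths().cafile
```

With the *certifi* package installed, calling `use_ca_bundle(certifi.where())` before creating any TLS context should have the same effect as exporting the variable.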
2024-01-22 16:57:38 -06:00
Dustin 9b441738d4 dch-webhooks: Disable HTTPS redirect
The [Generic Event][0] plugin for Jenkins does not support HTTPS
webhooks, only plain HTTP.

[0]: https://plugins.jenkins.io/generic-event/
2024-01-22 16:55:03 -06:00
Dustin 54e7a25f93 v-m: vmstorage: Remove startup/ready probes
Kubernetes will not start additional Pods in a StatefulSet until the
existing ones are Ready.  This means that if there is a problem bringing
up, e.g. `vmstorage-0`, it will never start `vmstorage-1` or
`vmstorage-2`.  Since this pretty much defeats the purpose of having a
multi-node `vmstorage` cluster, we have to remove the readiness probe,
so the Pods will be Ready as soon as they start.  If there is a problem
with one of them, it will matter less, as the others can still run.
2024-01-22 16:43:46 -06:00