Commit Graph

22 Commits (ee66e9ea18349bd4c9227d3847673e65f4128ef9)

Dustin ee66e9ea18 caddy: Separate out from loki app
This will make it clearer when sharing Caddy resources with other
applications (e.g. Frigate).
2024-04-05 22:05:21 -05:00
Dustin d432c673e9 host: Add nvr2.p.b
*nvr2.pyrocufflink.blue* runs Frigate video recording software.
2024-04-05 22:05:21 -05:00
Dustin d989994f25 serterm: Deploy serial terminal server
The serial terminal server ("serterm") is a collection of scripts that
automate launching multiple `picocom` processes, one per USB-serial
adapter connected to the system.  Each `picocom` process has its own
window in a `tmux` session, which is accessible via SSH on a dedicated
port (20022).  Clients connecting to that SSH server will be
automatically attached to the `tmux` session, allowing them to reach
the serial consoles quickly and easily.  The SSH server only
allows public-key authentication, so the authorized keys have to be
pre-configured.

In addition to automatically launching `picocom` windows for each serial
port when the terminal server starts, ports that are added (hot-plugged)
while the server is running will have windows created for them
automatically, by way of a udev rule.

Each `picocom` process is configured to log communications with its
respective serial port.  This may be useful, for example, to find
diagnostic messages that may not be captured by the `tmux` scrollback
buffer.
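
Roughly, the values for this host might be modeled in CUE along these
lines; every field name below is an illustrative guess, not the
repository's actual schema:

```cue
// Hypothetical values sketch for the serterm host; all names are
// assumptions made for illustration.
package host

serterm: {
    // Dedicated SSH server that attaches clients to the tmux session.
    ssh_port: 20022
    session:  "serterm"
    // Public-key authentication only, so keys must be pre-configured.
    authorized_keys: [
        "ssh-ed25519 AAAA... dustin@workstation", // placeholder key
    ]
    // Each picocom window logs its serial traffic under this directory.
    log_dir: "/var/log/serterm"
}
```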
2024-03-21 21:24:12 -05:00
Dustin 878ff7acb5 loki: Deploy Caddy in front of Loki
Grafana Loki explicitly eschews built-in authentication.  In fact, its
[documentation][0] states:

> Operators are expected to run an authenticating reverse proxy in front
> of your services.

While I don't really want to require authentication for agents sending
logs, I definitely want to restrict querying and viewing logs to trusted
users.

There are _many_ reverse proxy servers available, and normally I would
choose _nginx_.  In this case, though, I decided to try Caddy, mostly
because of its built-in ACME support.  I wasn't really happy with how
the `fetchcert` system turned out, particularly using the Kubernetes API
token for authentication.  Since the token will eventually expire, it
will require manual intervention to renew, thus mostly defeating the
purpose of having an auto-renewing certificate.  So instead of using
_cert-manager_ to issue the certificate and store it in Kubernetes, and
then having `fetchcert` download it via the Kubernetes API, I set up
_step-ca_ to handle issuing the certificate directly to the server. When
Caddy starts up, it contacts _step-ca_ via ACME and handles the
challenge verification automatically.  Further, it will automatically
renew the certificate as necessary, again using ACME.

I didn't spend a lot of time optimizing the Caddy configuration, so
there's some duplication there (i.e. the multiple `reverse_proxy`
statements), but the configuration works as desired.  Clients may
provide a certificate, which will be verified against the trusted issuer
CA.  If the certificate is valid, the client may access any Loki
resource.  Clients that do not provide a certificate can only access the
ingestion path, as well as the "ready" and "metrics" resources.
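
For illustration only, the access policy above might be modeled as CUE
values roughly like this; the hostname, CA path, ACME URL, and field
names are assumptions, not the repository's actual Caddy configuration:

```cue
// Illustrative model of the access policy; not the real configuration.
package loki

caddy: {
    site:    "loki.pyrocufflink.blue"                      // assumed hostname
    acme_ca: "https://step-ca.example/acme/acme/directory" // placeholder URL
    // Clients may present a certificate; if they do, it must chain to
    // the trusted issuer CA.
    client_auth: {
        mode:       "verify_if_given"
        trusted_ca: "/etc/pki/tls/certs/dch-ca.pem" // placeholder path
    }
    // Paths reachable without a client certificate.
    anonymous_paths: ["/loki/api/v1/push", "/ready", "/metrics"]
    upstream: "localhost:3100" // Loki's default HTTP listen address
}
```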

[0]: https://grafana.com/docs/loki/latest/operations/authentication/
2024-02-21 07:47:51 -06:00
Dustin ae948489e3 Deploy Promtail to all non-Kubernetes nodes
All the stand-alone FCOS hosts now have Promtail running, forwarding
_systemd_ journal messages to Grafana Loki.  The Kubernetes nodes will
have Promtail deployed as a Kubernetes pod.

I would really like to come up with a way to define variables for groups
of hosts, so that I do not have to add `promtail: prod.#promtail` to
every host's values file individually...
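
For reference, the per-host boilerplate looks roughly like this,
assuming the `#promtail` function lives in `du5t1n.me/cfg/env/prod` as
described in the *promtail: Deploy Loki Promtail Agent* commit below:

```cue
// Sketch of a per-host values file entry; package and import names are
// taken from the commit messages and may not match the repo exactly.
package host

import "du5t1n.me/cfg/env/prod"

promtail: prod.#promtail
```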
2024-02-18 12:59:14 -06:00
Dustin 45c35c065a promtail: Deploy Loki Promtail Agent
[Promtail][0] is the log collection agent for Grafana Loki.  It reads
logs from various locations, including local files and the _systemd_
journal, and sends them to Loki via HTTP.

Promtail configuration is a highly-structured YAML document.  Thus, instead
of using Tera template syntax for loops, conditionals, etc., we can use
the full power of CUE to construct the configuration.  Using the
`Marshal` function from the built-in `encoding/yaml` package, we
serialize the final configuration structure as a string and write it
verbatim to the configuration file.

I have modeled most of the Promtail configuration schema in the
`du5t1n.me/cfg/app/promtail/schema` package.  Having the schema modeled
will ensure the generated configuration is valid during development
(i.e. `cue export` will fail if it is not), which will save time pushing
changes to machines and having Promtail complain.

The `#promtail` "function" in `du5t1n.me/cfg/env/prod` makes it easy to
build our desired configuration.  It accepts an optional `#scrape`
field, which can be used to provide specific log scraping definitions.
If it is unspecified, the default configuration is to scrape the systemd
journal.  Hosts with additional needs can supply their own list,
probably including the `promtail.scrape.journal` object in it to get the
default journal scrape job.
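
A minimal sketch of what that "function" might look like; apart from
`#scrape` and `promtail.scrape.journal`, the names here (and the push
URL) are illustrative assumptions:

```cue
// Rough sketch of the #promtail pattern; not the repository's actual
// definition.
package prod

import (
    "encoding/yaml"

    "du5t1n.me/cfg/app/promtail"
)

#promtail: {
    // Hosts may supply their own scrape configs; the journal job is
    // the default when nothing is specified.
    #scrape: [...] | *[promtail.scrape.journal]

    config: {
        clients: [{url: "https://loki.example.com/loki/api/v1/push"}] // placeholder URL
        scrape_configs: #scrape
    }

    // Serialized with encoding/yaml and written verbatim to the
    // Promtail configuration file by the template.
    configYAML: yaml.Marshal(config)
}
```

A host with extra needs would then unify its own list, e.g.
`prod.#promtail & {#scrape: [promtail.scrape.journal, myExtraJob]}`,
where `myExtraJob` stands for whatever additional scrape definition it
wants.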

[0]: https://grafana.com/docs/loki/latest/send-data/promtail/
2024-02-18 11:35:13 -06:00
Dustin 011058aec3 loki: Use fetchcert to manage server certificate
Before going into production with Grafana Loki, I want to set it up to
use TLS.  To that end, I have configured _cert-manager_ to issue it a
certificate, signed by _DCH CA_.  In order to use said certificate,
we need to configure `fetchcert` to run on the Loki server.
2024-02-18 11:35:13 -06:00
Dustin 45285b9c47 host: Add loki0.p.b
*loki0.pyrocufflink.blue* will host [Grafana Loki][0], a log aggregation
system.

[0]: https://grafana.com/oss/loki/
2024-02-13 16:55:05 -06:00
Dustin 1738e4a1f1 host: Add k8s-aarch64-n{0,1} 2024-02-03 11:16:52 -06:00
Dustin b7f5d4a910 app/ssh: Configure sshd trusted user CA keys
Configure the system-wide trusted user CA key list for *sshd(8)*.
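
Hypothetically, the values for this could be as simple as the
following; the field name and key are placeholders:

```cue
// Placeholder sketch; not the repository's actual field name or key.
package prod

ssh: trusted_user_ca_keys: [
    "ssh-ed25519 AAAA... user-ca", // placeholder CA public key
]
```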
2024-02-03 11:16:52 -06:00
Dustin afd65ea9b8 host/nvr1: Fix cue package name 2024-02-03 11:13:42 -06:00
Dustin 073f7a6845 host: Add k8s-amd64-n3
*k8s-amd64-n3.pyrocufflink.blue* is a Kubernetes worker node.
2024-02-03 11:12:55 -06:00
Dustin f886a1bd8a sudo: Configure pam_ssh_agent_auth
I do not like how Fedora CoreOS configures `sudo` to allow the *core*
user to run privileged processes without authentication.  Rather than
assign the user a password, which would then have to be stored
somewhere, we'll install *pam_ssh_agent_auth* and configure `sudo` to
use it for authentication.  This way, only users with the private key
corresponding to one of the configured public keys can run `sudo`.

Naturally, *pam_ssh_agent_auth* has to be installed on the host system.
We achieve this by executing `rpm-ostree` via `nsenter` to escape the
container.  Once it is installed, we configure the PAM stack for
`sudo` to use it and populate the authorized keys database.  We also
need to configure `sudo` to keep the `SSH_AUTH_SOCK` environment
variable, so *pam_ssh_agent_auth* can find the SSH agent that holds the
keys.  Finally, we disable the default NOPASSWD rule for `sudo`, if
and only if the new configuration was installed.
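
A loose sketch of how those pieces might be expressed as values for
this application; every field name, path, and key below is an
assumption:

```cue
// Loose sketch only; all names, paths, and keys are assumptions.
package prod

sudo: {
    pam_ssh_agent_auth: {
        // Installed on the host via `rpm-ostree install`, run through
        // nsenter to escape the container.
        install: true
        // Keys allowed to authenticate sudo via a forwarded agent.
        authorized_keys: ["ssh-ed25519 AAAA... dustin@workstation"] // placeholder
        authorized_keys_file: "/etc/security/authorized_keys.sudo"
    }
    // sudoers must preserve the agent socket for the PAM module.
    env_keep: ["SSH_AUTH_SOCK"]
    // Drop the default NOPASSWD rule for core once the new
    // configuration is in place.
    remove_default_nopasswd: true
}
```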
2024-01-29 09:10:42 -06:00
Dustin bd18d3a734 host: Add serial1.p.b
*serial1.pyrocufflink.blue* is a replacement for *serial0.p.b*.  It runs
Fedora CoreOS and just has `picocom` and `tmux`.
2024-01-25 20:17:00 -06:00
Dustin 36fd137897 nut: Infer role from server name, set commands
Since the "primary" `upsmon` is always (for our purposes) running on the
same host as `upsd`, there's no reason to specify both values.

All systems need a shutdown command; one is not set by default.

The primary system is the only one that should send notifications.
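
Speculatively, the inference might reduce to something like this in
CUE; the field names are illustrative:

```cue
// Illustrative only; the repository's actual fields likely differ.
package prod

#nut: {
    #hostname: string
    #server:   string

    // The host running upsd is the primary upsmon instance.
    primary: #hostname == #server

    // Every system gets a shutdown command; none is set by default,
    // so provide a sensible fallback here.
    shutdown_cmd: string | *"/usr/sbin/shutdown -h +0"

    // Only the primary system sends notifications.
    notify: primary
}
```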
2024-01-19 17:57:20 -06:00
Dustin ad42c2d883 nvr1: Add instructions to configure upsmon
*nvr1.pyrocufflink.blue* will run `upsmon` so it can shut itself down
safely when the power goes out.
2024-01-19 16:57:47 -06:00
Dustin fb74f0e81c nut: Configure upsmon
`upsmon` is the component of NUT that tracks the status of UPSs and
reacts to status changes by sending notifications and/or shutting down
the system.  It is a networked application that can run on any system;
it can run on a different system than `upsd`, and indeed can run on
multiple systems simultaneously.

Each system that runs `upsmon` will need a username and password for
each UPS it will monitor.  Using the CUE [function pattern][0], I've
made it pretty simple to declare the necessary values under
`nut.monitor`.
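
As an example of the shape this takes (the UPS name, server, and
credentials here are made up):

```cue
// Example only; every value below is a placeholder.
package host

nut: monitor: office: {
    server:     "nut0.pyrocufflink.blue"
    username:   "monuser"
    password:   "(from secrets)" // placeholder, not a real value
    powervalue: 1
}
```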

[0]: https://cuetorials.com/patterns/functions/
2024-01-19 08:52:14 -06:00
Dustin 52642d37d9 nut: Configure collectd NUT plugin
2024-01-17 07:18:37 -06:00
Dustin 37d65984c7 host/nut0: Switch to prod configuration
2024-01-15 16:15:47 -06:00
Dustin 11f9957c11 Switch from KCL to CUE
Although KCL is unquestionably a more powerful language, and maps more
closely to my mental model of how host/environment/application
configuration is defined, the fact that it doesn't work on ARM (issue
982]) makes it a non-starter.  It's also quite slow (owing to how it
compiles a program to evaluate the code) and cumbersome to distribute.
Fortunately, `tmpl` doesn't care how the values it uses were computed,
so we can freely change configuration languages, so long as whatever we use
generates JSON/YAML.

CUE is probably a lot more popular than KCL, and is quite a bit simpler.
It's more restrictive (values cannot be overridden once defined), but
still expressive enough for what I am trying to do (so far).
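
For example, the restriction works like this: once a field has a
concrete value, a conflicting value is an error rather than an
override, while compatible constraints simply unify with it.

```cue
package example

port: 8080

// port: 9090      // error: conflicting values 8080 and 9090
port: >0 & <65536  // a constraint, by contrast, unifies cleanly
```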
2024-01-15 11:40:58 -06:00
Dustin 74508faf27 nut: Apply udev rules on the host
NUT needs some udev rules in order to set the proper permissions on USB
and similar devices so it can run as an otherwise unprivileged user.  Since
udev rules can only be processed on the host, these rules need to be
copied out of the container and evaluated before the NUT server starts.
To enable this, the *nut-server* container image copies the rules it
contains to `/etc/udev/rules.d` if that directory is a mount point.  By
bind mounting a directory on the host at that path, we can get a copy of
the rules files outside the container.  Then, using a systemd path unit,
we can tell the udev daemon to reload and reevaluate its rules.

SELinux prevents processes in containers from writing to
`/etc/udev/rules.d` directly, so we have to use an intermediate location
and then copy the rules files to their final destination.
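
Sketching the moving parts as values (the unit names and paths below
are assumptions, not the repository's actual configuration):

```cue
// Assumed names throughout; illustrates the mechanism only.
package host

nut_udev: {
    // Host directory bind-mounted over /etc/udev/rules.d inside the
    // nut-server container; the image copies its rules here.
    staging_dir: "/var/lib/nut/udev-rules"
    // A path unit watches the staging directory and starts a service
    // that copies the rules into the real /etc/udev/rules.d and tells
    // udev to reload and re-evaluate them.
    path_unit:    "nut-udev-rules.path"
    service_unit: "nut-udev-rules.service"
}
```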
2024-01-14 19:24:55 -06:00
Dustin 778c6d440d Initial commit 2024-01-14 19:24:55 -06:00