Commit Graph

18 Commits (ae948489e3e7d0bc2c04696ae7d500d83f2143e3)

Author SHA1 Message Date
Dustin ae948489e3 Deploy Promtail to all non-Kubernetes nodes
All the stand-alone FCOS hosts now have Promtail running, forwarding
_systemd_ journal messages to Grafana Loki.  The Kubernetes nodes will
have Promtail deployed as a Kubernetes pod.

I would really like to come up with a way to define variables for groups
of hosts, so that I do not have to add `promtail: prod.#promtail` to
every host's values file individually...
2024-02-18 12:59:14 -06:00
Dustin 45c35c065a promtail: Deploy Loki Promtail Agent
[Promtail][0] is the log collection agent for Grafana Loki.  It reads
logs from various locations, including local files and the _systemd_
journal, and sends them to Loki via HTTP.

Promtail configuration is a highly-structured YAML document.  Thus, instead
of using Tera template syntax for loops, conditionals, etc., we can use
the full power of CUE to construct the configuration.  Using the
`Marshal` function from the built-in `encoding/yaml` package, we
serialize the final configuration structure as a string and write it
verbatim to the configuration file.
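
As a rough sketch of that approach (not the actual code from this commit),
the pattern looks something like the following.  The `config` and
`rendered` field names and the client URL are placeholders I made up for
illustration; only `yaml.Marshal` from `encoding/yaml` is the real
built-in.

```cue
package example

import "encoding/yaml"

// Hypothetical, heavily simplified Promtail configuration.
config: {
	server: http_listen_port: 9080
	// Placeholder URL, not a real host.
	clients: [{url: "https://loki.example.org/loki/api/v1/push"}]
	scrape_configs: [{
		job_name: "journal"
		journal: labels: job: "systemd-journal"
	}]
}

// The whole structure serialized as a YAML string, ready to be
// written verbatim to the configuration file.
rendered: yaml.Marshal(config)
```

Assuming this is saved as a `.cue` file, something like
`cue export -e rendered --out text` should print the resulting YAML.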

I have modeled most of the Promtail configuration schema in the
`du5t1n.me/cfg/app/promtail/schema` package.  Having the schema modeled
will ensure the generated configuration is valid during development
(i.e. `cue export` will fail if it is not), which will save time pushing
changes to machines and having Promtail complain.

The `#promtail` "function" in `du5t1n.me/cfg/env/prod` makes it easy to
build our desired configuration.  It accepts an optional `#scrape`
field, which can be used to provide specific log scraping definitions.
If it is unspecified, the default configuration is to scrape the systemd
journal.  Hosts with additional needs can supply their own list,
probably including the `promtail.scrape.journal` object in it to get the
default journal scrape job.
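
A minimal sketch of how such a "function" with a defaulted `#scrape`
field might be written is shown here.  This is an illustration of the
pattern, not the code in `du5t1n.me/cfg/env/prod`; `journalJob`, `hostA`,
`hostB`, and the extra `app` job are all hypothetical names.

```cue
package example

// Hypothetical stand-in for `promtail.scrape.journal`.
journalJob: {
	job_name: "journal"
	journal: labels: job: "systemd-journal"
}

// "Function": callers may set #scrape; if they do not, it
// defaults to a list containing only the journal job.
#promtail: {
	#scrape: [...{job_name: string, ...}] | *[journalJob]
	scrape_configs: #scrape
}

// Host that takes the default journal scrape job.
hostA: #promtail

// Host with additional needs: it supplies its own list and
// includes the journal job explicitly.
hostB: #promtail & {
	#scrape: [journalJob, {
		job_name: "app"
		static_configs: [{
			targets: ["localhost"]
			labels: "__path__": "/var/log/app/*.log"
		}]
	}]
}
```

Because `#scrape` is a definition, it is omitted from exported output;
only `scrape_configs` ends up in the generated configuration.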

[0]: https://grafana.com/docs/loki/latest/send-data/promtail/
2024-02-18 11:35:13 -06:00
Dustin 011058aec3 loki: Use fetchcert to manage server certificate
Before going into production with Grafana Loki, I want to set it up to
use TLS.  To that end, I have configured _cert-manager_ to issue it a
certificate, signed by _DCH CA_.  In order to use said certificate,
we need to configure `fetchcert` to run on the Loki server.
2024-02-18 11:35:13 -06:00
Dustin 45285b9c47 host: Add loki0.p.b
*loki0.pyrocufflink.blue* will host [Grafana Loki][0], a log aggregation
system.

[0]: https://grafana.com/oss/loki/
2024-02-13 16:55:05 -06:00
Dustin 1738e4a1f1 host: Add k8s-aarch64-n{0,1} 2024-02-03 11:16:52 -06:00
Dustin b7f5d4a910 app/ssh: Configure sshd trusted user CA keys
Configure the system-wide trusted user CA key list for *sshd(8)*.
2024-02-03 11:16:52 -06:00
Dustin afd65ea9b8 host/nvr1: Fix cue package name 2024-02-03 11:13:42 -06:00
Dustin 073f7a6845 host: Add k8s-amd64-n3
*k8s-amd64-n3.pyrocufflink.blue* is a Kubernetes worker node.
2024-02-03 11:12:55 -06:00
Dustin f886a1bd8a sudo: Configure pam_ssh_agent_auth
I do not like how Fedora CoreOS configures `sudo` to allow the *core*
user to run privileged processes without authentication.  Rather than
assign the user a password, which would then have to be stored
somewhere, we'll install *pam_ssh_agent_auth* and configure `sudo` to
use it for authentication.  This way, only users with the private key
corresponding to one of the configured public keys can run `sudo`.

Naturally, *pam_ssh_agent_auth* has to be installed on the host system.
We achieve this by executing `rpm-ostree` via `nsenter` to escape the
container.  Once it is installed, we configure the PAM stack for
`sudo` to use it and populate the authorized keys database.  We also
need to configure `sudo` to keep the `SSH_AUTH_SOCK` environment
variable, so *pam_ssh_agent_auth* knows where to look for the private
keys.  Finally, we disable the default NOPASSWD rule for `sudo`, if
and only if the new configuration was installed.
2024-01-29 09:10:42 -06:00
Dustin bd18d3a734 host: Add serial1.p.b
*serial1.pyrocufflink.blue* is a replacement for *serial0.p.b*.  It runs
Fedora CoreOS and just has `picocom` and `tmux`.
2024-01-25 20:17:00 -06:00
Dustin 36fd137897 nut: Infer role from server name, set commands
Since the "primary" `upsmon` is always (for our purposes) running on the
same host as `upsd`, there's no reason to specify both values.

All systems need a shutdown command; one is not set by default.

The primary system is the only one that should send notifications.
2024-01-19 17:57:20 -06:00
Dustin ad42c2d883 nvr1: Add instructions to configure upsmon
*nvr1.pyrocufflink.blue* will run `upsmon` so it can shut itself down
safely when the power goes out.
2024-01-19 16:57:47 -06:00
Dustin fb74f0e81c nut: Configure upsmon
`upsmon` is the component of NUT that tracks the status of UPSs and
reacts to status changes by sending notifications and/or shutting down
the system.  It is a networked application that can run on any system;
it can run on a different system than `upsd`, and indeed can run on
multiple systems simultaneously.

Each system that runs `upsmon` will need a username and password for
each UPS it will monitor.  Using the CUE [function pattern][0], I've
made it pretty simple to declare the necessary values under
`nut.monitor`.

[0]: https://cuetorials.com/patterns/functions/
2024-01-19 08:52:14 -06:00
Dustin 52642d37d9 nut: Configure collectd NUT plugin
2024-01-17 07:18:37 -06:00
Dustin 37d65984c7 host/nut0: Switch to prod configuration
2024-01-15 16:15:47 -06:00
Dustin 11f9957c11 Switch from KCL to CUE
Although KCL is unquestionably a more powerful language, and maps more
closely to my mental model of how host/environment/application
configuration is defined, the fact that it doesn't work on ARM (issue
982) makes it a non-starter.  It's also quite slow (owing to how it
compiles a program to evaluate the code) and cumbersome to distribute.
Fortunately, `tmpl` doesn't care how the values it uses were computed,
so we can freely change configuration languages, so long as whatever we use
generates JSON/YAML.

CUE is probably a lot more popular than KCL, and is quite a bit simpler.
It's more restrictive (values cannot be overridden once defined), but
still expressive enough for what I am trying to do (so far).
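
To illustrate the restriction, here is a small sketch with made-up field
names (`port` and `name` are not taken from this repository).  A default
can still be replaced by a more specific value, but a concrete value
cannot be changed once set.

```cue
package example

// A default (marked with *) can be replaced by a concrete value.
port: int | *9080
port: 3100

// A concrete value cannot be overridden.  Uncommenting the second
// line makes evaluation fail with a conflicting-values error.
name: "loki0"
// name: "loki1"
```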
2024-01-15 11:40:58 -06:00
Dustin 74508faf27 nut: Apply udev rules on the host
NUT needs some udev rules in order to set the proper permissions on USB
and other devices so it can run as an otherwise unprivileged user.  Since
udev rules can only be processed on the host, these rules need to be
copied out of the container and evaluated before the NUT server starts.
To enable this, the *nut-server* container image copies the rules it
contains to `/etc/udev/rules.d` if that directory is a mount point.  By
bind mounting a directory on the host at that path, we can get a copy of
the rules files outside the container.  Then, using a systemd path unit,
we can tell the udev daemon to reload and reevaluate its rules.

SELinux prevents processes in containers from writing to
`/etc/udev/rules.d` directly, so we have to use an intermediate location
and then copy the rules files to their final destination.
2024-01-14 19:24:55 -06:00
Dustin 778c6d440d Initial commit 2024-01-14 19:24:55 -06:00