The last step in replacing Loki with Victoria Logs is to ingest logs
from Kubernetes pods. Like Promtail, Fluent Bit is capable of
augmenting log records with Kubernetes metadata, so we can search for
logs by pod name, namespace, etc. This of course requires access to the
Kubernetes API, and the easiest way to provide that is to run Fluent Bit
as a Kubernetes pod, granting its service account the appropriate
permissions.
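Roughly, the permissions the Kubernetes metadata filter needs amount to read-only access to pods and namespaces. The sketch below shows what that looks like; the service account and namespace names are placeholders, not the actual manifests.

```yaml
# Illustrative RBAC for the Fluent Bit Kubernetes metadata filter; the
# service account and namespace names are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: [""]
    resources: [pods, namespaces]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging
```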
Since Fluent Bit also collects logs from the systemd journal, I want to
make sure the configuration for that function stays the same on
Kubernetes nodes as on all other servers. One way to do that would be
to run two different instances of Fluent Bit: one managed by Ansible
that collects journal messages, and another managed by Kubernetes that
collects pod logs. This seems like unnecessary overhead, so I have
chosen a hybrid approach: Ansible manages the configuration, but the
process itself runs in Kubernetes.
I had originally intended to deploy Radarr, Sonarr, and Prowlarr on
Kubernetes. Unfortunately, this turned out to be problematic, as I
would need a way to share the download directory between Radarr/Sonarr
and Aria2, and the media directory between Radarr/Sonarr and Jellyfin.
The only way I could fathom to do this would be to expose both
directories via NFS and mount that share into the pods. I decided this
would be too much of a hassle for no real gain, at least not in the
short term. Instead, it makes more sense to deploy the *arr suite on
the same server as Aria2 and Jellyfin, which is essentially what the
community expects.
The recommended images for deploying the applications in containers are
pretty crappy. I didn't really want to mess with trying to get them to
work natively on Fedora, nor deal with installing them from
tarballs with Ansible, so I created my own Debian-based container images
for them and deployed those via Podman+Quadlet. These images are
published to the _Packages_ organization in Gitea, which is not public
and requires authentication. We can use the Kubernetes Secret to obtain
the authentication token for pulling the images.
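As a rough sketch of the Ansible side (the secret name, namespace, registry host, and account below are placeholders; the actual tasks may differ), the token can be read from the cluster and fed to `podman login` on the host:

```yaml
# Hypothetical tasks: read the pull token from a Kubernetes Secret and
# log the host's Podman in to the Gitea registry.
- name: Look up the registry pull token
  kubernetes.core.k8s_info:
    kind: Secret
    namespace: default                 # placeholder namespace
    name: gitea-pull-token             # placeholder secret name
  register: pull_secret
  delegate_to: localhost

- name: Log in to the Gitea container registry
  containers.podman.podman_login:
    registry: gitea.example.org        # placeholder registry host
    username: deploy                   # placeholder account
    password: "{{ pull_secret.resources[0].data.token | b64decode }}"
```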
PiKVM comes with its own custom Arch Linux-based operating system. We
want to be able to manage it with our configuration policy, especially
for setting up authentication, etc. It won't really work with the
host-provisioner without some pretty significant changes to the base
playbooks, but we can control some bits directly.
Splitting up the SSH keys authorized for root login into separate
variables for SK versus legacy keys will allow more fine-grained control
of which set is used in certain situations. Specifically, the intent is
to allow non-Fedora operating systems to use the SK variants if
applicable, without having to repeat them explicitly.
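The variable names below are hypothetical, but they illustrate the idea: the SK and legacy keys live in separate lists, and the combined list remains the default.

```yaml
# Hypothetical group_vars layout; variable names are illustrative and
# the key material is obviously elided.
root_ssh_keys_sk:
  - sk-ssh-ed25519@openssh.com AAAA... dustin@sk
root_ssh_keys_legacy:
  - ssh-ed25519 AAAA... dustin@legacy
# Fedora hosts keep using both sets; other operating systems can point
# at root_ssh_keys_sk alone without repeating the keys.
root_ssh_keys: "{{ root_ssh_keys_sk + root_ssh_keys_legacy }}"
```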
In order to avoid false positives, especially with Invoice Ninja, I'm
increasing the timeout values for scraping the public-facing websites.
They can occasionally be quite slow, either because of our Internet
connection, or load on the servers.
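Assuming these are Blackbox exporter probes driven by Prometheus (the job layout and values below are illustrative), this mostly comes down to raising `scrape_timeout` on the probe job:

```yaml
# Illustrative scrape job for the public sites; only the timeout is the
# point here, everything else is placeholder.
- job_name: blackbox-public-sites
  metrics_path: /probe
  params:
    module: [http_2xx]
  scrape_interval: 1m
  scrape_timeout: 30s                  # raised from the 10s default
  static_configs:
    - targets:
        - https://invoiceninja.example.org/   # placeholder URL
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter:9115     # placeholder exporter address
```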
Victoria Logs can now record the source address for syslog messages in a
`remoteIP` field. This has to be enabled specifically, although I can't
think of a reason why someone would _not_ want to record that
information.
Using the PROXY protocol allows the publicly-facing reverse proxy to
pass through the original source address of the client, without doing
TLS termination. Clients on the internal network will not go through
the proxy, though, so we have to disable the PROXY protocol for those
addresses. Unfortunately, the syntax for this is kind of cumbersome,
because Apache only has a deny list, not an allow list, so we have to
enumerate all of the possible internal addresses _except_ the proxy.
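Concretely, the drop-in ends up looking something like this (the networks listed, the file path, and the handler name are illustrative):

```yaml
# Sketch of an Ansible task installing the mod_remoteip settings; the
# CIDRs and path are placeholders.
- name: Accept the PROXY protocol, except from internal clients
  copy:
    dest: /etc/httpd/conf.d/proxy-protocol.conf
    content: |
      RemoteIPProxyProtocol On
      # mod_remoteip only has a deny list, so every internal network
      # that reaches this server without going through HAProxy has to
      # be enumerated here; the proxy itself is deliberately omitted.
      RemoteIPProxyProtocolExceptions 172.30.0.0/26 172.31.0.0/24 192.168.0.0/24
  notify: reload httpd
```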
This is needed specifically for _fluent-bit_, which does not correctly
handle wildcards or subdomains in `NO_PROXY`, so that it can send
real-time notifications from logs via ntfy.
Instead of running `virt-install` directly from the `create-dc.sh`
script, it now relies on `newvm.sh`. This will ensure that VMs created
to be domain controllers will conform to the same expectations as all
other machines, such as using the libvirt domain metadata to build
dynamic inventory.
Similarly, the `create-dc.yml` playbook now imports the `host-setup.yml`
playbook, which covers the basic setup of a new machine. Again, this
ensures that the same policy is applied to DCs as to other machines.
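The change itself is tiny; the top of `create-dc.yml` now looks roughly like this (the group and role names in the second play are placeholders):

```yaml
# create-dc.yml (excerpt): run the common new-machine setup first, then
# the domain-controller-specific configuration.
- import_playbook: host-setup.yml

- hosts: samba_dc          # placeholder group name
  roles:
    - samba-dc             # placeholder role name
```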
Finally, domain controller machines now no longer use _winbind_ for
OS user accounts and authentication. This never worked particularly
well on DCs anyway (mostly because of the way _winbind_ insists on
using domain-prefixed user accounts when it runs on a DC), and is now
worse with recent Fedora changes. Instead, DCs now have local users who
authenticate via SSH certificates, the same as other current-generation
servers.
We'll manage Fluent Bit on Kubernetes nodes as a DaemonSet. This will
be necessary in order to grant it access to the Kubernetes API so it can
augment log records with Kubernetes metadata (labels, pod name, etc.).
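A rough sketch of the DaemonSet (the namespace, image, and paths are illustrative; how the Ansible-managed configuration actually reaches the pod isn't spelled out here, but a hostPath mount is one way to do it):

```yaml
# Illustrative DaemonSet excerpt; namespace, image, and paths are
# assumptions for the sketch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels: {app: fluent-bit}
  template:
    metadata:
      labels: {app: fluent-bit}
    spec:
      serviceAccountName: fluent-bit    # bound to the read-only ClusterRole
      containers:
        - name: fluent-bit
          image: docker.io/fluent/fluent-bit:latest   # placeholder tag
          volumeMounts:
            - {name: config, mountPath: /etc/fluent-bit, readOnly: true}
            - {name: varlog, mountPath: /var/log, readOnly: true}
      volumes:
        # The configuration is written to the node by Ansible and shared
        # with the pod, rather than kept in a ConfigMap.
        - name: config
          hostPath: {path: /etc/fluent-bit, type: Directory}
        - name: varlog
          hostPath: {path: /var/log}
```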
Most hosts will not need to send any messages to ntfy. Let's define the
ntfy pipeline stages only for the machines that need them. There are
currently two use cases for ntfy:
* MD RAID status messages (from _chromie_ and _nvr2_)
* WAN link status messages (from _gw1_)
Breaking up the pipeline into smaller pieces allows both of these use
cases to define their appropriate filters while still sharing the common
steps. The other machines that have no use for these steps now omit
them entirely.
Messages from sources other than the systemd journal do not have a
`hostname` field by default. This could make filtering logs difficult
if there are multiple servers that host the same application. Thus, we
need to inject the host name statically into every record, to ensure
they can be correctly traced to their source machine.
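With Fluent Bit this is a one-line filter. A minimal sketch, assuming the host name is rendered into the fragment by Ansible (the file path is illustrative):

```yaml
# Sketch of an Ansible-managed filter fragment (classic Fluent Bit
# syntax); the destination path is a placeholder.
- name: Add a hostname field to every log record
  copy:
    dest: /etc/fluent-bit/conf.d/hostname.conf
    content: |
      [FILTER]
          Name    record_modifier
          Match   *
          Record  hostname {{ inventory_hostname }}
```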
The Unifi Network server writes a bunch of log files that we need to
forward to Victoria Logs. This commit introduces components to the
Fluent Bit pipeline to read these files with the `tail` input plugin,
parse them using regular expressions to extract the correct time stamp
from the messages, and send them to Victoria Logs.
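A sketch of those pieces (the log path, regular expression, time format, and Victoria Logs host below are illustrative, not the actual values; the parser file also has to be referenced from the main Fluent Bit configuration):

```yaml
# Illustrative tasks for the Unifi log pipeline; paths, regex, and time
# format are assumptions.
- name: Parse Unifi Network server log timestamps
  copy:
    dest: /etc/fluent-bit/parsers-unifi.conf
    content: |
      [PARSER]
          Name        unifi
          Format      regex
          Regex       ^\[(?<time>[^\]]+)\] (?<message>.*)$
          Time_Key    time
          Time_Format %Y-%m-%d %H:%M:%S,%L

- name: Collect Unifi Network server logs
  copy:
    dest: /etc/fluent-bit/conf.d/unifi.conf
    content: |
      [INPUT]
          Name    tail
          Path    /var/log/unifi/server.log
          Tag     unifi.server
          Parser  unifi

      [OUTPUT]
          Name    http
          Match   unifi.*
          Host    logs.example.org
          Port    443
          TLS     On
          URI     /insert/jsonline
          Format  json_lines
```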
Instead of defining the common values for Fluent Bit inputs, filters,
and outputs directly in the variables used by the _fluent-bit_ role, we
need to split these into reusable pieces. This way, hosts and groups
that need to use a slightly different pipeline configuration can access
the default values without having to redefine them.
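All of the variable names in this sketch are hypothetical; the point is just the shape of it: the defaults are defined once as standalone fragments, and hosts or groups recombine them as needed.

```yaml
# Hypothetical variable layout; names and hosts are illustrative.
# group_vars/all/fluent-bit.yml
fluent_bit_default_inputs:
  - name: systemd
    tag: journal
fluent_bit_default_outputs:
  - name: http
    match: '*'
    host: logs.example.org             # placeholder Victoria Logs host

# host_vars/chromie.yml: reuse the defaults, append the ntfy stages
fluent_bit_inputs: "{{ fluent_bit_default_inputs }}"
fluent_bit_outputs: "{{ fluent_bit_default_outputs + fluent_bit_ntfy_outputs }}"
```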
Apache doesn't fully support the PROXY v2 protocol. When it's enabled,
it spams its error log with messages about unsupported features, e.g.:
> [remoteip:error] [pid 1257:tid 1302] [client 172.30.0.6:45614]
> AH03507: RemoteIPProxyProtocol: unsupported command 20
For some reason, Nextcloud seems to have trouble sending mail via the
network-wide relay. It opens a connection, then just sits there and
never sends anything until it times out. This happens probably 4 out of
5 times it attempts to send e-mail messages.
Running Postfix locally and directing Nextcloud to send mail through it
and then on to the network-wide relay seems to work much more reliably.
Since the reverse proxy does TLS pass-through instead of termination,
the original source address is lost. Because the source address is
important for logging, rate limiting, and access control, we need to use
the HAProxy PROXY protocol to pass it along to the web server.
Since the PROXY protocol works at the TCP layer, _all_ connections must
use it. Fortunately, all of the sites hosted by the public web server
are in fact public and only accessed through HAProxy. Similarly,
enabling it for one named virtual host enables it for all virtual hosts
on that port. Thus, we only have to explicitly set it for one site, and
all the rest will use it as well.
We don't really use this site for screenshot sharing any more. It's
still fun to look back at the old screenshots, though, so I've saved a
static snapshot of it that can be hosted by plain ol' Apache.
Now that the Blackbox exporter does not follow redirects, we need to
explicitly tell it to scrape the HTTPS variant of sites that have it
enabled. Otherwise, we only get info about the first HTTP-to-HTTPS
redirect response, which is not helpful for watching certificate expiry.
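Concretely, the probe targets now name the `https://` URLs directly instead of relying on the redirect (the URLs here are placeholders):

```yaml
# Illustrative probe targets; with redirect following disabled in the
# http_2xx module, the HTTPS URL has to be probed explicitly so that
# the site's own certificate is what gets checked.
- job_name: blackbox-websites
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://www.example.org/       # placeholder
        - https://photos.example.org/    # placeholder
```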
There are a couple of websites we scrape that simply redirect to another
name (e.g. _pyrocufflink.net_ → _dustin.hatch.name_, _tabitha.biz_ →
_hatchlearningcenter.org_). For these, we want to track the
availability of the first step, not the last, especially with regard to
their certificate lifetimes.
For machines that have Linux MD RAID arrays, I want to receive
notifications about the status of the arrays immediately via _ntfy_. I
had this before with `journal2ntfy`, but I never got around to setting
it up for the current generation of machines (_nvr2_, _chromie_). Now
that we have `fluent-bit` deployed, we can use its pipeline capabilities
to select the subset of messages for which we want immediate alerts and
send them directly to _ntfy_. We use a Lua function to transform the
log record into a body compatible with _ntfy_'s JSON publish request;
`fluent-bit` doesn't have any other way to set array values, as needed
for the `tags` member.
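Roughly, the fragment wires a `lua` filter in front of an `http` output pointed at ntfy's JSON publish endpoint. The tag, script name, and ntfy host below are illustrative, and the batching/format details are glossed over:

```yaml
# Illustrative Fluent Bit fragment for the ntfy alert path; the tag,
# script name, and ntfy host are placeholders.
- name: Send selected log records to ntfy
  copy:
    dest: /etc/fluent-bit/conf.d/ntfy.conf
    content: |
      [FILTER]
          Name    lua
          Match   ntfy.*
          Script  /etc/fluent-bit/ntfy.lua
          Call    to_ntfy

      [OUTPUT]
          Name    http
          Match   ntfy.*
          Host    ntfy.example.org
          Port    443
          TLS     On
          URI     /
          Format  json_lines
```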
[fluent-bit][0] is a generic, highly-configurable log collector. It was
apparently initially developed for fluentd, but it has so many output
capabilities that it works with many different log aggregation systems,
including Victoria Logs.
Although Victoria Logs supports the Loki input format, and therefore
_Promtail_ would work, I want to try to avoid depending on third-party
repositories. _fluent-bit_ is packaged by Fedora, so there shouldn't be
any dependency issues, etc.
[0]: https://fluentbit.io
Previously, _node-474c83.k8s.pyrocufflink.black_ was tainted
`du5t1n.me/machine=raspberrypi`, which prevented arbitrary pods from
being scheduled on it. Now that there are two more Raspberry Pi nodes
in the cluster, and arbitrary pods _should_ be scheduled on them, this
taint no longer makes sense. Instead, having specific taints for the
node's roles is more clear.
I never remember to update this list when I add/remove VMs.
* _bw0_ has been decommissioned; Vaultwarden now runs in Kubernetes
* _unifi3_ has been replaced by _unifi-nuptials_
* _logs-dusk_ runs Victoria Logs, which will eventually replace Loki
* _node-refrain_ has been replaced by _node-direction_
* _k8s-ctrl0_ has been replaced by _ctrl-crave_ and _ctrl-sycamore_
The Linux [netconsole][0] protocol is a very simple plain-text UDP
stream, with no real metadata to speak of. Although it's not really
syslog, Victoria Logs is able to ingest the raw data into the `_msg`
field, and uses the time of arrival as the `_time` field.
_netconsole_ is somewhat useful for debugging machines that do not have
any other console (no monitor, no serial port), like the Raspberry Pi
CM4 modules in the DeskPi Super 6c cluster. Unfortunately, its
implementation in the kernel is so simple that even the source address isn't
particularly useful as an identifier, and since Victoria Logs doesn't
track that anyway, we might as well just dump all the messages into a
single stream.
It's not really discussed in the Victoria Logs documentation, but any
time multiple syslog listeners with different properties are configured,
_all_ of the listeners _must_ specify _all_ of those properties. The defaults will
_not_ be used for any stream; the value provided for one stream will be
used for all the others unless they specify one themselves. Thus, in
order to use the default stream fields for the "regular" syslog
listener, we have to explicitly set them.
[0]: https://www.kernel.org/doc/html/latest/networking/netconsole.html
The Raspberry Pi CM4 nodes on the DeskPi Super 6c cluster board are
members of the _cm4-k8s-node_ group. This group is a child of
_k8s-node_ and overrides the data volume configuration and node labels.
Apache supports fetching server certificates via ACME (e.g. from Let's
Encrypt) using a new module called _mod_md_. Configuring the module is
fairly straightforward, mostly consisting of `MDomain` directives that
indicate what certificates to request. Unfortunately, there is one
rather annoying quirk: the certificates it obtains are not immediately
available to use, and the server must be reloaded in order to start
using them. Fortunately, the module provides a notification mechanism
via the `MDNotifyCmd` directive, which will run the specified command
after obtaining a certificate. The command is executed with the
privileges of the web server, which does not have permission to reload
itself, so we have to build in some indirection in order to trigger the
reload: the notification runs a script that creates an empty file in the
server's state directory; systemd is watching for that file to be
created, then starts another service unit to trigger the actual reload,
then removes the trigger file.
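A sketch of that indirection, assuming a pair of systemd units like these (the unit names and trigger file path are illustrative; the `MDNotifyCmd` script itself only has to touch the trigger file):

```yaml
# Hypothetical Ansible tasks installing the watcher units; names and
# paths are placeholders.
- name: Watch for the mod_md reload trigger file
  copy:
    dest: /etc/systemd/system/httpd-md-reload.path
    content: |
      [Unit]
      Description=Reload Apache when mod_md obtains a certificate

      [Path]
      PathExists=/var/lib/httpd/md-reload-needed

      [Install]
      WantedBy=multi-user.target

- name: Reload Apache and clear the trigger file
  copy:
    dest: /etc/systemd/system/httpd-md-reload.service
    content: |
      [Unit]
      Description=Reload Apache for renewed mod_md certificates

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/systemctl reload httpd.service
      ExecStartPost=/usr/bin/rm -f /var/lib/httpd/md-reload-needed
```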
Website roles, etc. that want to switch to using _mod_md_ to manage
their certificates should depend on this role and add an `MDomain`
directive to their Apache configuration file fragments.
The _haproxy_ role only installs HAProxy and provides some basic global
configuration; it expects another role to depend on it and provide
concrete proxy configuration with drop-in configuration files. Thus, we
need a role specifically for the Kubernetes control plane nodes to
provide the configuration to proxy for the API server.
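The drop-in the new role provides ends up looking roughly like this (the bind port, health-check settings, backend addresses, and file path are placeholders):

```yaml
# Sketch of the HAProxy drop-in for the API server; port, addresses,
# and health-check details are illustrative.
- name: Proxy the Kubernetes API server
  copy:
    dest: /etc/haproxy/conf.d/kube-apiserver.cfg
    content: |
      frontend kube-apiserver
          bind :8443
          mode tcp
          default_backend kube-apiserver

      backend kube-apiserver
          mode tcp
          option httpchk GET /healthz
          http-check expect status 200
          default-server check check-ssl verify none
          server ctrl-crave    192.0.2.11:6443   # placeholder address
          server ctrl-sycamore 192.0.2.12:6443   # placeholder address
  notify: reload haproxy
```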
Control plane nodes will now run _keepalived_, to provide a "floating"
IP address that is assigned to one of the nodes at a time. This
address (172.30.0.169) is now the target of the DNS A record for
_kubernetes.pyrocufflink.blue_, so clients will always communicate with
the server that currently holds the floating address, whichever that may
be.
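A minimal keepalived sketch for the floating address (the interface, router ID, and priority are illustrative; the real configuration presumably also includes a health check for the local API server):

```yaml
# Sketch of the keepalived configuration; everything except the VIP
# itself is a placeholder.
- name: Configure the Kubernetes API floating address
  copy:
    dest: /etc/keepalived/keepalived.conf
    content: |
      vrrp_instance kube_api {
          state BACKUP
          interface eth0
          virtual_router_id 51
          priority 100
          advert_int 1
          virtual_ipaddress {
              172.30.0.169
          }
      }
  notify: restart keepalived
```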
I was originally inspired by the official Kubernetes [High Availability
Considerations][0] document when designing this. At first, I planned to
deploy _keepalived_ and HAProxy as DaemonSets on the control plane
nodes, but this ended up being somewhat problematic whenever all of the
control plane nodes would go down at once, as the _keepalived_ and
HAProxy pods would not get scheduled and thus no clients could communicate
with the API servers.
[0]: 9d7cfab6fe/docs/ha-considerations.md
The man page for _containers-certs.d(5)_ says that subdirectories of
`/etc/containers/certs.d` should be named `host:port`; however, this is
a bit misleading. It seems, instead, that the directory name must match
the registry name exactly as it is specified in the image reference, so in the case of a server
that supports HTTPS on port 443, where the port would be omitted from
the image name, it must also be omitted from the `certs.d` subdirectory
name.
One major weakness with Ansible's "lookup" plugins is that they are
evaluated _every single time they are used_, even indirectly. This
means, for example, that a shell command could be run many times,
potentially returning different values each time, or that an expensive
calculation could be repeated even though it always produces the same
result. Ansible does not have a built-in way
to cache the result of a `lookup` or `query` call, so I created this
one. It's inspired by [ansible-cached-lookup][0], which didn't actually
work and is apparently unmaintained. Instead of using a hard-coded
file-based caching system, however, my plugin uses Ansible's
configuration and plugin infrastructure to store values with any
available cache plugin.
Although looking up the _pyrocufflink.net_ wildcard certificate with the
Kubernetes API isn't particularly expensive by itself right now, I can
envision several other uses that may be. Having this plugin available
could speed up future playbooks.
[0]: https://pypi.org/project/ansible-cached-lookup
Using files for certificates and private keys is less than ideal.
The only way to "share" a certificate between multiple hosts is with
symbolic links, which means the configuration policy has to be prepared
for each managed system. As we're moving toward a much more dynamic
environment, this becomes problematic; the host-provisioner will never
be able to copy a certificate to a new host that was just created.
Further, I have never really liked the idea of storing certificates and
private keys in Git anyway, even if it is in a submodule with limited
access.
The _containers-image_ role configures _containers-registries.conf(5)_ and
_containers-certs.d(5)_, which are used by CRI-O (and `podman`).
Specifically, we'll use these to redirect requests for images on Docker
Hub (docker.io) to the internal caching proxy.
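The redirect itself is a small _registries.conf_ drop-in; a sketch (the mirror host name and file path are placeholders):

```yaml
# Sketch of the registries.conf drop-in; the mirror host is a placeholder.
- name: Pull docker.io images through the caching proxy
  copy:
    dest: /etc/containers/registries.conf.d/docker-io-mirror.conf
    content: |
      [[registry]]
      prefix = "docker.io"
      location = "docker.io"

      [[registry.mirror]]
      location = "mirror.example.org"
```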
Docker Hub's rate limits are so low now that they've started to affect
my home lab. Deploying a caching proxy and directing all pull requests
through it should prevent exceeding the limit. It will also help
prevent containers from starting if access to the Internet is down, as
long as their images have been cached recently.
The _nginx_ access log files are absolutely spammed with requests from
Restic and WAL-G, to the point where they fill the log volume on
_chromie_ every day. They're not particularly useful anyway; I've never
looked at them, and any information they contain can be obtained in
another way, if necessary, for troubleshooting.
I've become rather frustrated with Grafana Loki lately. It has several
bugs that affect my usage, including issues with counting and
aggregation, completely broken retention and cleanup, spamming itself
with bogus error log messages, and more. Now that VictoriaLogs has
first-class support in Grafana and support for alerts, it seems like a
good time to try it out. It's under very active development, with bugs
getting fixed extremely quickly, and new features added constantly.
Indeed, as I was experimenting with it, I thought, "it would be nice if
the web UI could decode ANSI escapes for terminal colors," and just a
few days later, that feature was added! Native support for syslog is
also a huge benefit, as it will allow me to collect logs directly from
network devices, without first collecting them into a file on the Unifi
controller.
This new role deploys VictoriaLogs in a manner very similar to how I
have Loki set up, as a systemd-managed Podman container. As it has no
built-in authentication or authorization, we rely on Caddy to handle
that. As with Loki, mTLS is used to prevent anonymous access to log
queries; however, authentication via Authelia is also an
option for human+browser usage. I'm re-using the same certificate
authority as with Loki to simplify Grafana configuration. Eventually, I
would like to have a more robust PKI, probably using OpenBao, at which
point I will (hopefully) have decided which log database I will be
using, and can use a proper CA for it.
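For reference, the systemd-managed container amounts to a small Quadlet unit along these lines (the image tag, flags, ports, and storage path are illustrative):

```yaml
# Rough sketch of the Quadlet unit; image tag, flags, and paths are
# placeholders, and Caddy would sit in front of the published port.
- name: Install the VictoriaLogs container unit
  copy:
    dest: /etc/containers/systemd/victoria-logs.container
    content: |
      [Unit]
      Description=VictoriaLogs

      [Container]
      Image=docker.io/victoriametrics/victoria-logs:latest
      Volume=/var/lib/victoria-logs:/victoria-logs-data:Z
      PublishPort=127.0.0.1:9428:9428
      Exec=-storageDataPath=/victoria-logs-data -retentionPeriod=8w

      [Install]
      WantedBy=multi-user.target
```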
Although I'm sure it will never be used, we might as well set the logout
URL to the correct value. When the link is clicked, the browser will
navigate to the Authelia logout page, which will invalidate all SSO
sessions.
Frigate has evolved a lot over the past year or so since v0.13.
Notably, some of the configuration options have been renamed, and
_events_ have become _alerts_ and _detections_. There's also now
support for authentication, though we don't need it because we're using
Authelia.
We're trying to sell the Hustler lawn mower, so we plan to set it out
at the end of the driveway for passers-by to see. I've temporarily
installed one of the Annke cameras in the kitchen, pointed out the
front window, to monitor it.