Commit Graph

210 Commits (6f5b400f4a9ee71f45dba9678a3b7ee372de481c)

Author SHA1 Message Date
Dustin 6f5b400f4a vm-hosts: Fix test network device name
The network device for the test/*pyrocufflink.red* network is named
`br1`.  This needs to match in the systemd-networkd configuration or
libvirt will not be able to attach virtual machines to the bridge.
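The matching systemd-networkd units might look like this (a sketch; only the bridge name `br1` comes from this change, the rest is assumed):

```ini
# /etc/systemd/network/br1.netdev
[NetDev]
Name=br1
Kind=bridge

# /etc/systemd/network/br1.network
[Match]
Name=br1

[Network]
# libvirt attaches VM tap devices to this bridge by name,
# so the Name here must match the libvirt network definition
```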
2024-01-21 15:55:37 -06:00
Dustin fb445224a0 vm-hosts: Add k8s-amd64-n3 to autostart list 2024-01-21 15:55:23 -06:00
Dustin 525f2b2a04 nut-monitor: Configure upsmon
`upsmon` is the component of [NUT] that monitors (local or remote) UPS
devices and reacts to changes in their state.  Notably, it is
responsible for powering down the system when there is insufficient
power to the system.
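A minimal `upsmon.conf` along these lines would do the job (a sketch; the UPS name, credentials, and primary/secondary role are assumptions):

```ini
# /etc/ups/upsmon.conf
# MONITOR <system> <powervalue> <username> <password> <type>
MONITOR ups@localhost 1 upsmon <password> primary
MINSUPPLIES 1
SHUTDOWNCMD "/sbin/shutdown -h +0"
```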
2024-01-19 20:50:03 -06:00
Dustin ab30fa13ca file-servers: Set Apache ServerName
Since *file0.pyrocufflink.blue* now hosts a couple of VirtualHosts,
accessing its HTTP server by the *files.pyrocufflink.blue* alias no
longer works, as Apache routes unknown hostnames to the first
VirtualHost, rather than the global configuration.  To resolve this, we
must set `ServerName` to the alias.
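Concretely, the default VirtualHost could look something like this (a sketch; the DocumentRoot is an assumption):

```apache
<VirtualHost *:80>
    # Answer for the alias explicitly instead of falling through
    # to whichever VirtualHost happens to be defined first
    ServerName files.pyrocufflink.blue
    ServerAlias file0.pyrocufflink.blue
    DocumentRoot /srv/files
</VirtualHost>
```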
2023-12-29 10:46:13 -06:00
Dustin dfd828af08 r/ssh-host-certs: Manage SSH host certificates
The *ssh-host-certs* role, which is now applied as part of the
`base.yml` playbook and therefore applies to all managed nodes, is
responsible for installing the *sshca-cli* package and using it to
request signed SSH host certificates.  The *sshca-cli-systemd*
sub-package includes systemd units that automate the process of
requesting and renewing host certificates.  These units need to be
enabled and provided the URL of the SSHCA service.  Additionally, the
SSH daemon needs to be configured to load the host certificates.
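The sshd side of this might be a drop-in like the following (a sketch; the certificate path mirrors OpenSSH's `-cert.pub` convention but the exact paths used by *sshca-cli* are an assumption):

```
# /etc/ssh/sshd_config.d/50-host-certs.conf
HostKey /etc/ssh/ssh_host_ed25519_key
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
```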
2023-11-07 21:27:02 -06:00
Dustin c6f0ea9720 r/repohost: Configure Yum package repo host
So it turns out Gitea's RPM package repository feature is less than
stellar.  Since each organization/user can only have a single
repository, separating packages by OS would be extremely cumbersome.
Presumably, the feature was designed for projects that only build a
single RPM for each version, but most of my packages need multiple
builds, as they tend to link to system libraries.  Further, only the
repository owner can publish to user-scoped repositories, so e.g.
Jenkins cannot publish anything to a repository under my *dustin*
account.  This means I would ultimately have to create an Organization
for every OS/version I need to support, and make Jenkins a member of it.
That sounds tedious and annoying, so I decided against using that
feature for internal packages.

Instead, I decided to return to the old ways, publishing packages with
`rsync` and serving them with Apache.  It's fairly straightforward to
set this up: just need a directory with the appropriate permissions for
users to upload packages, and configure Apache to serve from it.
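Publishing a package then reduces to a single command along these lines (a sketch; the host name and repository layout are assumptions):

```sh
# Upload built RPMs; the target directory's group-write permissions
# allow any authorized publisher (e.g. Jenkins) to do the same
rsync -rtv --chmod=ug+rw ./*.rpm repo.pyrocufflink.blue:/srv/repo/fedora/38/x86_64/
```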

One advantage Gitea's feature had over a plain directory is its
automatic management of repository metadata.  Publishers only have to
upload the RPMs they want to serve, and Gitea handles generating the
index, database, etc. files necessary to make the packages available to
Yum/dnf.  With a plain file host, the publisher would need to use
`createrepo` to generate the repository metadata and upload that as
well.  For repositories with multiple packages, the publisher would need
a copy of every RPM file locally in order for them to be included in the
repository metadata.  This, too, seems like it would be too much trouble
to be tenable, so I created a simple automatic metadata manager for the
file-based repo host.  Using `inotifywatch`, the `repohost-createrepo`
script watches for file modifications in the repository base directory.
Whenever a file is added or changed, the directory containing it is
added to a queue.  Every thirty seconds, the queue is processed; for
each unique directory in the queue, repository metadata are generated.
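The watch/queue/process loop could be sketched like this (not the actual `repohost-createrepo` script; paths and the use of `createrepo_c` are assumptions):

```sh
#!/bin/sh
# Watch the repo tree, queue directories with changed files, and
# regenerate metadata for each unique directory every thirty seconds.
REPO_BASE=/srv/repo          # assumed path
QUEUE=/run/repohost.queue

inotifywait -m -r -e close_write -e moved_to --format '%w' "$REPO_BASE" |
while read -r dir; do
    echo "$dir" >> "$QUEUE"
done &

while sleep 30; do
    [ -s "$QUEUE" ] || continue
    sort -u "$QUEUE" | while read -r dir; do
        createrepo_c --update "$dir"
    done
    : > "$QUEUE"
done
```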

This implementation combines the flexibility of a plain file host,
supporting an effectively unlimited number of repositories with
fully-configurable permissions, and the ease of publishing of a simple
file upload.
2023-11-07 20:51:10 -06:00
Dustin 6955c4e7ad hosts: Decommission dc-4k6s8e.p.b
Replaced by *dc-nrtxms.pyrocufflink.blue*
2023-10-28 16:07:56 -05:00
Dustin 420764d795 hosts: Add dc-nrtxms.p.b
New Fedora 38 Active Directory Domain Controller
2023-10-28 16:07:39 -05:00
Dustin a8c184d68c hosts: Decommission dc-ag62kz.p.b
Replaced by *dc-qi85ia.pyrocufflink.blue*
2023-10-28 16:07:08 -05:00
Dustin 686817571e smtp-relay: Switch to Fastmail
AWS is going to begin charging extra for routable IPv4 addresses soon.
There's really no point in having a relay in the cloud anymore anyway,
since a) all outbound messages are sent via the local relay and b) no
messages are sent to anyone except me.
2023-10-24 17:27:21 -05:00
Dustin 1b9543b88f metricspi: alerts: Increase Frigate disk threshold
We want the Frigate recording volume to be basically full at all times,
to ensure we are keeping as much recording as possible.
2023-10-15 09:52:12 -05:00
Dustin 2f554dda72 metricspi: Scrape k8s-aarch64-n1
I've added a new Kubernetes worker node,
*k8s-aarch64-n1.pyrocufflink.blue*.  This machine is a Raspberry Pi CM4
mounted on a Waveshare CM4-IO-Base A and clipped onto the DIN rail.
It's got 8 GB of RAM and 32 GB of eMMC storage.  I intend to use it to
build container images locally, instead of bringing up cloud instances.
2023-10-05 14:32:19 -05:00
Dustin a74113d95f metricspi: Scrape Zincati metrics from CoreOS hosts
Zincati is the automatic update manager on Fedora CoreOS.  It exposes
Prometheus metrics for host/update statistics, which are useful to track
the progress of automatic updates and identify update issues.

Zincati actually exposes its metrics via a Unix socket on the
filesystem.  Another process, [local_exporter], is required to expose
the metrics from this socket via HTTP so Prometheus can scrape them.

[local_exporter]: https://github.com/lucab/local_exporter
2023-10-03 10:29:12 -05:00
Dustin d7f778b01c metricspi: Scrape metrics from k8s-aarch64-n0
*collectd* is now running on *k8s-aarch64-n0.pyrocufflink.blue*,
exposing system metrics.  As it is not a member of the AD domain, it has
to be explicitly listed in the `scrape_collectd_extra_targets` variable.
2023-10-03 10:29:11 -05:00
Dustin 50f4b565f8 hosts: Remove nvr1.p.b as managed system
*nvr1.pyrocufflink.blue* has been migrated to Fedora CoreOS.  As such,
it is no longer managed by Ansible; its configuration is done via
Butane/Ignition.  It is no longer a member of the Active Directory
domain, but it does still run *collectd* and export Prometheus metrics.
2023-09-27 20:24:47 -05:00
Dustin 7a9c678ff3 burp-server: Keep more backups
New retention policy:

* 7 daily backups
* 4 weekly backups
* 12 ~monthly backups
* 5 ~yearly backups
2023-07-17 16:36:37 -05:00
Dustin 06782b03bb vm-hosts: Update VM autostart list
* *dc2* has been gone for a long time, replaced by two new domain controllers
* *unifi0* was recently replaced by *unifi1*
2023-07-07 10:05:22 -05:00
Dustin 71a43ccf07 unifi: Deploy Unifi Network controller
Since Ubiquiti only publishes Debian packages for the Unifi Network
controller software, running it on Fedora has historically been nigh
impossible.  Fortunately, a modern solution is available: containers.
The *linuxserver.io* project publishes a container image for the
controller software, making it fairly easy to deploy on any host with an
OCI runtime.  I briefly considered creating my own image, since theirs
must be run as root, but I decided the maintenance burden would not be
worth it.  Using Podman's user namespace functionality, I was able to
work around this requirement anyway.
2023-07-07 10:05:01 -05:00
Dustin 61844e8a95 pyrocufflink: Add Luma SSH keys for root
Sometimes I need to connect to a machine when there is an AD issue (e.g.
domain controllers are down, clocks are out of sync, etc.) but I can't
do it from my desktop.
2023-07-05 16:35:57 -05:00
Dustin 0a68d84121 metricspi: Scrape hatchlearningcenter.org
To monitor site availability and certificate expiration.
2023-06-21 14:31:33 -05:00
Dustin 4e608e379f metricspi/alerts: Correct BURP archive alert query
When the RAID array is being resynchronized after the archived disk has
been reconnected, md changes the disk status from "missing" to "spare."
Once the synchronization is complete, it changes from "spare" to
"active."  We only want to trigger the "disk needs archived" alert once
the synchronization process is complete; otherwise, both the "disks need
swapped" and "disk needs archived" alerts would be active at the same
time, which makes no sense.  By adjusting the query for the "disk needs
archived" alert to consider disks in both "missing" and "spare" status,
we can delay firing that alert until the proper time.
2023-06-20 11:58:35 -05:00
Dustin bf4d57b5cb frigate: Configure journal2ntfy for MD RAID
The Frigate server has a RAID array that it uses to store video
recordings.  Since there have been a few occasions where the array has
suddenly stopped functioning, probably because of the cheap SATA
controller, it will be nice to get an alert as soon as the kernel
detects the problem, so as to minimize data loss.
2023-06-08 10:05:36 -05:00
Dustin 87e8ec2ed4 synapse: Back up data using BURP
Most of the Synapse server's state is in its SQLite database.  It also
has a `media_store` directory that needs to be backed up, though.

In order to back up the SQLite database while the server is running, the
database must be in "WAL mode."  By default, Synapse leaves the database
in the default "rollback journal mode," which disallows multiple
processes from accessing the database, even for read-only operations.
To change the journal mode:

```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
2023-05-23 09:52:50 -05:00
Dustin 78296f7198 Merge branch 'journal2ntfy' 2023-05-23 08:31:52 -05:00
Dustin 347cda74fd metrics: Scrape metrics from Kubernetes API server
Kubernetes exports a *lot* of metrics in Prometheus format.  I am not
sure what all is there, yet, but apparently several thousand time series
were added.

To allow anonymous access to the metrics, I added this ClusterRole:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
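A ClusterRole by itself grants nothing; anonymous scrapes also need a binding to the `system:anonymous` user. A sketch of the matching ClusterRoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
```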
2023-05-22 21:21:08 -05:00
Dustin c0bb387b18 metricspi: Scrape metrics from MinIO backup storage
MinIO exposes metrics in Prometheus exposition format.  By default, it
requires an authentication token to access the metrics, but I was unable
to get this to work.  Fortunately, it can be configured to allow
anonymous access to the metrics, which is fine, in my opinion.
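The anonymous mode is controlled by an environment variable, so the change amounts to one line in MinIO's environment file (the file path is an assumption):

```sh
# /etc/default/minio
# Allow unauthenticated scrapes of the Prometheus metrics endpoints
MINIO_PROMETHEUS_AUTH_TYPE=public
```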
2023-05-22 21:19:25 -05:00
Dustin a7319c561d journal2ntfy: Script to send log messages via ntfy
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream.  Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.
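The core idea fits in a few lines of shell (a sketch only; the real script is Python, and the ntfy topic URL here is an assumption):

```sh
#!/bin/sh
# Forward all arguments to journalctl as message filters, then POST
# each matching line to an ntfy topic.
journalctl --follow --output=cat "$@" |
while IFS= read -r line; do
    curl -s -d "$line" https://ntfy.pyrocufflink.blue/alerts >/dev/null
done
```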

For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array.  When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun.  Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
2023-05-17 14:51:21 -05:00
Dustin 2c002aa7c5 alerts: Add alert to archive BURP disk
This alert will fire once the MD RAID resynchronization process has
completed and both disks in the array are online.  It will clear when
one disk is disconnected and moved to the safe.
2023-05-16 08:33:13 -05:00
Dustin 877dcc3879 alerts: Add alerts for missed client backups
When BURP fails to even *start* a backup, it does not trigger a
notification at all.  As a result, I may not notice for a few days when
backups are not happening.  That was the case this week, when clients'
backups were failing immediately, because of a file permissions issue on
the server.  To hopefully avoid missing backups for too long in the
future, I've added two new alerts:

* The *no recent backups* alert fires if there have not been *any* BURP
  backups recently.  This may also fire, for example, if the BURP
  exporter is not working, or if there is something wrong with the BURP
  data volume.
* The *missed client backup* alert fires if an active BURP client (i.e.
  one that has had at least one backup in the past 90 days) has not been
  backed up in the last 24 hours.
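The *missed client backup* rule could be expressed roughly like this (a sketch; the exporter's metric name is an assumption):

```yaml
- alert: MissedClientBackup
  # Client has backed up within the last 90 days (so it is "active"),
  # but not within the last 24 hours
  expr: >
    (time() - burp_last_backup_timestamp) > 86400
    and (time() - burp_last_backup_timestamp) < 90 * 86400
  for: 1h
```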
2023-05-14 11:48:36 -05:00
Dustin a2bcd5ccbb alerts: Adjust BURP RAID disk swap alert
Using a 30-day window for the `tlast_change_over_time` function
effectively "caps out" the value at 30 days.  Thus, the alert reminding
me to swap the BURP backup volume will never fire, since the value will
never be greater than the 30-day threshold.  Using a wider window
resolves that issue (though the query will still produce inaccurate
results beyond the window).
2023-05-14 11:38:00 -05:00
Dustin ad9fb6798e samba-dc: Omit tls cafile setting
The `tls cafile` setting in `smb.conf` is not necessary.  It is used for
verifying peer certificates for mutual TLS authentication, not to
specify the intermediate certificate authority chain like I thought.

The setting cannot simply be left out, though.  If it is not specified,
Samba will attempt to load a file from a built-in default path, which
will fail, causing the server to crash.  This is avoided by setting the
value to the empty string.
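In `smb.conf`, that looks like:

```ini
[global]
    tls cafile =
```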
2023-05-10 08:28:49 -05:00
Dustin 9722fed1b8 metricspi: Scrape dustinandtabitha.com 2023-05-09 21:30:11 -05:00
Dustin f6f286ac24 alerts: Correct BURP volume swap alert
The `tlast_change_over_time` function needs an interval wide enough to
consider the range of time we are interested in.  In this case, we want
to see if the BURP volume has been swapped in the last thirty days, so
the interval needs to be `30d`.
2023-05-03 11:06:34 -05:00
Dustin 5ed3ee525e synapse: Update LDAP server URI 2023-05-01 12:36:33 -05:00
Dustin a4cc9d0c46 metricspi: Scrape tabitha.biz 2023-04-23 20:03:43 -05:00
Dustin 6c68126a3a grafana: Update LDAP server host name
*dc0.p.b* has been gone for a while now.  All the current domain
controllers use LDAPS certificates signed by Let's Encrypt and include
the *pyrocufflink.blue* name, so we can now use the apex domain A record
to connect to the directory.
2023-04-12 14:07:51 -05:00
Dustin 78f65355fa gitea: Back up with BURP 2023-04-12 14:07:51 -05:00
Dustin 1da4c17a8c alerts: Add alerts for HTTPS certificates
These alerts will generate notifications when websites' HTTPS
certificates are not properly renewed automatically and become in danger
of expiring.
2023-04-12 13:55:31 -05:00
Dustin bf4133652c metrics: Scrape Jenkins with blackbox exporter
This is mostly to monitor the HTTPS certificate expiration.
2023-04-12 13:55:31 -05:00
Dustin dc2a05dc8f alerts: Add alert for BURP RAID array swap
This alert tracks how long it has been since the number of "active" disks
in the RAID array on the BURP server changed.  The assumption is
that the number will typically be `1`, but it will be `2` once the
second disk has synchronized, before the swap occurs.
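With VictoriaMetrics' `tlast_change_over_time` function, the query might look like this (a sketch; the md metric name and label are assumptions):

```yaml
- alert: BurpDiskSwapDue
  # Fire when the count of active disks has not changed in 30 days
  expr: (time() - tlast_change_over_time(md_disks{type="active"}[90d])) > 30 * 86400
```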
2023-04-11 22:25:36 -05:00
Dustin 2394bf7436 metricspi: Fix vmalert links
1. Grafana 8 changed the format of the query string parameters for the
   Explore page.
2. vmalert no longer needs the http.pathPrefix argument when behind a
   reverse proxy, rather it uses the request path like the other
   Victoria Metrics components.
2023-04-11 21:46:43 -05:00
Dustin 6c562c9821 alerts: Ignore missing mdraid disk for BURP
The way I am handling swapping out the BURP disk now is by using the
Linux MD RAID driver to manage a RAID 1 mirror array.  The array
normally operates with one disk missing, as it is in the fireproof safe.
When it is time to swap the disks, I reattach the offline disk, let the
array resync, then disconnect and store the other disk.

This works considerably better than the previous method, as it does not
require BURP or the NFS server to be offline during the synchronization.
2023-04-11 20:08:07 -05:00
Dustin a59f24a8b5 metricspi: Stop scraping speedtest
Running the speed test periodically was just wasting bandwidth.  It
failed frequently, and generally did not provide useful information.
2023-04-02 11:05:16 -05:00
Dustin 94de5d6067 samba-dc: Decrease Samba log level
The default log level (3) produces too much output and quickly fills the
`/var/log` volume on the domain controllers.
2023-03-08 11:26:57 -06:00
Dustin 748c432334 vaultwarden: Change Domain URL
The rule is "if it is accessible on the Internet, its name ends in .net"

Although Vaultwarden can be accessed by either name, the one specified
in the Domain URL setting is the only one that works for WebAuthn.
2023-03-03 11:17:07 -06:00
Dustin 632e1dd906 metricspi: Update LDAP configuration
All domain controllers now use the Let's Encrypt wildcard certificate
for the *pyrocufflink.blue* domain.  Further, *dc2.p.b* is
decommissioned.
2023-01-09 12:23:54 -06:00
Dustin 90f9e5eba5 samba-dc: Manage sudoers
Domain controllers only allow users in the *Domain Admins* AD group to
use `sudo` by default.  *dustin* and *jenkins* need to be able to apply
configuration policy to these machines, but they are not members of said
group.
2022-12-23 08:47:31 -06:00
Dustin 9408ee31c3 home-assistant: Back up Zigbee/ZWave/Mosquitto
Mosquitto, Zigbee2MQTT, and ZWaveJS2MQTT all have persistent state that
needs to be backed up in addition to Home Assistant's own data.
2022-12-23 06:56:52 -06:00
Dustin 77191c8b5a Fedora37: Set collectd SELinux domain permissive
*collectd* is broken by default on Fedora 36 and 37.  Several plugins
generate AVC denials.
2022-12-19 10:22:00 -06:00
Dustin 637289036a blackbox: Update pyrocufflink DNS check
I changed the naming convention for domain controller machines.  They
are no longer "numbered," since the plan is to rotate through them
quickly.  For each release of Fedora, we'll create two new domain
controllers, replacing the existing ones.  Their names are now randomly
generated and contain letters and numbers, so the Blackbox Exporter
check for DNS records needs to account for this.
2022-12-19 09:04:37 -06:00