Commit Graph

1038 Commits (7ff18ab75e3305d043d862dafacae7f347a75df7)

Author SHA1 Message Date
Dustin 0a68d84121 metricspi: Scrape hatchlearningcenter.org
To monitor site availability and certificate expiration.
2023-06-21 14:31:33 -05:00
Dustin 4e608e379f metricspi/alerts: Correct BURP archive alert query
When the RAID array is being resynchronized after the archived disk has
been reconnected, md changes the disk status from "missing" to "spare."
Once the synchronization is complete, it changes from "spare" to
"active."  We only want to trigger the "disk needs archived" alert once
the synchronization process is complete; otherwise, both the "disks need
swapped" and "disk needs archived" alerts would be active at the same
time, which makes no sense.  By adjusting the query for the "disk needs
archived" alert to consider disks in both "missing" and "spare" status,
we can delay firing that alert until the proper time.
2023-06-20 11:58:35 -05:00
Dustin b05edbf7fb r/minio: Configure firewall
The firewall needs to allow inbound connections to the MinIO HTTP API
and web UI ports.
2023-06-08 10:07:32 -05:00
Dustin 4776303db2 k8s-node: Deploy NFS client
Longhorn's new RWX (read-write many) mode requires the NFS client
utilities installed on the host machine.
2023-06-08 10:06:02 -05:00
Dustin 679ea47bf7 r/homeassistant: Protect ~/.ssh
When the Home Assistant container restarts, Podman relabels the entire
`/var/lib/homeassistant` directory as `container_file_t`.  Since the
*homeassistant* user's home directory is `/var/lib/homeassistant`, its
`~/.ssh` directory is thus also relabeled, preventing the SSH daemon
from accessing it.  Since Home Assistant itself does not need access to
this path, we can tell systemd to mount an empty tmpfs filesystem there
in the service unit's mount namespace.  This way, when Podman relabels
the directory, it will change the label of the tmpfs mount point instead
of the actual directory.
2023-06-08 10:05:36 -05:00
Dustin bf4d57b5cb frigate: Configure journal2ntfy for MD RAID
The Frigate server has a RAID array that it uses to store video
recordings.  Since there have been a few occasions where the array has
suddenly stopped functioning, probably because of the cheap SATA
controller, it will be nice to get an alert as soon as the kernel
detects the problem, so as to minimize data loss.
2023-06-08 10:05:36 -05:00
Dustin 87e8ec2ed4 synapse: Back up data using BURP
Most of the Synapse server's state is in its SQLite database.  It also
has a `media_store` directory that needs to be backed up, though.

In order to back up the SQLite database while the server is running, the
database must be in "WAL mode."  By default, Synapse leaves the database
in the default "rollback journal mode," which disallows multiple
processes from accessing the database, even for read-only operations.
To change the journal mode:

```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
2023-05-23 09:52:50 -05:00
Dustin 74243080bb r/burp-client: Support pre/post-restore scripts
BURP can run scripts before and after restore.  This may be useful, for
example, to clean up files in a backup that may be in an inconsistent
state.
2023-05-23 09:52:50 -05:00
Dustin 66d0a9157f burp-client: Switch from cron to systemd timer
systemd timer units are supported on all relevant OS versions now.
There is no longer any reason to use cron.
2023-05-23 09:51:07 -05:00
Dustin cd1f7b354b ci: Add Jenkins pipeline for MinIO 2023-05-23 08:33:09 -05:00
Dustin d26de78b3d r/samba-dc: Rotate KDC log weekly
The Samba KDC log file seems to grow rather quickly sometimes, outpacing
the monthly rotation policy.  Let's rotate it weekly and keep 4
historical versions.
2023-05-23 08:31:58 -05:00
Dustin 78296f7198 Merge branch 'journal2ntfy' 2023-05-23 08:31:52 -05:00
Dustin 347cda74fd metrics: Scrape metrics from Kubernetes API server
Kubernetes exports a *lot* of metrics in Prometheus format.  I am not
sure what all is there, yet, but apparently several thousand time series
were added.

To allow anonymous access to the metrics, I added this RoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
2023-05-22 21:21:08 -05:00
Dustin c0bb387b18 metricspi: Scrape metrics from MinIO backup storage
MinIO exposes metrics in Prometheus exposition format.  By default, it
requires an authentication token to access the metrics, but I was unable
to get this to work.  Fortunately, it can be configured to allow
anonymous access to the metrics, which is fine, in my opinion.
2023-05-22 21:19:25 -05:00
Dustin a7319c561d journal2ntfy: Script to send log messagess via ntfy
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream.  Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.

For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array.  When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun.  Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
2023-05-17 14:51:21 -05:00
Dustin 2c002aa7c5 alerts: Add alert to archive BURP disk
This alert will fire once the MD RAID resynchronization process has
completed and both disks in the array are online.  It will clear when
one disk is disconnected and moved to the safe.
2023-05-16 08:33:13 -05:00
Dustin 877dcc3879 alerts: Add alerts for missed client backups
When BURP fails to even *start* a backup, it does not trigger a
notification at all.  As a result, I may not notice for a few days when
backups are not happening.  That was the case this week, when clients'
backups were failing immediately, because of a file permissions issue on
the server.  To hopefully avoid missing backups for too long in the
future, I've added two new alerts:

* The *no recent backups* alert fires if there have not been *any* BURP
  backups recently.  This may also fire, for example, if the BURP
  exporter is not working, or if there is something wrong with the BURP
  data volume.
* The *missed client backup* alert fires if an active BURP client (i.e.
  one that has had at least one backup in the past 90 days) has not been
  backed up in the last 24 hours.
2023-05-14 11:48:36 -05:00
Dustin a2bcd5ccbb alerts: Adjust BURP RAID disk swap alert
Using a 30-day window for the `tlast_change_over_time` function
effectively "caps out" the value at 30 days.  Thus, the alert reminding
me to swap the BURP backup volume will never fire, since the value will
never be greater than the 30-day threshold.  Using a wider window
resolves that issue (though the query will still produce inaccurate
results beyond the window).
2023-05-14 11:38:00 -05:00
Dustin ad9fb6798e samba-dc: Omit tls cafile setting
The `tls cafile` setting in `smb.conf` is not necessary.  It is used for
verifying peer certificates for mutual TLS authentication, not to
specify the intermediate certificate authority chain like I thought.

The setting cannot simply be left out, though.  If it is not specified,
Samba will attempt to load a file from a built-in default path, which
will fail, causing the server to crash.  This is avoided by setting the
value to the empty string.
2023-05-10 08:28:49 -05:00
Dustin 5ebe10fb0b Merge branch 'minio' 2023-05-10 08:05:03 -05:00
Dustin a3ea838cac burp-server: Deploy MinIO
We're going to run MinIO on the BURP server to provide a backup target
for the [Postgres Operator][0]/[WAL-E][1].  Although the Postgres
Operator also supports backups via [WAL-G][2], which supports more
backup targets like SFTP, the operator does not support restoring from
those targets.  As such, the best way to get fully-featured backups for
the Postgres Operator, including environment cloning, etc., is to use
S3.  Since I absolutely do not want to store my backups "in the cloud,"
using MinIO seems a decent alternative.  Running it on the BURP server
allows the backups to be stored and rotated along with regular system
backups.

[0]: https://github.com/zalando/postgres-operator/
[1]: https://github.com/wal-e/wal-e
[2]: https://github.com/wal-g/wal-g
2023-05-09 21:55:25 -05:00
Dustin f54bc44a48 minio: Install and configure MinIO
[MinIO][0] is an S3-compatible object storage server.  It is designed to
provide storage for cloud-native applications for on-premises
deployments.

MinIO has not been packaged for Fedora (yet?).  As such, the best way to
deploy it is usining its official container image.  Here, we are using
`podman-systemd-generator` (Quadlet) to generate a systemd service
unit to manage the container process.
2023-05-09 21:37:46 -05:00
Dustin 9722fed1b8 metricspi: Scrape dustinandtabitha.com 2023-05-09 21:30:11 -05:00
Dustin f6f286ac24 alerts: Correct BURP volume swap alert
The `tlast_change_over_time` function needs an interval wide enough to
consider the range of time we are intrested in.  In this case, we want
to see if the BURP volume has been swapped in the last thirty days, so
the interval needs to be `30d`.
2023-05-03 11:06:34 -05:00
Dustin 5ed3ee525e synapse: Update LDAP server URI 2023-05-01 12:36:33 -05:00
Dustin b68a99091d vault-secret: Get key from Bitwarden
I don't use GnuPG for anything else anymore, so it's becoming rather
cumbersome to keep setting it up just for the Ansible Vault secret.
Since I use Bitwarden to store the passphrase for my PGP key anyway, it
makes sense to just store the Ansible Vault secret there directly.
2023-04-23 20:05:00 -05:00
Dustin ed42f848b9 r/ssh-hostkeys: Add keys for git.p.b
Git clients access Gitea over SSH using the *git.pyrocufflink.blue* and
*git.pyrocufflink.net* names.
2023-04-23 20:03:44 -05:00
Dustin a4cc9d0c46 metricspi: Scrape tabitha.biz 2023-04-23 20:03:43 -05:00
Dustin 2920c25a69 websites/p-bitwarden: Redirect .blue to .net
Avoid confusion with WebAuthn by ensuring users only access the
application by its canonical name.
2023-04-23 18:45:28 -05:00
Dustin 6c68126a3a grafana: Update LDAP server host name
*dc0.p.b* has been gone for a while now.  All the current domain
controllers use LDAPS certificates signed by Let's Encrypt and include
the *pyrocufflink.blue* name, so we can now use the apex domain A record
to connect to the directory.
2023-04-12 14:07:51 -05:00
Dustin 78f65355fa gitea: Back up with BURP 2023-04-12 14:07:51 -05:00
Dustin 1da4c17a8c alerts: Add alerts for HTTPS certificates
These alerts will generate notifications when websites' HTTPS
certificates are not properly renewed automatically and become in danger
of expiring.
2023-04-12 13:55:31 -05:00
Dustin bf4133652c metrics: Scrape Jenkins with blackbox exporter
This is mostly to monitor the HTTPS certificate expiration.
2023-04-12 13:55:31 -05:00
Dustin dc2a05dc8f alerts: Add alert for BURP RAID array swap
This alert counts how long its been since the number of "active" disks
in the RAID array on the BURP server has changed.  The assumption is
that the number will typically be `1`, but it will be `2` when the
second disk synchronized before the swap occurs.
2023-04-11 22:25:36 -05:00
Dustin 2394bf7436 metricspi: Fix vmalert links
1. Grafana 8 changed the format of the query string parameters for the
   Explore page.
2. vmalert no longer needs the http.pathPrefix argument when behind a
   reverse proxy, rather it uses the request path like the other
   Victoria Metrics components.
2023-04-11 21:46:43 -05:00
Dustin 6c562c9821 alerts: Ignore missing mdraid disk for BURP
The way I am handling swapping out the BURP disk now is by using the
Linux MD RAID driver to manage a RAID 1 mirror array.  The array
normally operates with one disk missing, as it is in the fireproof safe.
When it is time to swap the disks, I reattach the offline disk, let the
array resync, then disconnect and store the other disk.

This works considerably better than the previous method, as it does not
require BURP or the NFS server to be offline during the synchronization.
2023-04-11 20:08:07 -05:00
Dustin 9921b2fd5e burp1.p.b: Set collectd SELinux domain permissive
Using the *md* plugin generates AVC denials like this:

	type=AVC msg=audit(1681259123.636:338441): avc:  denied  { read } for  pid=1438759 comm="collectd" name="md1" dev="devtmpfs" ino=646 scontext=system_u:system_r:collectd_t:s0 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file permissive=0
2023-04-11 19:26:25 -05:00
Dustin f16c2fae2f burp1.p.b: Enable md and thermal collectd plugins
The BURP storage volume is now backed by a Linux MD RAID array, so we
want to monitor its state.  Furthermore, since this machine is a
physical device, we should monitor its thermal characteristics as well.
2023-04-11 10:14:18 -05:00
Dustin a59f24a8b5 metricspi: Stop scraping speedtest
Running the speed test periodically was just wasting bandwidth.  It
failed frequently, and generally did not provide useful information.
2023-04-02 11:05:16 -05:00
Dustin 94de5d6067 samba-dc: Decrease Samba log level
The default log level (3) produces too much output and quickly fills the
`/var/log` volume on the domain controllers.
2023-03-08 11:26:57 -06:00
Dustin 748c432334 vaultwarden: Change Domain URL
The rule is "if it is accessible on the Internet, its name ends in .net"

Although Vaultwarden can be accessed by either name, the one specified
in the Domain URL setting is the only one that works for WebAuthn.
2023-03-03 11:17:07 -06:00
Dustin 45148421b0 smtp1.p.b: Allow SMTP relay from Kubernetes network
Applications running on the Kubernetes cluster need to be able to send
e-mail via the relay.
2023-01-13 19:36:20 -06:00
Dustin b1fa4fc8a7 r/web/chmod777.sh: Add HTTP redirect
The HTTP->HTTPS redirect for chmod777.sh was only working by
coincidence.  It needs its own virtual host to ensure it works
irrespective of how other websites are configured.
2023-01-09 13:06:56 -06:00
Dustin 1b7a8885b8 r/web/hlc: Configure formsubmit
Tabitha's Hatch Learning Center site has two user submission forms: one
for signing in/out students for class, and another for parents to
register new students for the program.  These are handled by
*formsubmit* and store data in CSV spreadsheets.
2023-01-09 12:59:58 -06:00
Dustin 632e1dd906 metricspi: Update LDAP configuration
All domain controllers now use the Let's Encrypt wildcard certificate
for the *pyrocufflink.blue* domain.  Further, *dc2.p.b* is
decommissioned.
2023-01-09 12:23:54 -06:00
Dustin 90f9e5eba5 samba-dc: Manage sudoers
Domain controllers only allow users in the *Domain Admins* AD group to
use `sudo` by default.  *dustin* and *jenkins* need to be able to apply
configuration policy to these machines, but they are not members of said
group.
2022-12-23 08:47:31 -06:00
Dustin bc4c7edbad r/base: Clear facts after installing python-selinux
If the Python bindings for SELinux policy management are not installed
when Ansible gathers host facts, no SELinux-related facts will be set.
Thus, any tasks that are conditional based on these facts will not run.
Typically, such tasks are required for SELinux-enabled hosts, but must
not be performed for non-SELinux hosts.  If they are not run when they
should, the deployment may fail or applications may experience issues at
runtime.

To avoid these potential issues, the *base* role now forces Ansible to
gather facts again if it installed the Python SELinux bindings.

Note: one might suggest using `meta: clear_facts` instead of `setup` and
letting Ansible decide if and when to gather facts again. Unfortunately,
this for some reason doesn't work; the `clear_facts` meta task just
causes Ansible to crash with a "shared connection to {host} closed."
2022-12-23 08:44:30 -06:00
Dustin 9408ee31c3 home-assistant: Back up Zigbee/ZWave/Mosquitto
Mosquitto, Zigbee2MQTT, and ZWaveJS2MQTT all have persistent state that
needs to be backed up in addition to Home Assistant's own data.
2022-12-23 06:56:52 -06:00
Dustin 6b4b18dac8 facts: Add PB to force fetching facts
Some playbooks/roles require facts from machines other than the target.
The `facts.yml` playbook can be used to gather facts from machines
without running any other tasks.
2022-12-23 06:56:52 -06:00
Dustin 10344b07c7 hosts: Add ag62kz.p.b
New domain controller (Fedora 37): *ag62kz.pyrocufflink.blue*
2022-12-23 06:56:52 -06:00