Commit Graph

98 Commits (master)

Author SHA1 Message Date
Dustin 7fc3465d56 smtp1: Fix mynetworks setting for k8s network
The "Kubernetes" subnet is a /27, not a /28.  The hosts in the upper half
of the /27 were masked out, so they were excluded from the `mynetworks`
value and unable to send e-mail via the relay.
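The sketch below shows why the narrower mask broke relaying.  It uses Python's `ipaddress` module with a hypothetical documentation subnet, because the commit does not give the real Kubernetes network address.  A /28 covers only the first 16 of the /27's 32 addresses:

```python
import ipaddress

# Hypothetical addresses (TEST-NET-1); the real Kubernetes subnet is not given.
correct = ipaddress.ip_network("192.0.2.32/27")     # 32 addresses: .32-.63
too_narrow = ipaddress.ip_network("192.0.2.32/28")  # 16 addresses: .32-.47


def relay_allowed(host: str, mynetworks: ipaddress.IPv4Network) -> bool:
    """Return True if the client address falls inside the mynetworks range."""
    return ipaddress.ip_address(host) in mynetworks


# A host in the upper half of the /27 is rejected by the /28.
upper_host = "192.0.2.50"
print(relay_allowed(upper_host, too_narrow))  # False
print(relay_allowed(upper_host, correct))     # True
```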
2025-08-20 07:11:27 -05:00
Dustin 2d51e2001d gw1: Allow internal IPv6 clients
Specifically to allow the Synology to synchronize its clock, as it only
has an IPv6 address.

We also need to explicitly override `chrony_servers` to an empty list
for the firewall itself, since it syncs with the NTP pool, rather than
its next hop router.
2025-08-17 20:52:36 -05:00
Dustin 6359a140ac gw1/squid: Allow proxy access from kube network
Since we use the proxy when PXE booting to speed up Live OS image and
RPM package downloads, we need to allow machines using it to access the
kickstart files which are now hosted on the PXE server.  Virtual
machines on the Kubernetes network (_pyrocufflink.black_) also need
access to those kickstarts, so we need to mark that subnet as trusted.
2025-07-12 16:45:47 -05:00
Dustin fefa85c83b gw1: squid: Allow access to PXE/kickstarts
The PXE server now hosts the kickstart scripts.
2025-07-12 16:12:23 -05:00
Dustin 0c070c9807 gw1/squid: Allow Unifi controller to internal repos
I've moved the Unifi controller back to running on a Fedora Linux
machine.  It therefore needs access to Fedora RPM repositories, as well
as the internal "dch" RPM repository, for system packages.

I also created a new custom container image for the Unifi Network
software (the linuxserver.io one sucks), so the server needs access to
the OCI repo on Gitea.
2025-03-29 08:01:50 -05:00
Dustin c300dc1b6c chrony: Add role/PB for chrony
I continually struggle with machines' (physical and virtual, even the
Roku devices!) clocks getting out of sync.  I have been putting off
fixing this because I wanted to set up a Windows-compatible NTP server
(i.e. on the domain controllers, with Kerberos signing), but there's
really no reason to wait for that to fix the clocks on all the
non-Windows machines, especially since there are exactly 0 Windows
machines on the network right now.

The *chrony* role and corresponding `chrony.yml` playbook are generic,
configured via the `chrony_pools`, `chrony_servers`, and `chrony_allow`
variables.  The values for these variables will configure the firewall
to act as an NTP server, synchronizing with the NTP pool on the
Internet, while all other machines will synchronize with it.  This
allows machines on networks without Internet access to keep their clocks
in sync.
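The following is a minimal sketch of how the three variables could map onto `chrony.conf` directives.  `pool`, `server`, and `allow` are real chrony directives.  The template, the pool name, the subnet, and the firewall host name below are assumptions and do not reflect the role's actual contents:

```python
from typing import Iterable


def render_chrony_conf(
    pools: Iterable[str] = (),
    servers: Iterable[str] = (),
    allow: Iterable[str] = (),
) -> str:
    """Render the time-source and access lines of a chrony.conf (hypothetical template)."""
    lines = [f"pool {p} iburst" for p in pools]
    lines += [f"server {s} iburst" for s in servers]
    # "allow" makes chronyd answer NTP requests from the listed subnets.
    lines += [f"allow {a}" for a in allow]
    return "\n".join(lines) + "\n"


# Firewall: sync with the public pool and serve internal clients.
firewall = render_chrony_conf(pools=["2.fedora.pool.ntp.org"], allow=["192.0.2.0/24"])
# Every other machine: sync with the firewall only (hypothetical host name).
client = render_chrony_conf(servers=["gw1.example.internal"])
print(firewall, end="")
print(client, end="")
```

Without any `allow` line, chronyd acts purely as a client.  That is why, under these assumptions, only the firewall would set `chrony_allow`.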
2025-03-16 16:37:19 -05:00
Dustin 81663a654d gw1/squid: Allow to Gitea kickstarts+from p.r
Since the canonical location for Anaconda kickstart scripts is now
Gitea, we need to allow hosts to access them from there.

Also allowing access from the _pyrocufflink.red_ network for e.g.
installation testing.
2024-12-27 13:07:11 -06:00
Dustin 2d5f9e66c1 chromie: Scrape logs from serial consoles
Now that we have the serial terminal server managing `picocom` processes
for each serial port, and those `picocom` processes are configured to
log console output to files, we can configure Promtail to scrape these
log files and send them to Loki.
2024-11-10 18:34:49 -06:00
Dustin a82700a257 chromie: Configure serial terminal server 2024-11-10 13:15:08 -06:00
Dustin eaf9cbef9a Merge remote-tracking branch 'origin/frigate-exporter' 2024-11-05 07:01:31 -06:00
Dustin a9923dcb57 hosts: chromie: Enable collectd md, thermal plugins
To monitor the RAID array and various temperature probes.
2024-11-04 17:52:46 -06:00
Dustin 29d65dd0d5 gw1: squid: Allow access to Gitea
Specifically to allow _nvr2.pyrocufflink.blue_ to fetch the
_frigate-exporter_ container image.
2024-10-21 20:27:31 -05:00
Dustin 621f82c88d hosts: Migrate remaining hosts to Restic
Gitea and Vaultwarden both have SQLite databases.  We'll need to add
some logic to ensure these are in a consistent state before beginning
the backup.  Fortunately, neither of them is a very busy database, so
the likelihood of an issue is pretty low.  It's definitely more
important to get backups going again sooner, and we can deal with that
later.
2024-09-07 20:45:24 -05:00
Dustin d3a09a2e88 hosts: Add chromie, nvr2 to nut-monitor group
Deploy `nut-monitor` on these physical machines so they will shut down
safely in the event of a power outage.
2024-09-01 18:52:33 -05:00
Dustin 14a7d39e11 gw1/squid: Allow Frigate access to Github API
Frigate uses the Github API to check for new releases.  It then
populates the `update.frigate_server` entity in Home Assistant via MQTT
with the information it retrieved.  If it is unable to access the Github
API, the Home Assistant entity will be marked as "unavailable," which
triggers an alert notification from Home Assistant. Thus, we need to
allow Frigate to access Github if we want to use that entity as an
indicator of whether or not Frigate is connected to the MQTT broker.

I don't want to allow access to the Github API to everything on the
Frigate server, just Frigate itself.  To do that, I've assigned a unique
username and password for Frigate.  Only requests with the proper
`Proxy-Authorization` header will be allowed access.  By providing the
credentials only to the Frigate container, we can ensure no other process
has access.
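The header the client sends is easy to reproduce.  The sketch below assumes Basic proxy authentication, because the commit does not say which Squid auth scheme is in use.  The credentials are hypothetical:

```python
import base64


def proxy_authorization(username: str, password: str) -> str:
    """Build a Basic Proxy-Authorization header value from a username and password."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Hypothetical credentials; the real ones would be given only to the Frigate container.
header = proxy_authorization("frigate", "s3cret")
print(f"Proxy-Authorization: {header}")
```

In practice, most HTTP clients build this header themselves when the credentials are embedded in the proxy URL (e.g. `http://user:password@proxy.example:3128`).  A container would therefore typically only need its proxy URL configured.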

I think I did this mostly as an exercise; there's no particular reason
to disallow access to the Github API, since it's mostly read-only and
can't really be used to exfiltrate any data (probably?).
2024-08-14 20:26:11 -05:00
Dustin d2b3b1f7b3 hosts: Deploy production Frigate on nvr2.p.b
*nvr2.pyrocufflink.blue* originally ran Fedora CoreOS.  Since I'm tired
of the tedium and difficulty involved in making configuration changes to
FCOS machines, I am migrating it to Fedora Linux, managed by Ansible.
2024-08-12 22:22:50 -05:00
Dustin 3250628cd1 gw1/squid: Allow NVR servers access to repos
The Frigate NVR servers, prod & test, need to be able to access Fedora
COPR (for the *gasket-dkms* package) and Github Container Registry (for
Frigate itself).
2024-08-12 18:47:04 -05:00
Dustin 3214d4b9b2 gw1/squid: Allow UniFi controller to OCI registries
The UniFi Network server needs to be able to access the
_linuxserver.io_/GitHub and Docker Hub OCI image registries for the
UniFi Network and Caddy container images, respectively.
2024-07-31 18:41:13 -05:00
Dustin 805a900f8a gw1/squid: Allow Invoice Ninja to Stripe API
HLC uses the Invoice Ninja Stripe integration to process credit card
payments from parents.
2024-07-14 15:45:36 -05:00
Dustin 6caf28259e hosts: db0: Promote to primary
All data have been migrated from the PostgreSQL server in Kubernetes and
the three applications that used it (Firefly-III, Authelia, and Home
Assistant) have been updated to point to the new server.

To avoid commingling the backups from the old server with those from the
new server, we're reconfiguring WAL-G to push and pull from a new S3
prefix.
2024-07-02 20:44:29 -05:00
Dustin b83c6de28a gw1/squid: Add more URLs for Fedora/CoreOS updates
After adding these, *unifi2.pyrocufflink.blue* (FCOS) was finally able
to update successfully.
2024-07-02 20:44:29 -05:00
Dustin 2ce211b5ea hosts: Add db0.p.b
*db0.pyrocufflink.blue* will be the primary server in the new PostgreSQL
database cluster.  We're starting with Fedora 39 so we can have
PostgreSQL 15, to match the version managed by the Postgres Operator in
the Kubernetes cluster right now.
2024-07-02 20:44:29 -05:00
Dustin 93eeaaaed4 gw1: Allow access to DCH yum repo via proxy
Allows installing _sshca-cli-systemd_ from Kickstart.
2024-06-26 18:39:25 -05:00
Dustin 4bdd00d339 gw1: Do not reboot after dnf automatic updates
We don't want the firewall rebooting itself after kernel updates.
Instead, I will reboot it manually at the next appropriate time.
2024-06-13 08:10:55 -05:00
Dustin 8400024249 cloud0: Exclude Nextcloud trash from backups
Files in the Nextcloud trash bin do not need to be backed up.  They are
often large (e.g. my Signal backups), and presumably, they are not
needed anyway; why would they be in the trash otherwise?
2024-06-12 19:04:46 -05:00
Dustin 1babedaf55 gw1: squid: Cache RPMs and installer images
Installing Fedora on a bunch of machines, simultaneously or in rapid
succession, can be painfully slow, as several large files need to be
downloaded.  To speed this up, we download those files via the proxy and
cache them on the proxy server.

As a side-effect, the proxy needs to allow access to the Kickstart
"server" (i.e. my workstation, at least for now), since Anaconda will
use the configured proxy for everything it downloads.
2024-06-12 18:54:29 -05:00
Dustin 9365fd2dd5 gw1: squid: Allow access to FCOS update servers
*unifi2.pyrocufflink.blue*, which is connected to the management
network, can only access the Internet via the proxy.  In order for
Zincati/`rpm-ostree` to automatically update the machine, the proxy
needs to allow access to the FCOS update servers.
2024-06-12 18:52:54 -05:00
Dustin 58972cf188 auto-updates: Install and configure dnf-automatic
*dnf-automatic* is an add-on for `dnf` that performs scheduled,
automatic updates.  It works pretty much how I would want it to:
triggered by a systemd timer, sends email reports upon completion, and
only reboots for kernel et al. updates.

In its default configuration, `dnf-automatic.timer` fires every day.  I
want machines to update weekly, but I want them to update on different
days (so as to avoid issues if all the machines reboot at once).  Thus,
the _dnf-automatic_ role uses a systemd unit extension to change the
schedule.  The day-of-the-week is chosen pseudo-randomly based on the
host name of the managed system.
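The scheduling idea can be sketched as follows, assuming a hash-based choice.  The role itself may use something different, such as an Ansible filter seeded with the host name.  Python's built-in `hash()` is salted per process, so the sketch uses a stable digest from `hashlib` instead.  The 04:00 update time is hypothetical:

```python
import hashlib

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def update_day(hostname: str) -> str:
    """Pick a stable, pseudo-random weekday for a host from a hash of its name."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return DAYS[int.from_bytes(digest[:4], "big") % len(DAYS)]


def timer_dropin(hostname: str, at: str = "04:00:00") -> str:
    """Render a systemd drop-in that replaces dnf-automatic.timer's schedule."""
    return (
        "[Timer]\n"
        "OnCalendar=\n"  # an empty assignment clears the packaged daily schedule
        f"OnCalendar={update_day(hostname)} *-*-* {at}\n"
    )


print(timer_dropin("db0.pyrocufflink.blue"), end="")
```

The output would go in a drop-in file such as `/etc/systemd/system/dnf-automatic.timer.d/override.conf`.  Because the weekday depends only on the host name, it stays the same across runs while still spreading hosts over the week.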
2024-06-12 06:25:17 -05:00
Dustin c51589adff gw1: Scrape BIND DNS server logs
The BIND server on the firewall is configured to write query logs and
RPZ rewrite logs to files under `/var/log/named`.  We can scrape these
logs with Promtail and use the messages for analytics on the DNS-based
firewall, etc.
2024-02-28 19:06:23 -06:00
Dustin b96164ce11 gw1: Allow rpm.grafana.com via proxy
In order to install Promtail on machines (e.g. *unifi1*) that do not
have direct access to the Internet.
2024-02-22 20:40:51 -06:00
Dustin 39400f3b2f hosts: Remove vars for zbx0.p.b
This machine is long dead.
2024-02-22 10:23:19 -06:00
Dustin 1bff9b2649 gw1: Enable pam_ssh_agent_auth for sudo
This machine is _not_ a member of the _pyrocufflink.blue_ AD domain, so
it does not inherit the settings from that group.  Also, Jenkins does
not manage it, so only my personal keys are authorized.
2024-01-28 12:16:35 -06:00
Dustin be63424fd8 hosts: Deploy Squid on gw1
Running Squid on the firewall makes sense; it's a sort of layer-7
firewall, after all.  There's not much storage on that machine, though,
so we don't really want to cache anything.  In fact, its only purpose
is to allow very limited web access for certain applications.  All
outbound traffic is blocked, with two exceptions:

* Fedora package repositories (for the UniFi controller server)
* Google Fonts (for Invoice Ninja)
2024-01-27 20:09:34 -06:00
Dustin 7b54bc4400 nut-monitor: Require both UPS to be online
Unfortunately, the automatic transfer switch does not seem to work
correctly.  When the standby source is a UPS running on battery, it does
*not* switch sources if the primary fails.  In other words, when the
power is out and both UPS are running on battery, when the first one
dies, it will NOT switch to the second one.  It has no trouble switching
when the second source is mains power, though, which is very strange.

I have tried messing with all the settings including nominal input
voltage, sensitivity, and frequency tolerance, but none seem to have any
effect.

Since it is more important for the machines to shut down safely than it
is to have an extra 10-15 minutes of runtime during an outage, the best
solution for now is to configure the hosts to shut down as soon as the
first UPS battery gets low.  This is largely a waste of the second UPS,
but at least it will help prevent data loss.
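In NUT terms, one way to get this behavior is to raise upsmon's `MINSUPPLIES` so that losing either UPS drops the available supply below the threshold.  The commit does not show the actual configuration.  The following is therefore a simplified Python model of upsmon's decision rule: a UPS counts as critical when it is both on battery (`OB`) and low on battery (`LB`).  The model assumes each UPS is monitored with a power value of 1, and the UPS names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Ups:
    name: str
    status: set[str]      # NUT status flags, e.g. {"OL"} or {"OB", "LB"}
    power_value: int = 1  # weight given on the MONITOR line (assumed 1)


def is_critical(ups: Ups) -> bool:
    # Simplified: on battery with a low battery; ignores lost communication.
    return {"OB", "LB"} <= ups.status


def should_shut_down(upses: list[Ups], min_supplies: int) -> bool:
    """Shut down when the non-critical supplies fall below MINSUPPLIES."""
    available = sum(u.power_value for u in upses if not is_critical(u))
    return available < min_supplies


# Both UPS on battery; the first one reports low battery.
upses = [Ups("ups1", {"OB", "LB"}), Ups("ups2", {"OB"})]
print(should_shut_down(upses, min_supplies=2))  # True: only 1 of 2 supplies remains
print(should_shut_down(upses, min_supplies=1))  # False: would wait for ups2 as well
```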
2024-01-25 21:22:04 -06:00
Dustin 764177daf3 vmhost0: Shut down when first UPS goes low battery
The automatic transfer switch does not seem to work reliably when both
UPS sources are running on battery.  This means all systems lose power
after the first UPS battery dies, even though the second UPS is still
online.  To minimize the risk of data loss, at least until I figure out
what's wrong, I want both VM hosts to shut down as soon as the first UPS
signals that its battery is low.
2024-01-22 08:46:32 -06:00
Dustin 423951bac1 {burp1, gw1}: Configure upsmon 2024-01-19 21:55:36 -06:00
Dustin d0b0f2ff38 hosts: gw1: Deploy BURP, collectd
Although *gw1* is not really managed by Ansible, it is much easier to
deploy collectd and BURP with the existing playbooks.
2024-01-19 20:52:48 -06:00
Dustin 525f2b2a04 nut-monitor: Configure upsmon
`upsmon` is the component of [NUT] that monitors (local or remote) UPS
devices and reacts to changes in their state.  Notably, it is
responsible for shutting down the system when its power supplies can no
longer provide sufficient power.

[NUT]: https://networkupstools.org/
2024-01-19 20:50:03 -06:00
Dustin 686817571e smtp-relay: Switch to Fastmail
AWS is going to begin charging extra for routable IPv4 addresses soon.
There's really no point in having a relay in the cloud anymore anyway,
since a) all outbound messages are sent via the local relay and b) no
messages are sent to anyone except me.
2023-10-24 17:27:21 -05:00
Dustin a3ea838cac burp-server: Deploy MinIO
We're going to run MinIO on the BURP server to provide a backup target
for the [Postgres Operator][0]/[WAL-E][1].  Although the Postgres
Operator also supports backups via [WAL-G][2], which supports more
backup targets like SFTP, the operator does not support restoring from
those targets.  As such, the best way to get fully-featured backups for
the Postgres Operator, including environment cloning, etc., is to use
S3.  Since I absolutely do not want to store my backups "in the cloud,"
using MinIO seems a decent alternative.  Running it on the BURP server
allows the backups to be stored and rotated along with regular system
backups.

[0]: https://github.com/zalando/postgres-operator/
[1]: https://github.com/wal-e/wal-e
[2]: https://github.com/wal-g/wal-g
2023-05-09 21:55:25 -05:00
Dustin 9921b2fd5e burp1.p.b: Set collectd SELinux domain permissive
Using the *md* plugin generates AVC denials like this:

	type=AVC msg=audit(1681259123.636:338441): avc:  denied  { read } for  pid=1438759 comm="collectd" name="md1" dev="devtmpfs" ino=646 scontext=system_u:system_r:collectd_t:s0 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file permissive=0
2023-04-11 19:26:25 -05:00
Dustin f16c2fae2f burp1.p.b: Enable md and thermal collectd plugins
The BURP storage volume is now backed by a Linux MD RAID array, so we
want to monitor its state.  Furthermore, since this machine is a
physical device, we should monitor its thermal characteristics as well.
2023-04-11 10:14:18 -05:00
Dustin 45148421b0 smtp1.p.b: Allow SMTP relay from Kubernetes network
Applications running on the Kubernetes cluster need to be able to send
e-mail via the relay.
2023-01-13 19:36:20 -06:00
Dustin 57702bb9c7 hosts: vmhost[01]: Update static DNS server address 2022-12-18 20:19:32 -06:00
Dustin e09e684fd8 hosts: Update mtrcs0 FQDN
I moved the metrics Pi from the red network to the blue network.  I
started to get uncomfortable with the firewall changes that were
required to host a service on the red network.  I think it makes the
most sense to define the red network as egress only.
2022-11-09 18:56:05 -06:00
Dustin 5a9b9a8d98 mtrcs0: Remove Ansible user/become settings
Jenkins still connects as *jenkins* and uses `sudo`, so we can't
hard-code the user to *root*.
2022-08-12 13:22:47 -05:00
Dustin 7ac5493b63 smtp1.p.b: Allow SMTP relay from pyrocufflink.red
AlertManager running on *mtrcs0.pyrocufflink.red* needs to be able to
send e-mail through the SMTP relay.
2022-08-11 21:43:48 -05:00
Dustin 4ddbc9f256 hosts: Add mtrcs0.p.r
*mtrcs0.pyrocufflink.red* is a Raspberry Pi CM4 on a Waveshare
CM4-IO-BASE-B carrier board with an NVMe SSD.  It runs a custom OS built
using Buildroot, and is not a member of the *pyrocufflink.blue* AD
domain.

*mtrcs0.p.r* hosts Victoria Metrics/`vmagent`, `vmalert`, AlertManager,
and Grafana.  I've created a unique group and playbook for it,
*metricspi*, to manage all these applications together.
2022-08-11 21:40:19 -05:00
Dustin c9dbaa32b9 collectd: Control SELinux domain permissiveness
It seems with each new release of Fedora, some feature or other of
*collectd* gets broken.  In Fedora 36, the *interfaces* plugin does not
seem to work reliably, and the *md* plugin logs a *lot* of errors.
While these issues are investigated upstream, we either need to manage
our own policy for collectd or mark the `collectd_t` domain permissive.
I chose the latter because I'm lazy and I don't consider collectd to be
that big of a threat to security.
2022-07-24 10:35:32 -05:00
Dustin 797cc2092f hosts: Add nvr1.p.b
*nvr1.pyrocufflink.blue* is the new video recording server.  It is a
1U rack-mounted physical machine based on the [Jetway
JBC150F596-3160-B][0] barebone system.  It replaces
*nvr0.pyrocufflink.blue* in this role.

[0]: https://www.jetwaycomputer.com/JBC150F596.html
2022-07-23 17:52:26 -05:00