Commit Graph

251 Commits (ab5da581758d6cb289c5401718e8e14c613df12c)

Author SHA1 Message Date
Dustin 3511176c31 r/gitea: Configure SMTP mailer
Gitea needs SMTP configuration in order to send e-mail notifications
about e.g. pull requests.  The `gitea_smtp` variable can be defined to
enable this feature.
2024-08-25 08:46:37 -05:00
Dustin 85da487cb8 r/dch-proxy: Define sites declaratively
I've already made a couple of mistakes keeping the HTTP and HTTPS rules
in sync.  Let's define the sites declaratively and derive the HAProxy
rules from the data, rather than manually typing the rules.
2024-08-24 11:48:45 -05:00
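As an illustration of the declarative approach in 85da487cb8, here is a minimal Python sketch that derives matching HTTP and HTTPS routing rules from a single list of sites. The `Site` class, the `render_rules` function, the `-http`/`-https` backend suffixes, and the example hostnames are all hypothetical; the role's actual data format and templates are not shown in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str     # hostname that clients request (hypothetical field)
    backend: str  # base name of the backend serving it (hypothetical field)


def render_rules(sites):
    """Return (http_rules, https_rules) generated from the same site list."""
    # Plain HTTP: route on the Host header.
    http = [
        f"use_backend {s.backend}-http if {{ hdr(host) -i {s.name} }}"
        for s in sites
    ]
    # TLS passthrough: route on the SNI value from the ClientHello.
    https = [
        f"use_backend {s.backend}-https if {{ req.ssl_sni -i {s.name} }}"
        for s in sites
    ]
    return http, https


sites = [Site("example.org", "web"), Site("cloud.example.org", "nextcloud")]
http_rules, https_rules = render_rules(sites)
for rule in http_rules + https_rules:
    print(rule)
```

Because both rule lists come from one loop over the same data, adding a site cannot update one protocol and miss the other.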
Dustin 2fa28dfa5f r/dch-proxy: Update and clean up
The *dch-proxy* role has not been used for quite some time.  The web
server has been handling the reverse proxy functionality, in addition to
hosting websites.  The drawback to using Apache as the reverse proxy,
though, is that it operates in TLS-terminating mode, so it needs to have
the correct certificate for every site and application it proxies for.
This is becoming cumbersome, especially now that there are several sites
that do not use the _pyrocufflink.net_ wildcard certificate.  Notably,
Tabitha's _hatchlearningcenter.org_ is problematic because although the
main site is hosted by the web server, the Invoice Ninja client portal
is hosted in Kubernetes.

Switching back to HAProxy to provide the reverse proxy functionality
will eliminate the need to have the server certificate both on the
backend and on the reverse proxy, as it can operate in TLS-passthrough
mode.  The main reason I stopped using HAProxy in the first place was
because when using TLS-passthrough mode, the original source IP address
is lost.  Fortunately, HAProxy and Apache can both be configured to use
the PROXY protocol, which provides a mechanism for communicating the
original IP address while still passing through the TLS connection
unmodified.  This is particularly important for Nextcloud because of its
built-in intrusion prevention; without knowing the actual source IP
address, it blocks _everyone_, since all connections appear to come from
the reverse proxy's IP address.

Combining TLS-passthrough mode with the PROXY protocol resolves both the
certificate management issue and the source IP address issue.

I've cleaned up the _dch-proxy_ role quite a bit in this commit.
Notably, I consolidated all the backend and frontend definitions into a
single file; it didn't really make sense to have them all separate,
since they were managed by the same role and referred to each other.  Of
course, I had to update the backends to match the currently-deployed
applications as well.
2024-08-24 11:46:28 -05:00
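To show how the PROXY protocol mentioned in 2fa28dfa5f carries the original client address, here is a minimal Python sketch that parses a version 1 (human-readable) header line. The log does not say which protocol version is deployed, so using v1 is an assumption, and the function name and sample addresses are hypothetical.

```python
def parse_proxy_v1(header: bytes):
    """Parse one PROXY protocol v1 header line.

    Returns (src_ip, src_port, dst_ip, dst_port), or None when the sender
    reports the UNKNOWN protocol and no addresses should be trusted.
    """
    if not header.startswith(b"PROXY ") or not header.endswith(b"\r\n"):
        raise ValueError("not a PROXY v1 header")
    fields = header[:-2].decode("ascii").split(" ")
    if fields[1] == "UNKNOWN":
        return None
    if fields[1] not in ("TCP4", "TCP6") or len(fields) != 6:
        raise ValueError("malformed PROXY v1 header")
    # Field order is: source address, destination address,
    # source port, destination port.
    _, _, src_ip, dst_ip, src_port, dst_port = fields
    return src_ip, int(src_port), dst_ip, int(dst_port)


# The proxy sends this line first, then relays the untouched TLS stream.
header = b"PROXY TCP4 203.0.113.7 192.0.2.10 56324 443\r\n"
print(parse_proxy_v1(header))
```

The backend reads this line before the TLS handshake, so it learns the real source address without the proxy ever decrypting the connection.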
Dustin 153b210a73 vm-hosts: Do not reboot after auto updates
For obvious reasons, the VM hosts cannot automatically reboot
themselves.
2024-08-23 09:33:29 -05:00
Dustin c546f09335 smtp-relay: Rewrite dustin@hatch.name
Sometimes, the mail server for *hatch.name* is extremely slow.  While
there isn't much I can do about it for external senders, I can at least
ensure that email messages sent by internal services like Authelia are
always delivered quickly by rewriting the recipient address to my
actual email address, bypassing the *hatch.name* exchange entirely.
2024-08-22 16:17:00 -05:00
Dustin a2cf78f3f5 vm-hosts: Update vm-autostart
*logs0.pyrocufflink.blue* was replaced by *loki0.pyrocufflink.blue*
ages ago, so I'm not sure how I hadn't updated the autostart list with
it yet.

*unifi3.pyrocufflink.blue* replaced *unifi2.p.b* recently, when I was
testing *Luci*/etcd.
2024-08-14 20:26:11 -05:00
Dustin 6d65e0594f frigate: Configure HTTPS proxy with creds
Only the _frigate_ user is allowed to access the GitHub API via the
proxy.
2024-08-14 20:26:11 -05:00
Dustin d2b3b1f7b3 hosts: Deploy production Frigate on nvr2.p.b
*nvr2.pyrocufflink.blue* originally ran Fedora CoreOS.  Since I'm tired
of the tedium and difficulty involved in making configuration changes to
FCOS machines, I am migrating it to Fedora Linux, managed by Ansible.
2024-08-12 22:22:50 -05:00
Dustin 6c71d96f81 r/frigate-caddy: Deploy Caddy in front of Frigate
Deploying Caddy as a reverse proxy for Frigate enables HTTPS with a
certificate issued by the internal CA (via ACME) and authentication via
Authelia.

Separating the installation and base configuration of Caddy into its
own role will allow us to reuse that part for other applications that
use Caddy for similar reasons.
2024-08-12 18:47:04 -05:00
Dustin 7b61a7da7e r/useproxy: Configure system-wide proxy
The *useproxy* role configures the `http_proxy` et al. environment
variables for systemd services and interactive shells.  Additionally, it
configures Yum repositories to use a single mirror via the `baseurl`
setting, rather than a list of mirrors via `metalink`, since a) the
proxy only allows access to _dl.fedoraproject.org_ and b) the
proxy caches RPM files, but this is only effective if all clients use
the same mirror all the time.

The `useproxy.yml` playbook applies this role to servers in the
*needproxy* group.
2024-08-12 18:47:04 -05:00
Dustin 96bc8c2c09 vm-hosts: Update autostart list
*k8s-amd64-n0*, *k8s-amd64-n1*, and *k8s-amd64-n2* have been replaced by
*k8s-amd64-n4*, *k8s-amd64-n5*, and *k8s-amd64-n6*, respectively.  *db0* is
the new database server, which needs to be up before anything in
Kubernetes starts, since a lot of applications running there depend on
it.
2024-07-03 08:52:15 -05:00
Dustin 4f202c55e4 r/postgres-exporter: Deploy postgres-exporter
The [postgres-exporter][0] exposes PostgreSQL server statistics to
Prometheus.  It connects to a specified PostgreSQL server (in this
case, a server on the local machine via UNIX socket) and collects data
from the `pg_stat_activity`, et al. views.  It needs the `pg_monitor`
role in order to be allowed to read the relevant metrics.

Since we're setting up the exporter to connect via UNIX socket, it needs
a dedicated OS user to match the PostgreSQL user in order to
authenticate via the _peer_ method.

[0]: https://github.com/prometheus-community/postgres_exporter/
2024-07-02 20:44:29 -05:00
Dustin 3f5550ee6c postgresql: wal-g: Set PGHOST
By default, WAL-G tries to connect to the PostgreSQL server via TCP
socket on the loopback interface.  Our HBA configuration requires
certificate authentication for TCP sockets, so we need to configure
WAL-G to use the UNIX socket.
2024-07-02 20:44:29 -05:00
Dustin 6caf28259e hosts: db0: Promote to primary
All data have been migrated from the PostgreSQL server in Kubernetes and
the three applications that used it (Firefly-III, Authelia, and Home
Assistant) have been updated to point to the new server.

To avoid commingling the backups from the old server with those from the
new server, we're reconfiguring WAL-G to push and pull from a new S3
prefix.
2024-07-02 20:44:29 -05:00
Dustin 208fadd2ba postgresql: Configure for dedicated DB servers
I am going to use the *postgresql* group for the dedicated database
servers.  The configuration for those machines will be quite a bit
different than for the one existing machine that is a member of that
group already: the Nextcloud server.  Rather than undefine/override all
the group-level settings at the host level, I have removed the Nextcloud
server from the *postgresql* group, and updated the `nextcloud.yml`
playbook to apply the *postgresql-server* role itself.

Eventually, I want to move the Nextcloud database to the central
database servers.  At that point, I will remove the *postgresql-server*
role from the `nextcloud.yml` playbook.
2024-07-02 20:44:29 -05:00
Dustin 7201f7ed5c vm-hosts: Expose storage VLAN to VMs
To improve the performance of persistent volumes accessed directly from
the Synology by Kubernetes pods, I've decided to expose the storage
network to the Kubernetes worker node VMs.  This way, iSCSI traffic does
not have to go through the firewall.

I chose not to use the physical interfaces that are already directly
connected to the storage network for this for two reasons: 1) I like
the physical separation of concerns and 2) it would add complexity to
the setup by introducing a bridge on top of the existing bond.
2024-06-23 10:43:15 -05:00
Dustin 6520b86958 k8s-controller: Do not reboot after auto-updates
I don't want the Kubernetes control plane servers rebooting themselves
randomly; I need to coordinate that with other goings-on on the network.
2024-06-23 10:43:15 -05:00
Dustin f0445ebe53 nextcloud: Do not auto-update Nextcloud
Nextcloud usually (always?) wants the `occ upgrade` command to be run
after an update.  If the *nextcloud* package gets updated along with
the rest of the OS, Nextcloud will be down until I manually run that
command hours/days later.
2024-06-23 10:43:15 -05:00
Dustin 24bf145a34 all: Do not auto-update on weekends
I don't want machines updating themselves, rebooting, and potentially
breaking stuff over the weekend.
2024-06-21 22:08:03 -05:00
Dustin 88c45e22b6 vm-hosts: Update VM autostart for new DCs
2024-06-20 18:49:04 -05:00
Dustin 292ab4585c all: promtail: Update trusted CA certificate
Loki uses a certificate signed by *dch-ca r2* now (actually has for
quite some time...)
2024-06-12 18:57:01 -05:00
Dustin ffe972d79b r/samba-cert: Obtain LDAP/TLS cert via ACME
The *samba-cert* role configures `lego` and HAProxy to obtain an X.509
certificate via the ACME HTTP-01 challenge.  HAProxy is necessary
because LDAP server certificates need to have the apex domain in their
SAN field, and the ACME server may contact *any* domain controller
server with an A record for that name.  HAProxy will forward the
challenge request on to the first available host on port 5000, where
`lego` is listening to provide validation.

Issuing certificates this way has a couple of advantages:

1. No need for the wildcard certificate for the *pyrocufflink.blue*
   domain any more
2. Renewals are automatic and handled by the server itself rather than
   Ansible via a scheduled Jenkins job

Item (2) is particularly interesting because it avoids the bi-monthly
issue where replacing the LDAP server certificate and restarting Samba
causes the Jenkins job to fail.

Naturally, for this to work correctly, all LDAP client applications
need to trust the certificates issued by the ACME server, in this case
*DCH Root CA R2*.
2024-06-12 18:33:24 -05:00
Dustin 58972cf188 auto-updates: Install and configure dnf-automatic
*dnf-automatic* is an add-on for `dnf` that performs scheduled,
automatic updates.  It works pretty much how I would want it to:
triggered by a systemd timer, sends email reports upon completion, and
only reboots for kernel et al. updates.

In its default configuration, `dnf-automatic.timer` fires every day.  I
want machines to update weekly, but I want them to update on different
days (so as to avoid issues if all the machines reboot at once).  Thus,
the _dnf-automatic_ role uses a systemd unit extension to change the
schedule.  The day-of-the-week is chosen pseudo-randomly based on the
host name of the managed system.
2024-06-12 06:25:17 -05:00
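The host-name-based day selection described in 58972cf188 could work along these lines. This is a minimal Python sketch, not the role's implementation: the log does not show how the role hashes the host name, so the use of SHA-256, the `update_day` function, and the `WEEKDAYS` list are all assumptions.

```python
import hashlib

# Day abbreviations as accepted by systemd calendar expressions.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def update_day(hostname: str, days=WEEKDAYS) -> str:
    """Pick a day for `hostname` that is stable across runs."""
    # A cryptographic hash spreads similar host names across the week,
    # and unlike Python's built-in hash() it does not vary per process.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return days[int.from_bytes(digest[:4], "big") % len(days)]


for host in ("db0.pyrocufflink.blue", "file0.pyrocufflink.blue"):
    print(host, update_day(host))
```

Passing a shorter `days` list restricts the candidates, for example to keep updates off particular days.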
Dustin 1f86fa27b6 vm-hosts: Auto-start unifi2
2024-05-26 10:51:16 -05:00
Dustin 5a9b8b178a hosts: Decommission unifi1
*unifi1.pyrocufflink.blue* is being replaced with
*unifi2.pyrocufflink.blue*.  The new server runs Fedora CoreOS.
2024-05-26 10:50:32 -05:00
Dustin 06b399994e public-web: Add Tabitha's new SSH key
We got Nicepage to work on Tabitha's Fedora Thinkpad, so now she'll do
most of her website work on that machine.
2024-03-15 10:29:03 -05:00
Dustin 0578736596 unifi: Scrape logs from UniFi and device syslog
The UniFi controller can act as a syslog server, receiving log messages
from managed devices and writing them to files in the `logs/remote`
directory under the application data directory.  We can scrape these
logs, in addition to the logs created by the UniFi server itself, with
Promtail to get more information about what's happening on the network.
2024-02-28 19:04:30 -06:00
Dustin 19009bde1a promtail: Role/Playbook to deploy Promtail
Promtail is the log sending client for Grafana Loki.  For traditional
Linux systems, an RPM package is available from upstream, making
installation fairly simple.  Configuration is stored in a YAML file, so
again, it's straightforward to configure via Ansible variables.  Really,
the only interesting step is adding the _promtail_ user, which is
created by the RPM package, to the _systemd-journal_ group, so that
Promtail can read the systemd journal files.
2024-02-22 19:23:31 -06:00
Dustin 226a9e05fa nut: Drop group
NUT is managed by _cfg.git_ now.
2024-02-22 10:24:16 -06:00
Dustin 493663e77f frigate: Drop group
Frigate is no longer managed by Ansible.  Dropping the group so the file
encrypted with Ansible Vault can go away.
2024-02-22 10:23:19 -06:00
Dustin fdc59fe73b pyrocufflink-dns: Drop group
The internal DNS server for the *pyrocufflink.blue* et al. domains runs
on the firewall now, and is thus no longer managed by Ansible.  Dropping
the group variables so the file encrypted with Ansible Vault can go
away.
2024-02-22 10:23:19 -06:00
Dustin 19d833cc76 websites/d&t.com: drop obsolete formsubmit config
The *dustinandtabitha.com* website no longer uses *formsubmit* (the time
for RSVP has **long** passed).  Removing the configuration so the
file encrypted with Ansible Vault can go away.
2024-02-22 10:23:19 -06:00
Dustin f9f8d5aa29 Remove grafana, metricspi groups
With the Metrics Pi decommissioned and Victoria Metrics and Grafana
running in Kubernetes now, these groups are no longer needed.
2024-02-22 10:23:19 -06:00
Dustin f83cea50e9 r/ssu-user-ca: Configure sshd TrustedUserCAKeys
The `TrustedUserCAKeys` setting for *sshd(8)* tells the server to accept
any certificates signed by keys listed in the specified file.
The authenticating username has to match one of the principals listed in
the certificate, of course.

This role is applied to all machines, via the `base.yml` playbook.
Certificates issued by the user CA managed by SSHCA will therefore be
trusted everywhere.  This brings us one step closer to eliminating the
dependency on Active Directory/Samba.
2024-02-01 18:46:40 -06:00
Dustin 0d30e54fd5 r/fileserver: Restrict non-administrators to SFTP
Normal users do not need shell access to the file server, and certainly
should not be allowed to e.g. forward ports through it.  Using a `Match`
block, we can apply restrictions to users who do not need administrative
functionality.  In this case, we restrict everyone who is not a member
of the *Server Admins* group in the PYROCUFFLINK AD domain.
2024-02-01 10:29:32 -06:00
Dustin 4b8b5fa90b pyrocufflink: Enable pam_ssh_agent_auth for sudo
By default, `sudo` requires users to authenticate with their passwords
before granting them elevated privileges.  It can be configured to
allow (some) users access to (some) privileged commands without
prompting for a password (i.e. `NOPASSWD`), however this has a real
security implication.  Disabling the password requirement would
effectively grant *any* program root privileges.  Prompting for a
password prevents malicious software from running privileged commands
without the user knowing.

Unfortunately, handling `sudo` authentication for Ansible is quite
cumbersome.  For interactive use, the `--ask-become-pass`/`-K` argument
is useful, though entering the password for each invocation of
`ansible-playbook` while iterating on configuration policy development
is a bit tedious.  For non-interactive use, though, the password of
course needs to be stored somewhere.  Encrypting it with Ansible Vault
is one way to protect it, but it still ends up stored on disk somewhere
and needs to be handled carefully.

*pam_ssh_agent_auth* provides an acceptable solution to both issues.  It
is better than disabling `sudo` authentication entirely, but a lot more
convenient than dealing with passwords.  It uses the calling user's SSH
agent to assert that the user has access to a private key corresponding
to one of the authorized public keys.  Using SSH agent forwarding, that
private key can even exist on a remote machine.  If the user does not
have a corresponding private key, `sudo` will fall back to normal
password-based authentication.

The security of this solution is highly dependent on the client to store
keys appropriately.  FIDO2 keys are supported, though when used with
Ansible, it is quite annoying to have to touch the token for _every
task_ on _every machine_.  Thus, I have created new FIDO2 keys for both
my laptop and my desktop that have the `no-touch-required` option
enabled.  This means that in order to use `sudo` remotely, I still need
to have my token plugged in to my computer, but I do not have to tap it
every time it's used.

For Jenkins, a hardware token is obviously impossible, but using a
dedicated key stored as a Jenkins credential is probably sufficient.
2024-01-28 12:16:35 -06:00
Dustin 7b54bc4400 nut-monitor: Require both UPS to be online
Unfortunately, the automatic transfer switch does not seem to work
correctly.  When the standby source is a UPS running on battery, it does
*not* switch sources if the primary fails.  In other words, when the
power is out and both UPS are running on battery, when the first one
dies, it will NOT switch to the second one.  It has no trouble switching
when the second source is mains power, though, which is very strange.

I have tried messing with all the settings including nominal input
voltage, sensitivity, and frequency tolerance, but none seem to have any
effect.

Since it is more important for the machines to shut down safely than it
is to have an extra 10-15 minutes of runtime during an outage, the best
solution for now is to configure the hosts to shut down as soon as the
first UPS battery gets low.  This is largely a waste of the second UPS,
but at least it will help prevent data loss.
2024-01-25 21:22:04 -06:00
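The shutdown policy from 7b54bc4400 can be modelled with a short Python sketch. It illustrates the decision only and is not how `upsmon` is implemented; `OL`, `OB`, and `LB` are the online, on-battery, and low-battery status flags reported by NUT, while the function name, the `required` parameter, and the UPS names are hypothetical.

```python
def should_shut_down(ups_status, required):
    """Return True when fewer than `required` UPSes can still be relied on.

    ups_status maps a UPS name to its set of status flags. A UPS that is
    on battery (OB) with a low battery (LB) no longer counts as usable.
    """
    usable = sum(
        1 for flags in ups_status.values() if not {"OB", "LB"} <= flags
    )
    return usable < required


# Power is out: the first UPS is nearly drained, the second still has charge.
outage = {"ups1": {"OB", "LB"}, "ups2": {"OB"}}
print(should_shut_down(outage, required=2))  # both required: shut down now
print(should_shut_down(outage, required=1))  # one is enough: keep running
```

Requiring both supplies trades the second UPS's extra runtime for a shutdown that does not depend on the transfer switch working.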
Dustin 236e6dced6 r/web/hlc: Add formsubmit config for summer signup
And of course, Tabitha lost her SSH key so she had to get another one.
2024-01-23 22:04:29 -06:00
Dustin 07f84e7fdc vm-hosts: Increase VM start delay after K8s
Increasing the delay after starting the Kubernetes cluster to hopefully
allow things to "settle down" enough that starting services on follow up
VMs doesn't time out.
2024-01-22 08:35:40 -06:00
Dustin 6f4fb70baa vm-hosts: Clean up vm-autostart list
Start Kubernetes earlier.  Start Synapse later (it takes a long time to
start up and often times out when the VM hosts are under heavy load).
Start SMTP relay later as it's not really needed.
2024-01-21 18:42:28 -06:00
Dustin b4fcbb8095 unifi: Deploy unifi_exporter
`unifi_exporter` provides Prometheus metrics for UniFi controller.
2024-01-21 16:12:29 -06:00
Dustin 6f5b400f4a vm-hosts: Fix test network device name
The network device for the test/*pyrocufflink.red* network is named
`br1`.  This needs to match in the systemd-networkd configuration or
libvirt will not be able to attach virtual machines to the bridge.
2024-01-21 15:55:37 -06:00
Dustin fb445224a0 vm-hosts: Add k8s-amd64-n3 to autostart list
2024-01-21 15:55:23 -06:00
Dustin 525f2b2a04 nut-monitor: Configure upsmon
`upsmon` is the component of [NUT] that monitors (local or remote) UPS
devices and reacts to changes in their state.  Notably, it is
responsible for powering down the system when there is insufficient
power to the system.
2024-01-19 20:50:03 -06:00
Dustin ab30fa13ca file-servers: Set Apache ServerName
Since *file0.pyrocufflink.blue* now hosts a couple of VirtualHosts,
accessing its HTTP server by the *files.pyrocufflink.blue* alias no
longer works, as Apache routes unknown hostnames to the first
VirtualHost, rather than the global configuration.  To resolve this, we
must set `ServerName` to the alias.
2023-12-29 10:46:13 -06:00
Dustin dfd828af08 r/ssh-host-certs: Manage SSH host certificates
The *ssh-host-certs* role, which is now applied as part of the
`base.yml` playbook and therefore applies to all managed nodes, is
responsible for installing the *sshca-cli* package and using it to
request signed SSH host certificates.  The *sshca-cli-systemd*
sub-package includes systemd units that automate the process of
requesting and renewing host certificates.  These units need to be
enabled and provided the URL of the SSHCA service.  Additionally, the
SSH daemon needs to be configured to load the host certificates.
2023-11-07 21:27:02 -06:00
Dustin c6f0ea9720 r/repohost: Configure Yum package repo host
So it turns out Gitea's RPM package repository feature is less than
stellar.  Since each organization/user can only have a single
repository, separating packages by OS would be extremely cumbersome.
Presumably, the feature was designed for projects that only build a
single RPM for each version, but most of my packages need multiple
builds, as they tend to link to system libraries.  Further, only the
repository owner can publish to user-scoped repositories, so e.g.
Jenkins cannot publish anything to a repository under my *dustin*
account.  This means I would ultimately have to create an Organization
for every OS/version I need to support, and make Jenkins a member of it.
That sounds tedious and annoying, so I decided against using that
feature for internal packages.

Instead, I decided to return to the old ways, publishing packages with
`rsync` and serving them with Apache.  It's fairly straightforward to
set this up: just need a directory with the appropriate permissions for
users to upload packages, and configure Apache to serve from it.

One advantage Gitea's feature had over a plain directory is its
automatic management of repository metadata.  Publishers only have to
upload the RPMs they want to serve, and Gitea handles generating the
index, database, etc. files necessary to make the packages available to
Yum/dnf.  With a plain file host, the publisher would need to use
`createrepo` to generate the repository metadata and upload that as
well.  For repositories with multiple packages, the publisher would need
a copy of every RPM file locally in order for them to be included in the
repository metadata.  This, too, seems like it would be too much trouble
to be tenable, so I created a simple automatic metadata manager for the
file-based repo host.  Using `inotifywatch`, the `repohost-createrepo`
script watches for file modifications in the repository base directory.
Whenever a file is added or changed, the directory containing it is
added to a queue.  Every thirty seconds, the queue is processed; for
each unique directory in the queue, repository metadata are generated.

This implementation combines the flexibility of a plain file host,
supporting an effectively unlimited number of repositories with
fully-configurable permissions, and the ease of publishing of a simple
file upload.
2023-11-07 20:51:10 -06:00
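The queue-and-batch behavior described in c6f0ea9720 can be sketched in Python as below. This is not the `repohost-createrepo` script itself, which is not shown in this log: the `DirectoryQueue` class, the `regenerate` callback (standing in for a `createrepo` run), and the example paths are hypothetical, and the file-watching and thirty-second timer are left out.

```python
import os
import threading


class DirectoryQueue:
    """Collect directories that need fresh repository metadata."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()  # a set de-duplicates repeated events

    def file_changed(self, path):
        """Record the directory containing a new or modified file."""
        with self._lock:
            self._pending.add(os.path.dirname(path))

    def process(self, regenerate):
        """Call regenerate() once per queued directory, then empty the queue."""
        with self._lock:
            batch, self._pending = sorted(self._pending), set()
        for directory in batch:
            regenerate(directory)
        return batch


queue = DirectoryQueue()
queue.file_changed("/srv/repo/f38/foo-1.0-1.x86_64.rpm")
queue.file_changed("/srv/repo/f38/bar-2.1-3.x86_64.rpm")
queue.file_changed("/srv/repo/f39/foo-1.0-1.x86_64.rpm")
# A timer would call this periodically; here it runs once by hand.
queue.process(lambda d: print("would regenerate metadata in", d))
```

Batching means that uploading many RPMs to one directory triggers a single metadata rebuild rather than one per file.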
Dustin 6955c4e7ad hosts: Decommission dc-4k6s8e.p.b
Replaced by *dc-nrtxms.pyrocufflink.blue*
2023-10-28 16:07:56 -05:00
Dustin 420764d795 hosts: Add dc-nrtxms.p.b
New Fedora 38 Active Directory Domain Controller
2023-10-28 16:07:39 -05:00
Dustin a8c184d68c hosts: Decommission dc-ag62kz.p.b
Replaced by *dc-qi85ia.pyrocufflink.blue*
2023-10-28 16:07:08 -05:00