Commit Graph

196 Commits (572022b557913b7ff0dfac7e89c54a1286ec6d4c)

Author SHA1 Message Date
Dustin 78d70af574 hosts: Add Unifi controllers to needproxy group
Since the network device management network does not have access to the
Internet, the Unifi controller machines must access it via the proxy.
2025-03-19 07:50:52 -05:00
Dustin db54b03aa8 r/unifi: Switching to custom container image
The _linuxserver.io_ image for UniFi Network is deprecated.  It sucked
anyway.  I've created a simple image based on Debian that installs the
_unifi_ package from the upstream apt repository.  This image doesn't
require running anything as _root_, so it doesn't need a user namespace.
2025-03-16 16:40:57 -05:00
Dustin c300dc1b6c chrony: Add role/PB for chrony
I continually struggle with machines' (physical and virtual, even the
Roku devices!) clocks getting out of sync.  I have been putting off
fixing this because I wanted to set up a Windows-compatible NTP server
(i.e. on the domain controllers, with Kerberos signing), but there's
really no reason to wait for that to fix the clocks on all the
non-Windows machines, especially since there are exactly 0 Windows
machines on the network right now.

The *chrony* role and corresponding `chrony.yml` playbook are generic,
configured via the `chrony_pools`, `chrony_servers`, and `chrony_allow`
variables.  The values for these variables will configure the firewall
to act as an NTP server, synchronizing with the NTP pool on the
Internet, while all other machines will synchronize with it.  This
allows machines on networks without Internet access to keep their clocks
in sync.
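
For example, the group variables might look something like this (a sketch only; the real pool names, networks, and server addresses are set in the inventory, and `gw1.pyrocufflink.blue` stands in here for the firewall):

```yaml
# Firewall: sync with the public pool, serve the internal networks
chrony_pools:
  - pool.ntp.org
chrony_allow:
  - 172.30.0.0/16

# All other hosts: sync with the firewall
chrony_servers:
  - gw1.pyrocufflink.blue
```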
2025-03-16 16:37:19 -05:00
Dustin 5f4b1627db hosts: Add nut1.p.b to pyrocufflink group
*nut1.pyrocufflink.blue* is a member of the *pyrocufflink.blue* AD
domain.  I'm not sure how it got to be so without belonging to the
_pyrocufflink_ Ansible group...
2025-02-25 21:03:14 -06:00
Dustin f705e98fab hosts: Add k8s-iot-net-ctrl group
The *k8s-iot-net-ctrl* group is for the Raspberry Pi that has the Zigbee
and Z-Wave controllers connected to it.  This node runs the Zigbee2MQTT
and ZWaveJS2MQTT servers as Kubernetes pods.
2025-01-31 19:49:51 -06:00
Dustin b1c29fc12a hosts: Remove hostvds group
Since the _hostvds_ group is not defined in the static inventory but by
the OpenStack inventory plugin via `hostvds.openstack.yml`, when the
static inventory is used by itself, Ansible fails to load it with an
error:

> Section [vps:children] includes undefined group: hostvds

To fix this, we could explicitly define an empty _hostvds_ group in the
static inventory, but since we aren't currently running any HostVDS
instances, we might as well just get rid of it.
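
For reference, the rejected fix would have been a single empty stanza in the static inventory:

```ini
[hostvds]
# intentionally empty; members come from hostvds.openstack.yml
```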
2025-01-31 19:45:58 -06:00
Dustin ec4fa25bd8 Merge remote-tracking branch 'refs/remotes/origin/master' 2025-01-30 21:15:40 -06:00
Dustin c00d6f49de hosts: Add OVH VPS
It turns out, $0.99/mo might be _too_ cheap for a cloud server.  Running
the Blackbox Exporter+vmagent on the HostVDS instance worked for a few
days, but then it started having frequent timeouts when probing the
websites.  I tried redeploying the instance, switching to a larger
instance, and moving it to different networks.  Unfortunately, none of
this seemed to help.

Switching over to a VPS running in OVH cloud.  OVH VPS servers are
managed statically, as opposed to via API, so we can't use Pulumi to
create them.  This one was created for me when I signed up for an OVH
account.
2025-01-26 13:08:59 -06:00
Dustin 33f315334e users: Configure sudo on some machines
`doas` is not available on AlmaLinux, so we still have to use `sudo` on
the VPS.
2025-01-26 13:08:59 -06:00
Dustin ad0bd7d4a5 remote-blackbox: Add group
The _remote-blackbox_ group defines a system that runs
_blackbox-exporter_ and _vmagent_ in a remote (cloud) location.  This
system will monitor our public web sites.  This will give a better idea
of their availability from the perspective of a user on the Internet,
which can be affected by factors that are not necessarily visible from
within the network.
2025-01-26 13:08:59 -06:00
Dustin f5bee79bac hosts: Decommission bw0.p.b
Vaultwarden is now hosted in Kubernetes.
2025-01-10 20:09:53 -06:00
Dustin d993d59bee Deploy new Kubernetes nodes
The *stor-* nodes are dedicated to Longhorn replicas.  The other nodes
handle general workloads.
2024-11-24 10:33:21 -06:00
Dustin 0f600b9e6e kubernetes: Manage worker nodes
So far, I have been managing Kubernetes worker nodes with Fedora CoreOS
Ignition, but I have decided to move everything back to Fedora and
Ansible.  I like the idea of an immutable operating system, but the FCOS
implementation is not really what I want.  I like the automated updates,
but that can be accomplished with _dnf-automatic_.  I do _not_ like
giving up control of when to upgrade to the next Fedora release.
Mostly, I never did come up with a good way to manage application-level
configuration on FCOS machines.  None of my experiments (Cue+tmpl,
KCL+etcd+Luci) were successful, so I ended up managing configuration
on each node by hand.  Managing OS-level
configuration is also rather cumbersome, since it requires redeploying
the machine entirely.  Altogether, I just don't think FCOS fits with my
model of managing systems.

This commit introduces a new playbook, `kubernetes.yml`, and a handful of
new roles to manage Kubernetes worker nodes running Fedora Linux.  It
also adds two new deploy scripts, `k8s-worker.sh` and `k8s-longhorn.sh`,
which fully automate the process of bringing up worker nodes.
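
The playbook's overall shape is roughly this (role and group names below are placeholders, not necessarily what the repository actually uses):

```yaml
# kubernetes.yml -- sketch only; real role/group names may differ
- hosts: k8s-worker
  roles:
    - kubernetes-worker

- hosts: k8s-longhorn
  roles:
    - longhorn-storage
```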
2024-11-24 10:33:21 -06:00
Dustin a82700a257 chromie: Configure serial terminal server 2024-11-10 13:15:08 -06:00
Dustin 010f652060 hosts: Add loki1.p.b
_loki1.pyrocufflink.blue_ replaces _loki0.pyrocufflink.blue_.  The
former runs Fedora Linux and is managed by Ansible, while the latter ran
Fedora CoreOS and was managed by Ignition and _cfg_.
2024-11-05 06:54:27 -06:00
Dustin 4cd983d5f4 loki: Add role+playbook for Grafana Loki
The current Grafana Loki server, *loki0.pyrocufflink.blue*, runs Fedora
CoreOS and is managed by Ignition and *cfg*.  Since I have declared
*cfg* a failed experiment, I'm going to re-deploy Loki on a new VM
running Fedora Linux and managed by Ansible.

The *loki* role installs Podman and defines a systemd-managed container
to run Grafana Loki.
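
Whether the role uses Quadlet or a hand-written service unit, the effect is roughly this (a Quadlet-style sketch; image tag, paths, and ports are illustrative):

```ini
# /etc/containers/systemd/loki.container (sketch)
[Container]
Image=docker.io/grafana/loki:latest
Volume=/etc/loki:/etc/loki:z
Volume=/var/lib/loki:/loki:z
PublishPort=3100:3100
Exec=-config.file=/etc/loki/loki.yml

[Install]
WantedBy=multi-user.target
```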
2024-10-20 12:10:55 -05:00
Dustin ceaef3f816 hosts: Decommission burp1.p.b
Everything has finally been moved to Chromie.
2024-10-13 17:52:48 -05:00
Dustin 5ced24f2be hosts: Decommission matrix0.p.b
The Synapse server hasn't been working for a while, but we don't use it
for anything any more anyway.
2024-10-13 12:53:49 -05:00
Dustin 621f82c88d hosts: Migrate remaining hosts to Restic
Gitea and Vaultwarden both have SQLite databases.  We'll need to add
some logic to ensure these are in a consistent state before beginning
the backup.  Fortunately, neither of them is a very busy database, so
the likelihood of an issue is pretty low.  It's definitely more
important to get backups going again sooner, and we can deal with that
later.
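
The eventual fix will probably amount to snapshotting the database before Restic runs, something like this (a sketch; paths are illustrative):

```sh
# Run before `restic backup`, e.g. as an ExecStartPre= of
# restic-backup.service.  `.backup` uses SQLite's online-backup
# API, so the copy is consistent even while Gitea is writing.
sqlite3 /var/lib/gitea/data/gitea.db \
    ".backup '/var/lib/gitea/data/gitea.db.bak'"
```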
2024-09-07 20:45:24 -05:00
Dustin c2c283c431 nextcloud: Back up Nextcloud with Restic
Now that the database is hosted externally, we don't have to worry about
backing it up specifically.  Restic only backs up the data on the
filesystem.
2024-09-04 17:41:42 -05:00
Dustin 0f4dea9007 restic: Add role+playbook for Restic backups
The `restic.yml` playbook applies the _restic_ role to hosts in the
_restic_ group.  The _restic_ role installs `restic` and creates a
systemd timer and service unit to run `restic backup` every day.

Restic doesn't really have a configuration file; all its settings are
controlled either by environment variables or command-line options. Some
options, such as the list of files to include in or exclude from
backups, take paths to files containing the values.  We can make use of
these to provide some configurability via Ansible variables.  The
`restic_env` variable is a map of environment variables and values to
set for `restic`.  The `restic_include` and `restic_exclude` variables
are lists of paths/patterns to include and exclude, respectively.
Finally, the `restic_password` variable contains the password to decrypt
the repository contents.  The password is written to a file and exposed
to the _restic-backup.service_ unit using [systemd credentials][0].

When using S3 or a compatible service for repository storage, Restic of
course needs authentication credentials.  These can be set using the
`restic_aws_credentials` variable.  If this variable is defined, it
should be a map containing the `aws_access_key_id` and
`aws_secret_access_key` keys, which will be written to an AWS shared
credentials file.  This file is then exposed to the
_restic-backup.service_ unit using [systemd credentials][0].

[0]: https://systemd.io/CREDENTIALS/
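
Putting it all together, a host's variables might look something like this (values are illustrative only; the real ones are vaulted):

```yaml
restic_env:
  RESTIC_REPOSITORY: s3:https://s3.backups.pyrocufflink.blue/somehost
restic_include:
  - /etc
  - /home
restic_exclude:
  - '*.cache'
restic_password: '{{ vault_restic_password }}'
restic_aws_credentials:
  aws_access_key_id: AKIAEXAMPLE
  aws_secret_access_key: '{{ vault_restic_s3_secret }}'
```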
2024-09-04 09:40:29 -05:00
Dustin 708bcbc87e Merge remote-tracking branch 'refs/remotes/origin/master' 2024-09-03 17:18:18 -05:00
Dustin a0378feda8 nextcloud: Move database to db0
Moving the Nextcloud database to the central PostgreSQL server will
allow it to take advantage of the monitoring and backups in place there.
For backups specifically, this will make it easier to switch from BURP
to Restic, since now only the contents of the filesystem need to be
backed up.

The PostgreSQL server on _db0_ requires certificate authentication for
all clients.  The certificate for Nextcloud is stored in a Secret in
Kubernetes, so we need to use the _nextcloud-db-cert_ role to install
the script to fetch it.  Nextcloud configuration doesn't expose the
parameters for selecting the certificate and private key files, but
fortunately, they can be encoded in the value provided to the `host`
parameter, though it makes for a rather cumbersome value.
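
That is, something along these lines in `config.php` (a sketch: the parameter names come from libpq, but I'm hedging on the exact separator and spelling that Nextcloud accepts, and the paths are illustrative):

```php
// Pass libpq TLS options through the dbhost value.
'dbhost' => 'db0.pyrocufflink.blue;sslmode=verify-full'
    . ';sslcert=/etc/pki/nextcloud/client.crt'
    . ';sslkey=/etc/pki/nextcloud/client.key',
```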
2024-09-02 21:03:33 -05:00
Dustin d3a09a2e88 hosts: Add chromie, nvr2 to nut-monitor group
Deploy `nut-monitor` on these physical machines so they will shut down
safely in the event of a power outage.
2024-09-01 18:52:33 -05:00
Dustin db74e9ac3f btop: Install btop and run it on the console
`btop` is so much better than `top`.  It makes a really nice status
indicator for machine health, so I like running it on tty1.
2024-09-01 09:24:53 -05:00
Dustin fbf587414a hosts: Add chromie.p.b
*chromie.pyrocufflink.blue* will replace *burp1.pyrocufflink.blue* as
the backup server.  It is running on the hardware that was originally
*nvr1.pyrocufflink.blue*: a 1U Jetway server with an Intel Celeron N3160
CPU and 4 GB of RAM.
2024-09-01 09:01:04 -05:00
Dustin 9d60ae1a61 minio-backups: Deploy MinIO for backups
This playbook uses the *minio-nginx* and *minio-backups-cert* roles to
deploy MinIO with nginx.

The S3 API server is *s3.backups.pyrocufflink.blue*, and buckets can be
accessed as subdomains of this name.

The Admin Console is *minio.backups.pyrocufflink.blue*.

Certificates are issued by DCH CA via ACME using `certbot`.
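
The nginx side of subdomain bucket access boils down to a wildcard server name proxying to MinIO (a sketch; the real configuration lives in the *minio-nginx* role, and certificate paths are illustrative):

```nginx
server {
    listen 443 ssl;
    server_name s3.backups.pyrocufflink.blue
                *.s3.backups.pyrocufflink.blue;
    ssl_certificate     /etc/pki/tls/certs/s3.backups.crt;
    ssl_certificate_key /etc/pki/tls/private/s3.backups.key;

    # MinIO handles large uploads itself
    client_max_body_size 0;

    location / {
        proxy_set_header Host $http_host;
        proxy_pass http://127.0.0.1:9000;
    }
}
```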
2024-09-01 08:59:28 -05:00
Dustin 2a110d7aba hosts: Deploy haproxy0
_haproxy0.pyrocufflink.blue_ is a Fedora Linux VM that runs HAProxy to
provide reverse proxy, exposing web sites and applications to the
Internet.  It has a static MAC address because it will need a static IP
address, at least initially, in order for DNAT to work.
2024-08-24 11:46:40 -05:00
Dustin aab581e859 hosts: Move VM hosts from hosts.offline
Originally, the VM hosts were in a separate inventory so they would
not be managed with the rest of the servers.  It used to be that one
server was running all the VMs, while the other was asleep.  That's
no longer the case; both are always running and each has about half
of the VMs.  Since they're both always online, they can be managed
normally now.
2024-08-23 09:33:29 -05:00
Dustin 6e5e12f8b6 hosts: Add nvr2.p.b to collectd-sensors group
To enable collecting temperature et al. sensor data.
2024-08-14 20:26:11 -05:00
Dustin d2b3b1f7b3 hosts: Deploy production Frigate on nvr2.p.b
*nvr2.pyrocufflink.blue* originally ran Fedora CoreOS.  Since I'm tired
of the tedium and difficulty involved in making configuration changes to
FCOS machines, I am migrating it to Fedora Linux, managed by Ansible.
2024-08-12 22:22:50 -05:00
Dustin 7b61a7da7e r/useproxy: Configure system-wide proxy
The *useproxy* role configures the `http_proxy` et al. environment
variables for systemd services and interactive shells.  Additionally, it
configures Yum repositories to use a single mirror via the `baseurl`
setting, rather than a list of mirrors via `metalink`, because a) the
proxy only allows access to _dl.fedoraproject.org_ and b) the proxy
caches RPM files, which is only effective if all clients use the same
mirror all the time.

The `useproxy.yml` playbook applies this role to servers in the
*needproxy* group.
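
Concretely, the role drops in something like the following (a sketch; the proxy host name is a placeholder and file paths are illustrative):

```ini
# /etc/systemd/system.conf.d/proxy.conf (sketch)
[Manager]
DefaultEnvironment=http_proxy=http://proxy0.pyrocufflink.blue:3128 https_proxy=http://proxy0.pyrocufflink.blue:3128

# /etc/yum.repos.d/fedora.repo (the relevant change):
# pin a single mirror so the proxy cache stays effective
[fedora]
baseurl=https://dl.fedoraproject.org/pub/fedora/linux/releases/$releasever/Everything/$basearch/os/
#metalink= (disabled)
```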
2024-08-12 18:47:04 -05:00
Dustin 2ce211b5ea hosts: Add db0.p.b
*db0.pyrocufflink.blue* will be the primary server in the new PostgreSQL
database cluster.  We're starting with Fedora 39 so we can have
PostgreSQL 15, to match the version managed by the Postgres Operator in
the Kubernetes cluster right now.
2024-07-02 20:44:29 -05:00
Dustin 208fadd2ba postgresql: Configure for dedicated DB servers
I am going to use the *postgresql* group for the dedicated database
servers.  The configuration for those machines will be quite a bit
different than for the one existing machine that is a member of that
group already: the Nextcloud server.  Rather than undefine/override all
the group-level settings at the host level, I have removed the Nextcloud
server from the *postgresql* group, and updated the `nextcloud.yml`
playbook to apply the *postgresql-server* role itself.

Eventually, I want to move the Nextcloud database to the central
database servers.  At that point, I will remove the *postgresql-server*
role from the `nextcloud.yml` playbook.
2024-07-02 20:44:29 -05:00
Dustin 332ef18600 hosts: Decommission old Kubernetes workers
*k8s-amd64-n0.pyrocufflink.blue*, *k8s-amd64-n1.pyrocufflink.blue*, and
*k8s-amd64-n2.pyrocufflink.blue*, which ran Fedora Linux, have been
replaced by *k8s-amd64-n4.pyrocufflink.blue*,
*k8s-amd64-n5.pyrocufflink.blue*, and *k8s-amd64-n6.pyrocufflink.blue*,
respectively.  The new machines run Fedora CoreOS, and are thus not
managed by the Ansible configuration policy.
2024-06-23 10:43:15 -05:00
Dustin afcd2f2f05 hosts: Replace domain controllers
New AD DC servers run Fedora 40.  Their LDAP server certificates are
issued by *step-ca* via ACME, signed by *dch-ca r2*.

I've changed the naming convention for domain controllers again.  I
found the random sequence of characters to be too difficult to remember
and identify.  Using a short random word (chosen from the EFF word list
used by Diceware) should be a lot nicer.  These names are chosen by the
`create-dc.sh` script.
2024-06-12 19:01:37 -05:00
Dustin 5a9b8b178a hosts: Decommission unifi1
*unifi1.pyrocufflink.blue* is being replaced with
*unifi2.pyrocufflink.blue*.  The new server runs Fedora CoreOS.
2024-05-26 10:50:32 -05:00
Dustin 226a9e05fa nut: Drop group
NUT is managed by _cfg.git_ now.
2024-02-22 10:24:16 -06:00
Dustin 493663e77f frigate: Drop group
Frigate is no longer managed by Ansible.  Dropping the group so the file
encrypted with Ansible Vault can go away.
2024-02-22 10:23:19 -06:00
Dustin fdc59fe73b pyrocufflink-dns: Drop group
The internal DNS server for the *pyrocufflink.blue* et al. domains runs
on the firewall now, and is thus no longer managed by Ansible.  Dropping
the group variables so the file encrypted with Ansible Vault can go
away.
2024-02-22 10:23:19 -06:00
Dustin f9f8d5aa29 Remove grafana, metricspi groups
With the Metrics Pi decommissioned and Victoria Metrics and Grafana
running in Kubernetes now, these groups are no longer needed.
2024-02-22 10:23:19 -06:00
Dustin 13e6433fff hosts: Remove logs0.p.b
Decommissioning Graylog
2024-02-13 16:12:20 -06:00
Dustin 2e77502a2f hosts: Decommission serial0.p.b
*serial0.pyrocufflink.blue* has been replaced by
*serial1.pyrocufflink.blue*.  The latter runs Fedora CoreOS and is
managed by the CUE-based configuration policy in *cfg.git*.
2024-01-25 20:22:00 -06:00
Dustin 423951bac1 {burp1, gw1}: Configure upsmon 2024-01-19 21:55:36 -06:00
Dustin f31018f514 hosts: Remove serial0 from nut group
*nut0.pyrocufflink.blue* is the new NUT server.  It's not managed by
this configuration policy.
2024-01-16 17:41:50 -06:00
Dustin 1226f1f005 hosts: Decommission mtrcs0.p.b
The Metrics Pi has bitten the dust.  The NVMe disk has never been
particularly reliable, but now it's gotten to the point where it's a
real issue.  The Pi needs to be rebooted at least once a day.

I've moved the Victoria Metrics/Grafana ecosystem to Kubernetes.
2023-12-31 19:15:55 -06:00
Dustin c6f0ea9720 r/repohost: Configure Yum package repo host
So it turns out Gitea's RPM package repository feature is less than
stellar.  Since each organization/user can only have a single
repository, separating packages by OS would be extremely cumbersome.
Presumably, the feature was designed for projects that only build a
single RPM for each version, but most of my packages need multiple
builds, as they tend to link to system libraries.  Further, only the
repository owner can publish to user-scoped repositories, so e.g.
Jenkins cannot publish anything to a repository under my *dustin*
account.  This means I would ultimately have to create an Organization
for every OS/version I need to support, and make Jenkins a member of it.
That sounds tedious and annoying, so I decided against using that
feature for internal packages.

Instead, I decided to return to the old ways, publishing packages with
`rsync` and serving them with Apache.  It's fairly straightforward to
set this up: just need a directory with the appropriate permissions for
users to upload packages, and configure Apache to serve from it.

One advantage Gitea's feature had over a plain directory is its
automatic management of repository metadata.  Publishers only have to
upload the RPMs they want to serve, and Gitea handles generating the
index, database, etc. files necessary to make the packages available to
Yum/dnf.  With a plain file host, the publisher would need to use
`createrepo` to generate the repository metadata and upload that as
well.  For repositories with multiple packages, the publisher would need
a copy of every RPM file locally in order for them to be included in the
repository metadata.  This, too, seems like it would be too much trouble
to be tenable, so I created a simple automatic metadata manager for the
file-based repo host.  Using `inotifywatch`, the `repohost-createrepo`
script watches for file modifications in the repository base directory.
Whenever a file is added or changed, the directory containing it is
added to a queue.  Every thirty seconds, the queue is processed; for
each unique directory in the queue, repository metadata are generated.

This implementation combines the flexibility of a plain file host,
supporting an effectively unlimited number of repositories with
fully-configurable permissions, and the ease of publishing of a simple
file upload.
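
The watcher boils down to something like this (a simplified sketch using `inotifywait -m` to stream events; the actual script may be structured differently):

```sh
#!/bin/sh
# Simplified sketch of repohost-createrepo; details are illustrative.
REPO_BASE=/srv/repohost
QUEUE=$(mktemp)

# Stream events; %w is the directory containing the changed file
inotifywait -m -r -e close_write -e moved_to --format '%w' \
    "$REPO_BASE" >> "$QUEUE" &

# Every thirty seconds, regenerate metadata for each queued directory
while sleep 30; do
    sort -u "$QUEUE" | while read -r dir; do
        createrepo --update "$dir"
    done
    : > "$QUEUE"
done
```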
2023-11-07 20:51:10 -06:00
Dustin 6955c4e7ad hosts: Decommission dc-4k6s8e.p.b
Replaced by *dc-nrtxms.pyrocufflink.blue*
2023-10-28 16:07:56 -05:00
Dustin 420764d795 hosts: Add dc-nrtxms.p.b
New Fedora 38 Active Directory Domain Controller
2023-10-28 16:07:39 -05:00
Dustin a8c184d68c hosts: Decommission dc-ag62kz.p.b
Replaced by *dc-qi85ia.pyrocufflink.blue*
2023-10-28 16:07:08 -05:00