Commit Graph

497 Commits

Author SHA1 Message Date
bot
1c4b5e19a4 firefly-iii: Update to 6.1.25 2024-12-21 12:32:08 +00:00
bot
2691b58c05 zwavejs2mqtt: Update to 9.29.0 2024-12-21 12:32:04 +00:00
bot
50459e111e zigbee2mqtt: Update to 1.42.0 2024-12-21 12:32:04 +00:00
bot
387b7d120e whisper: Update to 2.4.0 2024-12-21 12:32:04 +00:00
bot
1768778b44 home-assistant: Update to 2024.12.5 2024-12-21 12:32:03 +00:00
2b6830f131 cert-manager: Configure ACME DNS.01 for dch-ca
Since transitioning to externalIPs for TCP services, it is no longer
possible to use the HTTP.01 ACME challenge to issue certificates for
services hosted in the cluster, because the ingress controller does not
listen on those addresses.  Thus, we have to switch to using the DNS.01
challenge.  I had avoided using it before because of the complexity of
managing dynamic DNS records with the Samba AD server, but this was
actually pretty easy to work around.  I created a new DNS zone on the
firewall specifically for ACME challenges.  Names in the AD-managed zone
have CNAME records for their corresponding *_acme-challenge* labels
pointing to this new zone.  The new zone has dynamic updates enabled,
which _cert-manager_ supports using the RFC2136 plugin.

For now, this is only enabled for _rabbitmq.pyrocufflink.blue_.  I will
transition the other names soon.
2024-12-09 17:58:43 +00:00
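
As a rough illustration of 2b6830f131, a _cert-manager_ issuer using the RFC2136 DNS.01 solver could look like the sketch below. The resource kind, ACME directory URL, name server address, TSIG key name, and Secret names are placeholders assumed for the example, not values taken from this repository. The `cnameStrategy: Follow` setting is what lets the solver follow the *_acme-challenge* CNAME into the dynamic zone.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer            # assumption: could equally be a namespaced Issuer
metadata:
  name: dch-ca
spec:
  acme:
    server: https://ca.example.org/acme/directory   # placeholder ACME endpoint
    privateKeySecretRef:
      name: dch-ca-account-key                      # placeholder Secret name
    solvers:
    - dns01:
        # Follow the CNAME on _acme-challenge.<name> into the dynamic zone
        cnameStrategy: Follow
        rfc2136:
          nameserver: 192.0.2.1:53                  # placeholder: the firewall's DNS server
          tsigKeyName: cert-manager                 # placeholder TSIG key name
          tsigAlgorithm: HMACSHA256
          tsigSecretSecretRef:
            name: rfc2136-tsig                      # placeholder Secret holding the TSIG key
            key: tsig-secret-key
      selector:
        dnsNames:
        - rabbitmq.pyrocufflink.blue
```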
4243823ba5 invoice-ninja: Fix network policy for ingress
Since the IP address assigned to the ingress controller is now managed
by keepalived and known to Kubernetes, the network policy needs to allow
access to it by pod namespace rather than IP address.  It seems that the
former takes precedence over the latter, so even though the IP address
was explicitly allowed, traffic was not permitted because it was
destined for a Kubernetes service that was not.
2024-12-07 09:28:44 -06:00
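
A minimal sketch of the kind of rule 4243823ba5 describes, selecting the ingress controller by namespace instead of by IP block. The policy name, the namespaces, and the empty pod selector are assumptions for illustration. The `kubernetes.io/metadata.name` label is set automatically on every namespace by Kubernetes.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller   # hypothetical name
  namespace: invoice-ninja         # assumption
spec:
  podSelector: {}                  # assumption: applies to all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    # Match the controller pods by namespace rather than by ipBlock
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # assumed namespace name
```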
b269fa5812 home-assistant: Add service to shut down desk panel
Home Assistant can now SSH into the desk panel and shut it down.
2024-12-02 23:06:30 +00:00
107852ad54 home-assistant: Enable auto-login for desk panel
Home Assistant supports unauthenticated access for certain clients using
its _trusted_networks_ auth provider.  With this configuration, we allow
the desk panel to automatically sign in as the _kiosk_ user, but all
other clients must authenticate normally.
2024-11-27 22:03:40 -06:00
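
For reference, a Home Assistant `configuration.yaml` fragment along the lines of 107852ad54 might look like this sketch. The panel's address and the user ID are placeholders. The real ID would be that of the _kiosk_ user.

```yaml
homeassistant:
  auth_providers:
  - type: trusted_networks
    trusted_networks:
    - 192.0.2.10/32              # placeholder: address of the desk panel
    trusted_users:
      # placeholder user ID standing in for the kiosk user
      192.0.2.10: 0123456789abcdef0123456789abcdef
    allow_bypass_login: true     # skip the login form when exactly one user matches
  - type: homeassistant          # all other clients authenticate normally
```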
72d3f222c5 jenkins: Trust SSHCA for pyrocufflink.black
The new machines have names in the _pyrocufflink.black_ zone.  We need
to trust the SSHCA certificate to sign keys for these names in order to
connect to them and manage them with Ansible.
2024-11-26 03:35:21 +00:00
2a90ffc7a9 invoice-ninja: Update trusted proxies addresses
Since _ingress-nginx_ no longer runs in the host network namespace,
traffic will appear to come from pods' internal IP addresses now.
Similarly, the network policy for Invoice Ninja needs to be updated to
allow traffic _to_ the ingress controllers' new addresses.
2024-11-22 22:43:16 -06:00
1f7631d6b7 home-assistant: Update trusted proxies addresses
Since _ingress-nginx_ no longer runs in the host network namespace,
traffic will appear to come from pods' internal IP addresses now.
2024-11-22 22:42:43 -06:00
607fa050f3 firefly-iii: Update trusted proxies addresses
Since _ingress-nginx_ no longer runs in the host network namespace,
traffic will appear to come from pods' internal IP addresses now.
2024-11-22 22:41:49 -06:00
0a5af84778 rabbitmq: Configure Service externalIPs
Clients outside the cluster can now communicate with RabbitMQ directly
on port 5671 by using its dedicated external IP address.  This address
is automatically assigned to the node where RabbitMQ is running by
`keepalived`.
2024-11-22 22:39:30 -06:00
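
A Service with a dedicated external IP, as in 0a5af84778, could be sketched roughly as follows. The address, port name, and selector label are placeholders. Only port 5671 comes from the commit message.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
spec:
  selector:
    app.kubernetes.io/name: rabbitmq   # assumed pod label
  ports:
  - name: amqps                        # hypothetical port name
    port: 5671
    targetPort: 5671
  externalIPs:
  - 192.0.2.20                         # placeholder: the address keepalived floats between nodes
```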
1a39a8869a h-a/mosquitto: Configure Service externalIPs
Clients outside the cluster can now communicate with Mosquitto directly
on port 8883 by using its dedicated external IP address.  This address
is automatically assigned to the node where Mosquitto is running by
`keepalived`.
2024-11-22 22:37:01 -06:00
fefbaa9991 ingress: Use Deployment+Service with externalIPs
Now that we have `keepalived` managing the "virtual" IP address for the
ingress controller, we can change _ingress-nginx_ to run as a Deployment
rather than a DaemonSet.  It no longer needs to use the host network
namespace, as `kube-proxy` will route all traffic sent to the configured
external IP address to the controller pods.  Using the _Local_ external
traffic policy disables source NAT, so incoming traffic is seen by nginx
unmodified.
2024-11-22 22:35:37 -06:00
e7ea2b0659 keepalived: Initial commit
Running `keepalived` as a DaemonSet will allow managing floating
"virtual" IP addresses for Kubernetes services with configured external
IP addresses.  The main services we want to expose outside the cluster
are _ingress-nginx_, Mosquitto, and RabbitMQ.  The `keepalived` cluster
will negotiate using the VRRP protocol to determine which node should
have each external address.  Using the process tracking feature of
`keepalived`, we can steer traffic directly to the node where the target
service is running.
2024-11-22 22:26:48 -06:00
5c78bb89b5 Merge remote-tracking branch 'refs/remotes/origin/master' 2024-11-22 19:38:00 -06:00
0a6086eb2a longhorn: Run on dedicated nodes
I've created new worker nodes that are dedicated to running Longhorn
replicas.  These nodes are tainted with the
`node-role.kubernetes.io/longhorn` taint, so no regular pods will be
scheduled there by default.  Longhorn pods thus need to be configured
to tolerate that taint, and to be scheduled on nodes with the
similarly-named label.
2024-11-21 22:59:14 -06:00
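
In generic pod-spec terms, the scheduling constraints from 0a6086eb2a amount to something like the fragment below. How these are passed to Longhorn depends on how it is deployed, and the taint effect and label value shown here are assumptions.

```yaml
# Pod template fragment; effect and label value are assumed
spec:
  tolerations:
  - key: node-role.kubernetes.io/longhorn
    operator: Exists
    effect: NoSchedule
  nodeSelector:
    node-role.kubernetes.io/longhorn: ""
```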
d6c83565ec rabbitmq: Update to 4.0
RabbitMQ Server 3.13 is out of support now.
2024-11-21 22:59:14 -06:00
121e6e7111 rabbitmq: Switch to using volume claim templates
This will make it easier to "blow away" the RabbitMQ data volume on the
occasions when it gets into a weird state.  Simply scale the StatefulSet
down to 0 replicas, delete the PVC, then scale back up.  Kubernetes will
handle creating a new PVC automatically.
2024-11-21 22:59:14 -06:00
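
A minimal sketch of a StatefulSet using a volume claim template, in the spirit of 121e6e7111. The labels, image tag, volume name, and storage size are placeholders. Deleting the generated PVC while the set is scaled to 0 lets Kubernetes create a fresh one from the template on the next scale-up.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq   # assumed label
  template:
    metadata:
      labels:
        app.kubernetes.io/name: rabbitmq
    spec:
      containers:
      - name: rabbitmq
        image: docker.io/library/rabbitmq:4.0   # assumed image reference
        volumeMounts:
        - name: data
          mountPath: /var/lib/rabbitmq
  # One PVC (data-rabbitmq-0) is created per replica from this template
  volumeClaimTemplates:
  - metadata:
      name: data                         # hypothetical volume name
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi                   # placeholder size
```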
3d5dd52eb9 ingress: Use upstream resources w/ patches
This will make it easier to upgrade, since we keep track of _exactly_
what we changed from the upstream resources with Kustomize patches.
2024-11-21 19:42:35 -06:00
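
The approach in 3d5dd52eb9 can be sketched with a `kustomization.yaml` like the one below. The upstream file path, the patch target, and the patched field are hypothetical and only show the shape of the configuration.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- upstream/deploy.yaml            # hypothetical path to the unmodified upstream manifest
patches:
# Every local deviation from upstream lives in an explicit patch
- target:
    kind: Deployment
    name: ingress-nginx-controller   # assumed resource name
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 2
```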
3b3d4c38ed dynk8s: Move Wireguard config to SealedSecret 2024-11-21 19:41:55 -06:00
da81a336e1 dynk8s-provisioner: Migrate to Kustomize 2024-11-19 10:43:42 -06:00
e0c633c21e v-m: scrape: Fix Nextcloud URL
Nextcloud uses a _client-side_ (JavaScript) redirect to navigate the
browser to its `index.php`.  The page it serves with this redirect is
static and will often load successfully, even if there is a problem with
the application.  This causes the Blackbox exporter to record the site
as "up," even when it definitely is not.  To avoid this, we can
scrape the `index.php` page explicitly, ensuring that the application is
loaded.
2024-11-17 18:43:00 +00:00
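
A Prometheus-compatible scrape job that probes `index.php` directly, as e0c633c21e describes, might look like this sketch. The job name, module name, Nextcloud hostname, and exporter address are placeholders.

```yaml
scrape_configs:
- job_name: blackbox-http          # hypothetical job name
  metrics_path: /probe
  params:
    module: [http_2xx]             # assumed Blackbox exporter module
  static_configs:
  - targets:
    # Probe the application entry point, not the static redirect page
    - https://nextcloud.example.org/index.php
  relabel_configs:
  # Pass the target URL to the exporter and scrape the exporter itself
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115   # placeholder exporter address
```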
14492d827a Merge pull request 'home-assistant: Update to 2024.11.2' (#34) from updatebot/home-assistant into master
Reviewed-on: #34
2024-11-16 18:04:43 +00:00
444686cb1e Merge pull request 'paperless-ngx: Update to 2.13.0' (#31) from updatebot/paperless-ngx into master
Reviewed-on: #31
2024-11-16 17:55:04 +00:00
ceea84d7f9 Merge pull request 'firefly-iii: Update to 6.1.22' (#33) from updatebot/firefly-iii into master
Reviewed-on: #33
2024-11-16 17:45:08 +00:00
bot
4d2cc40b5e tika: Update to 3.0.0.0 2024-11-16 12:32:14 +00:00
bot
c31db5fde2 gotenberg: Update to 8.13.0 2024-11-16 12:32:14 +00:00
bot
74ce0e1b0a paperless-ngx: Update to 2.13.5 2024-11-16 12:32:14 +00:00
bot
f0b16fd53c firefly-iii: Update to 6.1.22 2024-11-16 12:32:12 +00:00
bot
acd9a0fa92 zwavejs2mqtt: Update to 9.27.2 2024-11-16 12:32:08 +00:00
bot
115b4ade39 home-assistant: Update to 2024.11.2 2024-11-16 12:32:08 +00:00
c1927eecfc Merge pull request 'home-assistant: Update to 2024.10.4' (#30) from updatebot/home-assistant into master
Reviewed-on: #30
2024-11-12 15:56:50 +00:00
04ef1faf75 Merge pull request 'authelia: Update to 4.38.17' (#32) from updatebot/authelia into master
Reviewed-on: #32
2024-11-12 15:14:50 +00:00
0209f921c3 v-m: Remove nut0 from scrape targets
_nut0.pyrocufflink.blue_ is decommissioned.
2024-11-12 08:02:00 -06:00
62b19e942b sshca: Add machine ID for nut1.p.b 2024-11-10 11:19:53 -06:00
bot
b956e9ac05 authelia: Update to 4.38.17 2024-11-09 12:32:16 +00:00
bot
f7eb3b49e7 zwavejs2mqtt: Update to 9.26.0 2024-11-09 12:32:08 +00:00
bot
0db830a670 zigbee2mqtt: Update to 1.41.0 2024-11-09 12:32:08 +00:00
bot
6d137af6dc home-assistant: Update to 2024.11.1 2024-11-09 12:32:08 +00:00
3d40424cf7 fleetlock: Use patched server from GitHub PR
The _fleetlock_ server drains all pods from a node before allocating the
reboot lock to that node.  Unfortunately, it doesn't actually wait for
those pods to be completely evicted.  If some pods take too long to shut
down, they may get stuck in `Terminating` state once the machine starts
rebooting.  This makes it so those pods cannot be replaced on another
node while the original one is offline, which pretty much defeats the
purpose of using Fleetlock in the first place.

It seems upstream has abandoned this project, as there is an open [Pull
Request][0] to fix this issue that has so far been ignored.
Fortunately, building a new container image containing the patch is easy
enough, so we can run our own patched build.

[0]: https://github.com/poseidon/fleetlock/pull/271
2024-11-05 07:05:55 -06:00
ac62a77c96 Merge branch '20125' 2024-11-05 07:05:19 -06:00
e1d9833e83 cert-manager: Add cert for apps.du5t1n.xyz 2024-11-05 07:04:27 -06:00
4ad5518f18 cert-manager: Migrate config to configMapGenerator 2024-11-05 07:04:09 -06:00
9f287d0f71 v-m/alerts: Add alerts for backup RAID array
Just like I did with the RAID-1 array in the old BURP server, I will
keep one member active and one in the fireproof safe, swapping them each
month.  We can use the same metrics queries to alert on when the swap
should happen that we used with the BURP server.
2024-11-04 20:46:03 -06:00
2380468658 v-m/scrape: Collect Jellyfin metrics 2024-11-04 20:38:25 -06:00
db7c07ee55 v-m/scrape: Ignore cloud Kubernetes nodes
The ephemeral Jenkins worker nodes that run in AWS don't have collectd,
promtail, or Zincati.  We don't need to get three alerts every time a
worker starts up to handle an ARM build job, so we drop these discovered
targets for these scrape jobs.
2024-11-04 20:35:17 -06:00
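
One way to drop such targets, sketched under the assumption that the cloud nodes carry a distinguishing label, is a `drop` relabel rule on the node discovery job. The job name and the `example.org/cloud` label are hypothetical, and the actual rule in db7c07ee55 may match on something else.

```yaml
scrape_configs:
- job_name: collectd               # hypothetical job name
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Discard any discovered node that has the (hypothetical) cloud label
  - source_labels: [__meta_kubernetes_node_labelpresent_example_org_cloud]
    regex: "true"
    action: drop
```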
d76a1360c8 v-m/alerts: Ignore Paperless consume_file task
Paperless-ngx uses a Celery task to process uploaded files, converting
them to PDF, running OCR, etc.  This task can be marked as "failed" for
various reasons, most of which are more about the document itself than
the health of the application.  The GUI displays the results of failed
tasks when they occur.  It doesn't really make sense to have an alert
about this scenario, especially since there's nothing to do to directly
clear the alert anyway.
2024-11-04 20:28:11 -06:00