Commit Graph

11 Commits (d6751af3265267de494fe8bc64bb634bf394475a)

Author SHA1 Message Date
Dustin d6751af326 prod/nut: Require both UPS to be online
Unfortunately, the automatic transfer switch does not seem to work
correctly.  When the standby source is a UPS running on battery, it does
*not* switch sources if the primary fails.  In other words, when the
power is out and both UPS are running on battery, when the first one
dies, it will NOT switch to the second one.  It has no trouble switching
when the second source is mains power, though, which is very strange.

I have tried messing with all the settings including nominal input
voltage, sensitivity, and frequency tolerence, but none seem to have any
effect.

Since it is more important for the machines to shut down safely than it
is to have an extra 10-15 minutes of runtime during an outage, the best
solution for now is to configure the hosts to shut down as soon as the
first UPS battery gets low.  This is largely a waste of the second UPS,
but at least it will help prevent data loss.
2024-01-25 21:12:33 -06:00
Dustin e25aa15eb1 prod/nut: Add user for dustin
The `upsrw` command, which is used to set individual UPS configuration
parameters like low battery level, etc., needs a username and password
to authenticate to `upsd`.
2024-01-21 15:10:56 -06:00
Dustin 0450617ae6 prod/nut: Add upsmon user for burp1 2024-01-19 20:57:47 -06:00
Dustin 668b79aaac prod/nut: Add upsmon passwords for gw1, vmhost{0,1} 2024-01-19 19:56:20 -06:00
Dustin f4938c57e1 prod/nut: Reset password for nvr1
The original password worked, but caused a warning in the `upsd` log:

> Ignoring duplicate password for nvr1
2024-01-19 19:32:08 -06:00
Dustin 36fd137897 nut: Infer role from server name, set commands
Since the "primary" `upsmon` is always (for our purposes) running on the
same host as `upsd`, there's no reason to specify both values.

All systems need a shutdown command; one is not set by default.

The primary system is the only one that should send notifications.
2024-01-19 17:57:20 -06:00
Dustin ad42c2d883 nvr1: Add instructions to configure upsmon
*nvr1.pyrocufflink.blue* will run `upsmon` so it can shut itself down
safely when the power goes out.
2024-01-19 16:57:47 -06:00
Dustin fb74f0e81c nut: Configure upsmon
`upsmon` is the component of NUT that tracks the status of UPSs and
reacts to their changing by sending notifications and/or shutting down
the system.  It is a networked application that can run on any system;
it can run on a different system than `upsd`, and indeed can run on
multiple systems simultaneously.

Each system that runs `upsmon` will need a username and password for
each UPS it will monitor.  Using the CUE [function pattern][0], I've
made it pretty simple to declare the necessary values under
`nut.monitor`.

[0]: https://cuetorials.com/patterns/functions/
2024-01-19 08:52:14 -06:00
Dustin 41e9fa85d2 Restructure CUE packages
A bunch of stuff that wasn't schema definitions ended up in the `schema`
package.  Rather than split values up in a bunch of top-level packages,
I think it would be better to have a package-per-app model.
2024-01-17 17:35:18 -06:00
Dustin 52642d37d9 nut: Configure collectd NUT plugin
infra/cfg/pipeline/head This commit looks good Details
2024-01-17 07:18:37 -06:00
Dustin 11f9957c11 Switch from KCL to CUE
Although KCL is unquestionably a more powerful language, and maps more
closely to my mental model of how host/environment/application
configuration is defined, the fact that it doesn't work on ARM (issue
982]) makes it a non-starter.  It's also quite slow (owing to how it
compiles a program to evaluate the code) and cumbersome to distribute.
Fortunately, `tmpl` doesn't care how the values it uses were computed,
so we freely change configuration languages, so long as whatever we use
generates JSON/YAML.

CUE is probably a lot more popular than KCL, and is quite a bit simpler.
It's more restrictive (values cannot be overridden once defined), but
still expressive enough for what I am trying to do (so far).
2024-01-15 11:40:58 -06:00