Commit Graph

466 Commits (3d9d7423efc1ae7fc37e6b129f98fa88e490c28f)

Author SHA1 Message Date
Dustin 3d9d7423ef hosts: Add stats0.p.b to prometheus group
Although configuration policy is not yet available for Prometheus
itself, the `collectd.yml` playbook also uses the *prometheus* host
group.  Specifically, hosts in this group are configured to receive
collectd data from other hosts and expose those data through the
`write_prometheus` plugin.
2021-07-05 09:34:25 -05:00
Dustin 47954dca48 ci: Add pipeline for Grafana 2021-07-02 21:55:02 -05:00
Dustin 9565b740b0 hosts: add stats0.p.b
*stats0.pyrocufflink.blue* hosts Grafana (for now, adding Victoria
Metrics, etc. later)
2021-07-02 21:55:02 -05:00
Dustin 5e61e93cea roles/grafana: Deploy Grafana
This commit introduces the *grafana* role and the corresponding
`grafana.yml` playbook.  The role installs Grafana using the system
package manager, and configures the server (including LDAP
authentication).
2021-07-02 21:47:33 -05:00
Dustin 5de8f3e612 roles/nginx: Add role for nginx
This role installs nginx and provides a base/skeleton configuration
suitable for most deployments.
2021-06-29 21:00:46 -05:00
Dustin 1fcfc80254 r/protonvpn: watchdog: Also watch for EAP/FAIL
Occasionally, ProtonVPN servers randomly reject the EAP authentication
credentials.  When this happens, the tunnel fails and is not restarted
automatically by strongSwan.  As such, the watchdog needs to react to
this event as well.
2021-06-27 09:23:46 -05:00
Dustin 6b9b87a406 roles/nextcloud: Configure outbound email
Since the Nextcloud configuration file is managed by the configuration
policy, all of the settings configurable through the web UI need to be
templated.  One important group of settings is the outbound email
configuration.  This can now be configured using the `nextcloud_smtp`
Ansible variable.
2021-06-25 11:12:38 -05:00
Dustin c68f10d771 roles/nextcloud: Use Redis for caching
The Nextcloud community [recommends][0] using Redis as a cache provider,
to improve response times and file locking reliability.
2021-06-25 11:12:12 -05:00
Dustin 05a414100a roles/redis: Add role to deploy Redis
This simple role installs the *redis* package and starts the associated
service.  It leaves the configuration as provided by upstream, at least
for now.
2021-06-25 11:10:10 -05:00
Dustin b86e0d8f29 roles/nextcloud: Switch to Fedora package
Fedora now includes a packaged version of Nextcloud.  This will be
_much_ easier to maintain than the tarball-based distribution method.
There are some minor differences in how the Fedora package works,
compared to the upstream tarball.  Notably, it puts the configuration
file in `/etc/` and makes it read-only, and it stores persistent data
separate from the application.  These differences require modifications
to the Apache and PHP-FPM configuration, but the package also included
examples to make this easier.  Since the `config.php` is read-only now,
it has to be managed by the configuration policy; it cannot be modified
by the Administration web UI.
2021-06-24 20:21:48 -05:00
Dustin 0add34a9a3 roles/protonvpn: Add watchdog script
One major problem with the current DNS-over-VPN implementation is that
the ProtonVPN servers are prone to random outages.  When the server
we're using goes down, there is not a straightforward way to switch to
another one.  At first I tried creating a fake DNS zone with A records
for each ProtonVPN server, all for the same name.  This ultimately did
not work, but I am not sure I understand why.  strongSwan would
correctly resolve the name each time it tried to connect, and send IKE
initialization requests to a different address each time, but would
reject the responses from all except the first address it used.  The
only way to get it working again was to restart the daemon.

Since strongSwan is apparently not going to be able to handle this kind
of fallback on its own, I decided to write a script to do it externally.
Enter `protonvpn-watchdog.py`.  This script reads the syslog messages
from strongSwan (via the systemd journal, using `journalctl`'s JSON
output) and reacts when it receives the "giving up after X tries"
message.  This message indicates that strongSwan has lost connection to
the current server and has not been able to reestablish it within the
retry period.  When this happens, the script will consult the cached
list of ProtonVPN servers and find the next one available.  It keeps
track of the ones that have failed in the past, and will not connect to
them again, so as not to simply bounce back-and-forth between two
(possibly dead) servers.  Approximately every hour, it will attempt to
refresh the server list, to ensure that the most accurate server scores
and availability are known.
2021-06-21 20:48:23 -05:00
Dustin bb6186b90e roles/mosquitto: Add role to deploy MQTT server
*Mosquitto* implements an MQTT server.  It is the recommended
implementation for using MQTT with Home Assistant.

I have added this role to deploy Mosquitto on the Home Assistant server.
It will be used to send data from custom sensors, such as the
temperature/pressure/humidity sensor connected to the living room wall
display.
2021-05-02 19:10:17 -05:00
Dustin 7b04326146 roles/websites/chmod777: Remove HTTP vhost
Since there are no other plain HTTP virtual hosts, the one defined for
chmod777.sh became the "default."  Since it explicitly redirects all
requests to https://chmod777.sh, it caused all non-HTTPS requests to be
redirected there, regardless of the requested name.  This was
particularly confusing for Tabitha, as she frequently forgets to put
https://…, and would find herself at my stupid blog instead of
Nextcloud.
2021-03-11 19:57:37 -06:00
Dustin 90421e77d2 pyrocufflink-dhcp: DHCP reservations for VM hosts
When there is a network issue that prevents DNS names from being
resolved, it can be difficult to troubleshoot.  For example, last night,
the Samba domain controller crashed, so *pyrocufflink.blue* names were
unavailable.  Furthermore, the domain controller VM was apparently
locked up, so I could not SSH into it directly, and it needed to be
rebooted.  Since the VM host's name did not resolve, I could not find
its address to log into it and reboot the VM.  I resorted to scanning
the SSH keys of every IP address on the network until I found the one
that matched the cached key in ~/.ssh/known_hosts.  This was cumbersome
and annoying.

Assigning DHCP reservations to the VM hosts will ensure that when a
situation like this arises again, I can quickly connect to the correct
VM host and manage its virtual machines, as its address is recorded in
the configuration policy.
2021-02-17 20:33:41 -06:00
Dustin 6aaf1b7dbb roles/strongswan-swanctl: Load esp4 module at boot
The *esp4* kernel module does not load automatically on Fedora.  Without
this module, strongSwan can establish IKE SAs, but not ESP SAs.  Listing
the module name in a file in `/etc/modules-load.d` configures the
*systemd-modules-load* service to load it at boot.
2021-02-17 20:33:41 -06:00
Dustin ac516ce09d protonvpn: Switch to US-TX#5
US-IL#41 seems down...
2021-02-13 08:33:59 -06:00
Dustin 243510e74a roles/protonvpn: Fix swanctl syntax
I believe the reason the VPN was not auto-restarting was because I had
incorrectly specified the `keyingtries` and `dpd_delay` configuration options.
These are properties of the top-level connection, not the child. I must
have placed them in the `children` block by accident.
2021-02-09 07:17:32 -06:00
Dustin 81417068c2 roles/synapse: Add cert role dependency
The *cert* role must be defined as a role dependency now, so that the
role can define a handler to "listen" for the "certificate changed"
event.  This change happened on *master*, before the *matrix* branch was
merged.
2021-01-31 15:38:18 -06:00
Dustin 284fb569a2 ci: Add pipeline for Graylog 2021-01-31 15:34:36 -06:00
Dustin 7f8a9ce114 roles/graylog: Update Graylog repository RPM URL
Graylog 3.3 is currently installed on logs0.  Attempting to install the
*graylog-3.1-repository* package causes a transaction conflict, making
the task and playbook fail.
2021-01-31 15:33:42 -06:00
Dustin ee93586a95 roles/apache: Add previously-ignored cert symlinks
Before the advent of `ansible-vault`, and  long before `certbot`/`lego`,
I used to keep certificate files (and especially private key files) out
of the Git repository.  Now that certificates are stored in a separate
repository, and only symlinks are stored in the configuration policy,
this no longer makes any sense.  In particular, it prevents the continuous
enforcement process from installing Let's Encrypt certificates that have
been automatically renewed.
2021-01-24 17:08:00 -06:00
Dustin 7d740079c8 roles/ssh-hostkeys: Remove duplicate keys for git0 2020-12-30 22:13:47 -06:00
Dustin cd577a555e ci: Add pipeline for Synapse (Matrix) 2020-12-30 22:12:54 -06:00
Dustin 5a114eecf0 websites/proxy-matrix: Add Synapse rev proxy setup
The *websites/proxy-matrix* role configures the Internet-facing reverse
proxy to handle the *hatch.chat* domain.  Most Matrix communication
happens over the default HTTPS port, and as such will be directed
through the reverse proxy.
2020-12-30 22:05:26 -06:00
Dustin 2df1605421 hosts: Add matrix0.p.b
*matrix0.pyrocufflink.blue* hosts the Matrix homeserver for
*hatch.chat*.
2020-12-30 22:04:51 -06:00
Dustin 371305bed4 roles/synapse: Deploy the Matrix homeserver
The *synapse* role and the corresponding `synapse.yml` playbook deploy
Synapse, the reference Matrix homeserver implementation.

Deploying Synapse itself is fairly straightforward: it is packaged by
Fedora and therefore can simply be installed via `dnf` and started by
`systemd`.  Making the service available on the Internet, however, is
more involved.  The Matrix protocol mostly works over HTTPS on the
standard port (443), so a typical reverse proxy deployment is mostly
sufficient.  Some parts of the Matrix protocol, however, involve
communication over an alternate port (8448).  This could be handled by a
reverse proxy as well, but since it is a fairly unique port, it could
also be handled by NAT/port forwarding.  In order to support both
deployment scenarios (as well as the hypothetical scenario wherein the
Synapse machine is directly accessible from the Internet), the *synapse*
role supports specifying an optional `matrix_tls_cert` variable.  If
this variable is set, it should contain the path to a certificate file
on the Ansible control machine that will be used for the "direct"
connections (i.e. on port 8448).  If it is not set, the default Apache
certificate will be used for both virtual hosts.

Synapse has a pretty extensive configuration schema, but most of the
options are set to their default values by the *synapse* role.  Other
than substituting secret keys, the only exposed configuration option is
the LDAP authentication provider.
2020-12-30 21:54:02 -06:00
Dustin d0bf4f9893 ci: Add pipeline for motionEye 2020-12-30 21:06:22 -06:00
Dustin 2da26d9bf6 roles/bitwarden_rs: Ensure docker service runs
Since the *bitwarden_rs* relies on Docker for distribution and process
management (at least for now), it needs to ensure that the `docker`
service starts automatically.
2020-12-30 21:02:32 -06:00
Dustin f9e8c78e5a roles/websites: Set authorized_keys file perms
Because the various "webapp.*" users' home directories are under
`/srv/www`, the default SELinux context type is `httpd_sys_content_t`.
The SSH daemon is not allowed to read files with this label, so it
cannot load the contents of these users' `authorized_keys` files.  To
address this, we have to explicitly set the SELinux type to
`ssh_home_t`.
2020-12-30 20:59:27 -06:00
Dustin c92af29e84 roles/named: Send application logs to syslog
BIND sends its normal application logs (as opposed to query logs) to the
`default_debug` channel.  By sending these log messages to syslog, they
can be routed and rotated using the normal system policies.  Using a
separate dedicated log file just ends up consuming a lot of space, as it
is not managed by any policy.
2020-12-26 11:36:15 -06:00
Dustin 59e8244a08 roles/apache: Add tags to tasks
Adding tags to tasks makes them easier to run in isolation.
2020-12-26 11:35:45 -06:00
Dustin aa1ab75edd roles/apache: logrotate: Compress logs immediately
To conserve space on the log volume of web servers, "old" log files are
now compressed immediately, as opposed to waiting until the next
rotation.
2020-12-26 11:33:09 -06:00
Dustin b519f97f6a roles/apache: Disable ssl_request_log
I am not sure the point of having both `ssl_request_log` and
`ssl_access_log`.  The former includes the TLS ciphers used in the
connection, which is not particularly interesting information.  To save
space on the log volume of web servers using Apache, we should just stop
creating this log file.
2020-12-26 11:31:24 -06:00
Dustin d1cdc8bfc3 roles/cert: Add handler topic notification
Changing/renewing a certificate generally requires restarting or
reloading some service.  Since the *cert* role is intended to be generic
and reusable, it naturally does not know what action to take to effect
the change.  It works well for the initial deployment of a new
application, since the service is reloaded anyway in order for the new
configuration to be applied.  It fails, however, for continuous
enforcement, when a certificate is renewed automatically (i.e. by
`lego`) but no other changes are being made.  This has caused a number
of disruptions when some certificate expires and its replacement is
available but has not yet been loaded.

To address this issue, I have added a handler "topic" notification to
the *certs* role.  When either the certificate or private key file is
replaced, the relevant task will "notify" a generic handler "topic."
This allows some other role to define a specific handler, which
"listens" for these notifications, and takes the appropriate action for
its respective service.

For this mechanism to work, though, the *cert* role can only be used as
a dependency of another role.  That role must define the handler and
configure it to listen to the generic "certificate changed" topic.  As
such, each of the roles that are associated with a certificate deployed
by the *cert* role now declare it as a dependency, and the top-level
playbooks only include those roles.
2020-12-26 10:38:17 -06:00
Dustin 8a18c92730 roles/collectd-prometheus: Configure plugin
The *collectd-prometheus* role configures the *write_prometheus* plugin
for collectd.  This plugin exposes data collected or received by the
collectd process in the Prometheus Exposition Format over HTTP.  It
provides the same functionality as the "official" collectd Exporter
maintained by the Prometheus team, but integrates natively into the
collectd process, and is much more complete.

The main intent of this role is to provide a mechanism to combine the
collectd data from all Pyrocufflink hosts and insert it into Prometheus.
By configuring the collectd instance on the Prometheus server itself to
enable and use the *write_prometheus* plugin and to receive the
multicast data from other hosts, collectd itself provides the desired
functionality.
2020-12-26 09:44:04 -06:00
Dustin b6650e4067 ci: collectd: fix syntax error 2020-12-26 09:39:28 -06:00
Dustin 621035e89b Merge branch 'collectd' into master 2020-12-23 21:26:01 -06:00
Dustin 8e180d00ab collectd: Ensure service is enabled 2020-12-23 21:25:49 -06:00
Dustin debf7d8e5a hosts: Add pyrocufflink to collectd group
We want collectd deployed on all production machines.
2020-12-23 21:04:49 -06:00
Dustin 71f55ddfdf hosts: hass1: Set collectd network interface
Because *hass1.pyrocufflink.blue* has multiple interfaces, collectd does
not know which interface it should use to send multicast metrics
messages.  To force it to use the wired interface, which is connected to
the default internal ("blue") network, the `interface` property needs to
be set.
2020-12-23 20:57:01 -06:00
Dustin 8d442b2aaf roles/collectd: Support setting server interface
For hosts with multiple network interfaces, collectd may not send
multicast messages through the correct interface.  To ensure that it
does, the `Interface` configuration option can be specified with each
`Server` option.  To define this option, entries in the
`collectd_network_servers` list can now have an `interface` property.
2020-12-23 20:54:48 -06:00
Dustin 19f0d713f4 hosts: Move koji0.p.b to offline
I doubt I will be using Koji much if at all any more.  In preparation
for decommissioning it, I am moving the Koji inventory to hosts.offline,
to prevent Jenkins jobs from failing.
2020-12-13 09:54:01 -06:00
Dustin 4a4f984f1f ci: Add Jenkins pipeline for collectd 2020-12-08 21:26:43 -06:00
Dustin cbbef24bbd collectd: Install and configure collectd
The *collectd* role, with its corresponding `collectd.yml` playbook,
installs *collectd* onto the managed node and manages basic
configuration for it.  By default, it will enable several plugins,
including the `network` plugin.  The `collectd_disable_plugins` variable
can be set to a list names of plugins that should NOT be enabled.

The default configuration for the `network` plugin instructs *collectd*
to send metrics to the default IPv6 multicast group.  Any host that has
joined this group and is listening on the specified UDP port (default
25826) can receive the data.  This allows for nearly zero configuration,
as the configuration does not need to be updated if the name or IP
address of the receiver changes.

This configuration is ready to be deployed without any variable changes
to all Pyrocufflink servers.  Once *collectd* is running on the servers,
we can set up a *collectd* instance to receive the data and store them
in a time series database (i.e. Prometheus).
2020-12-08 21:11:27 -06:00
Dustin 750cc8afd4 roles/logrotate: Install and enable logrotate
Since Apache HTTPD does not have any built-in log rotation capability,
we need `logrotate`.  Somewhere along the line, the *logrotate* package
stopped being installed by default.  Additionally, with Fedora 30, it
changed from including a drop-in file for (Ana)cron to providing a
systemd timer unit.

The *logrotate* role will ensure that the *logrotate* package is
installed, and that the *logrotate.timer* service is enabled and
running.  This in turn will ensure that `logrotate` runs daily.  Of
course, since the systemd units were added in Fedora 30, machines to
which this role is applied must be running at least that version.

By listing the *logrotate* role as a dependency of the *httpd* role, we
can ensure that `logrotate` manages the Apache error and access log
files on any server that runs Apache HTTPD.
2020-12-08 20:59:40 -06:00
Dustin 1e42959863 roles/bitwarden_rs: Update to new Docker image
The bitwarden_rs team now maintains its Docker images under the
*bitwardenrs* namespace on Docker Hub.
2020-11-13 06:52:56 -06:00
Dustin 132689a3b8 roles/protonvpn: Set infinite keying retries
By default, strongSwan will only attempt key negotiation once and then
give up.  If the VPN connection is closed because of a network issue, it
is unlikely that a single attempt to reconnect will work, so let's keep
trying until it succeeds.
2020-10-10 11:10:12 -05:00
Dustin 3a36d6b7ff hosts: Add motion0.p.b
*motion0.pyrocufflink.blue* hosts motionEye
2020-10-03 11:30:38 -05:00
Dustin ef4e769ed2 motioneye: Deploy motionEye camera software
The *motioneye* role installs motionEye on a Fedora machine using `pip`.
It configures Apache to proxy for motionEye for outside (HTTPS) access.

The official installation instructions and default configuration for
motionEye assume it will be running as root.  There is, however, no
specific reason for this, as it works just fine as an unprivileged user.
The only minor surprise is that the `conf_path` configuration setting
must be writable, as this is where motionEye places generated
configuration for `motion`.  This path does not, however, have to
include the `motioneye.conf` file itself, which can still be read-only.
2020-10-03 11:29:39 -05:00
Dustin 1c575c4340 protonvpn: Connect to server by IP address
Since DNS only allowed to be sent over the VPN, it is not possible to
resolve the VPN server name unless the VPN is already connected.  This
naturally creates a chicken-and-egg scenario, which we can resolve by
manually providing the IP address of the server we want to connect to.
2020-09-23 18:50:06 -05:00