The target location for WAL archives and backups saved by WAL-G should
be separated based on the major version of PostgreSQL with which they
are compatible. This will make it easier to restore those backups,
since they can only be restored into a cluster of the same version.
Unfortunately, WAL-G does not natively handle this. In fact, it doesn't
really have any way of knowing the version of the PostgreSQL server it
is backing up, at least when it is uploading WAL archives. Thus, we
have to include the version number in the target path (S3 prefix)
manually. We can't rely on Ansible to do this, because there is no way
to ensure Ansible runs at the appropriate point during the upgrade
process. As such, we need to be able to modify the target location as
part of the upgrade, without causing a conflict with Ansible the next
time it runs.
To that end, I've changed how the _wal-g-pg_ role creates the
configuration file for WAL-G. Instead of rendering directly to
`wal-g.yml`, the role renders a template, `wal-g.yml.in`. This template
can include a `@PGVERSION@` specifier. The `wal-g-config` script will
then use `sed` to replace that specifier with the version of PostgreSQL
installed on the server, rendering the final `wal-g.yml`. This script
is called both by Ansible in a handler after generating the template
configuration, and also as a post-upgrade action by the
`postgresql-upgrade` script.
I originally wanted the `wal-g-config` script to use the version of
PostgreSQL specified in the `PG_VERSION` file within the data directory.
This would ensure that WAL-G always uploads/downloads files for the
matching version. Unfortunately, this introduced a dependency conflict:
the WAL-G configuration needs to be present before a backup can be
restored, but the data directory is empty until after the backup has
been restored. Thus, we have to use the installed server version,
rather than the data directory version. This leaves a small window
where WAL-G may be configured to point to the wrong target if the
`postgresql-upgrade` script fails and thus does not trigger regenerating
the configuration file. This could result in new WAL archives/backups
being uploaded to the old target location. These files would be
incompatible with the other files in that location, and could
potentially overwrite existing files. This is rather unlikely, since
the PostgreSQL server will not start if the _postgresql-upgrade.service_
failed. The only time it should be possible is if the upgrade fails in
such a way that it leaves an empty but valid data directory, and then
the machine is rebooted.