When the SSH daemon is configured to use an SSH host certificate but
the certificate file does not exist when the daemon starts, the server
will not begin using the file once it is created later. This
essentially means that the certificate obtained during first boot will
not be used until the SSH daemon is restarted.
Rather than try to set all of this up in the kickstart, it's probably
better to just let Ansible do it. Then, the SSH daemon can be restarted
as needed automatically (by the host provisioner).
To initiate the automatic host provisioning process, a new machine must
trigger the _POST /host/online_ webhook. Included in the request are
the hostname of the new machine and its SSH host public keys.
Optionally, the request can also contain the name of a branch in the
configuration policy repository. For virtual machines, this branch
name can be specified by a QEMU `fw_cfg` option. The `fw_cfg` values in
sysfs are only readable by root, so the service must run as root, but
it does not need any additional privileges, so we can use systemd
sandbox features to restrict it.
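
As a rough sketch of what such a request could look like: the JSON
body, the field names (`hostname`, `sshkeys`, `branch`), the `fw_cfg`
item name, and the base URL are all assumptions made up for
illustration, not necessarily what the service actually sends. Only
the `/sys/firmware/qemu_fw_cfg/by_name/<name>/raw` layout comes from
the kernel's `fw_cfg` sysfs interface.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical fw_cfg item name; the kernel exposes each item's contents
# at /sys/firmware/qemu_fw_cfg/by_name/<name>/raw (readable by root only).
BRANCH_ITEM = Path("/sys/firmware/qemu_fw_cfg/by_name/opt/example/branch/raw")


def read_host_keys(ssh_dir="/etc/ssh"):
    """Collect the host's SSH public keys, one string per key file."""
    key_files = sorted(Path(ssh_dir).glob("ssh_host_*_key.pub"))
    return [f.read_text().strip() for f in key_files]


def read_branch(item=BRANCH_ITEM):
    """Return the branch name from fw_cfg, or None when the item is absent."""
    try:
        return item.read_text().strip() or None
    except FileNotFoundError:
        return None


def build_request(base_url, hostname, host_keys, branch=None):
    """Build (but do not send) the POST /host/online request."""
    # Field names are assumptions; the real webhook may expect others.
    body = {"hostname": hostname, "sshkeys": host_keys}
    if branch:
        body["branch"] = branch  # optional, so omitted when unset
    return urllib.request.Request(
        base_url.rstrip("/") + "/host/online",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request returned by `build_request()` could then be sent with
`urllib.request.urlopen()`. Building it separately from sending it
keeps the privileged part (reading sysfs) small and easy to sandbox.
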
This feature is enabled by default for virtual machines. I haven't
quite figured out how to do the branch selection for physical machines
yet, but I will enable it for them once I do.
Delaying the start of the _ssh-host-cert-sign@.service_ units until
after the clock is synchronized ends up causing _sshd.service_ to start
well before the host certificates are available. This prevents the SSH
daemon from using the host certificates until it is explicitly reloaded,
so clients will not be able to verify the server's authenticity
automatically on first boot. To ensure that clients (read: Ansible)
will be able to connect to the server when it first boots without any
manual interaction, we need to delay the _sshd.service_ unit starting
until the certificate files are present.
I think this can actually happen to any server, not just a Raspberry Pi,
but it definitely always happens on Pis. I may eventually apply this
change to the `ssh-host-cert-sign@.service` template unit file in the
_sshca-cli-systemd_ package, if it turns out to be a more common
problem.
This will allow the `fedora-rpi-common.ks` kickstart fragment to be more
composable, making it usable for systems other than "servers" that may
need a different disk layout.
Machines that use eMMC/SD cards for OS storage need a slightly different
disk layout than those with NVMe drives. Notably, we do not want swap
or `/tmp` on the eMMC, as that will not really improve performance at
all and will be hard on the flash memory.
For NVMe, there are two options available, with and without a swap
volume.
On machines without an RTC, the clock will likely be very wrong on first
boot when the system tries to obtain the initial SSH host certificates.
This results in the SSHCA server rejecting the request because the
authorization token has expired. To avoid this, we need to ensure the
clock is set before attempting to have the certificates signed.
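
To make the failure mode concrete, here is a small sketch. It assumes
the authorization token carries an expiry time computed from the
client's own clock, which is a guess at the mechanism rather than
something confirmed here. The function names and the 300-second
lifetime are hypothetical.

```python
def token_expiry(issued_at, lifetime=300):
    """Expiry timestamp of a token minted at issued_at (seconds since epoch)."""
    return issued_at + lifetime


def is_expired(expiry, server_now):
    """The server compares the expiry against its own (correct) clock."""
    return server_now >= expiry


server_now = 1_700_000_000            # the server's idea of "now"
stale_clock = server_now - 86_400     # client clock one day behind

# A token minted with the stale clock is already expired on arrival.
stale_rejected = is_expired(token_expiry(stale_clock), server_now)

# Once the clock is synchronized, the same request is accepted.
synced_rejected = is_expired(token_expiry(server_now), server_now)
```

With these numbers `stale_rejected` is true and `synced_rejected` is
false, which is why the signing request has to wait for the clock.
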
Apparently something is populating `/etc/machine-id` at install time
now, which prevents units scheduled to run on first boot (with
`ConditionFirstBoot=true`) from starting.
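
For reference, a rough approximation of the check, based on the
first-boot rules as I understand them from machine-id(5): a missing
file or the placeholder string `uninitialized` counts as a first boot,
while an empty file or a real ID does not. This sketch ignores any
kernel command line override, and the function names are made up for
illustration.

```python
from pathlib import Path


def read_machine_id(path="/etc/machine-id"):
    """Return the file's text, or None if the file does not exist."""
    try:
        return Path(path).read_text()
    except FileNotFoundError:
        return None


def is_first_boot(content):
    """Approximate systemd's machine-id based first-boot decision.

    content is the text of /etc/machine-id, or None if it is missing.
    """
    if content is None:
        return True  # a missing file marks a first boot
    # "uninitialized" is the placeholder that also marks a first boot;
    # an empty file or an already populated ID does not.
    return content.strip() == "uninitialized"
```

Running `is_first_boot(read_machine_id())` on a freshly installed
image is a quick way to see whether the installer left a real ID
behind.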