The iSCSI initiator needs a unique name. It generates one the first
time it starts if none exists, but it tries to write the name to a file
under `/etc`, which fails because the root filesystem is read-only. As
such, we need to generate the name during installation, when the
filesystem is still writable.
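The "generate once, at install time" step could look roughly like the sketch below. It is only an illustration: the `InitiatorName=<name>` file format and the usual `/etc/iscsi/initiatorname.iscsi` location are assumptions based on open-iscsi defaults, the `iqn.2001-04.com.example` prefix is a made-up placeholder, and on a real system the `iscsi-iname` utility would normally produce the name instead of the random suffix used here.

```python
import os
import secrets


def ensure_initiator_name(path, prefix="iqn.2001-04.com.example"):
    """Write an InitiatorName line to `path` unless the file already exists.

    Returns the name stored in the file afterwards.
    """
    if os.path.exists(path):
        # Keep an existing name so the node's identity stays stable.
        with open(path) as f:
            for line in f:
                if line.startswith("InitiatorName="):
                    return line.strip().split("=", 1)[1]
        raise ValueError(f"{path} exists but has no InitiatorName line")

    # Hypothetical naming scheme: fixed prefix plus a random hex suffix.
    name = f"{prefix}:{secrets.token_hex(6)}"
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        f.write(f"InitiatorName={name}\n")
    return name
```

Running this once while the target filesystem is still writable means the initiator finds an existing name at first boot and never needs to write under `/etc`.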

Originally, I decided to use *btrfs* subvolumes to create writable
directories inside otherwise immutable locations, such as
`/etc/cni/net.d`. I figured this would be cleaner than bind-mounting
directories from `/var`, and would avoid the trouble of determining
appropriate volume sizes to make each of them its own filesystem.

Unfortunately, it turns out that *cri-o* may still have some issues with
its *btrfs* storage driver. One [blog post][0] hints at performance
issues in *containerd*, and it seems they may apply to *cri-o* as well.
I certainly encountered performance issues when attempting to run `npm`
in a Jenkins job running in a Kubernetes pod. There is definitely a
[performance issue with `npm`][1] when running in a container, which may
or may not have been exacerbated by the *btrfs* storage driver.

In any case, upstream [does not recommend][2] using the *btrfs* driver,
performance notwithstanding. The *overlay* driver is much more widely
used and tested. Plus, it's easier to filter out container layers from
filesystem usage statistics simply by ignoring *overlay* filesystems.

[0]: https://blog.cubieserver.de/2022/dont-use-containerd-with-the-btrfs-snapshotter/
[1]: https://github.com/npm/cli/issues/3208#issuecomment-1002990902
[2]: https://github.com/containers/storage/issues/929
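To illustrate the point about filtering container layers out of usage statistics, here is a minimal sketch that drops *overlay* entries from mount-table text in the `/proc/mounts` format (device, mount point, filesystem type, and so on). The function name is made up for this example, and a real monitoring agent would more likely have its own exclusion setting for filesystem types.

```python
def non_overlay_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs, skipping overlay filesystems."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # Skip blank or malformed lines.
        _device, mount_point, fs_type = fields[:3]
        if fs_type == "overlay":
            continue  # Container layers; not interesting for disk usage.
        result.append((mount_point, fs_type))
    return result
```

Feeding it the contents of `/proc/mounts` on a node leaves only the real filesystems, which is a one-line check on the type field rather than a match on subvolume paths.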

We're going to be using Longhorn for persistent storage. Longhorn
allocates space on worker nodes and exposes iSCSI LUNs to other worker
nodes. It creates sparse filesystem images under `/var/lib/longhorn`
for each volume. Thus, we need to mount a large filesystem at that
path on each worker node for Longhorn to use.

Using two different kickstart scripts, one for the control plane nodes,
and one for the worker nodes, we can properly mount the Longhorn data
directory only on machines that will be running the Longhorn manager.

Longhorn only supports *ext4* and *XFS* filesystem types.
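As a sanity check on a freshly provisioned node, something like the following sketch could confirm that `/var/lib/longhorn` is its own mount point with one of the supported filesystem types. It parses text in the `/proc/mounts` format; the function names are hypothetical and not part of Longhorn or the kickstart scripts.

```python
import os

SUPPORTED_FS_TYPES = {"ext4", "xfs"}


def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) of the deepest mount containing `path`."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # Skip blank or malformed lines.
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent of it.
        if os.path.commonpath([path, mount_point]) != mount_point:
            continue
        # Prefer the longest mount point; on ties, the later entry wins.
        if best is None or len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


def longhorn_data_dir_ok(mounts_text, data_dir="/var/lib/longhorn"):
    """True if `data_dir` is a dedicated mount with a supported filesystem."""
    found = filesystem_for(data_dir, mounts_text)
    return (
        found is not None
        and found[0] == data_dir
        and found[1] in SUPPORTED_FS_TYPES
    )
```

On a worker node the check should pass; on a control plane node, where the directory is not mounted separately, it returns `False`.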

* Correct example hostname
* Apply `base.yml` and `hostname.yml` separately, without
  `bootstrap.yml`, to avoid deploying *firewalld*
* Correct host IP address

Kubernetes, or rather mostly Calico, does not play well on a machine
with an immutable root filesystem. Specifically, Calico needs write
access to a few paths on the root filesystem, such as
`/etc/cni/net.d`, `/opt/cni/bin`, and
`/usr/libexec/kubernetes/kubelet-plugins/volume`. Some of those paths
can be configured, but doing so is quite cumbersome. While these paths
could be made writable, e.g. using symlinks or bind mounts, it would add
a lot of complexity to the *kubelet* Ansible role. After considering
the options for a while, I decided that the best approach was probably
to mount specific filesystems at these paths. Instead of using small
LVM logical volumes for each one, I thought it would be better to use a
single *btrfs* filesystem for all the mutable storage locations. This
way, if I discover more paths that need to be writable, I can create
subvolumes for them, without having to try to move or resize the
existing volumes.

Now that the Kubernetes nodes need their own special kickstart file for
the disk layout, it also makes sense to handle the rest of the machine
setup there, too. This eliminates the need for the *kubelet* Ansible
role altogether. Any machine provisioned with this kickstart
configuration is immediately ready to become a Kubernetes control plane
or worker node.