Version 0.2.0 of the HUD Controller is stateful. It requires writable
storage for its configuration file, as it updates the file when display
settings and screen URLs are changed.
While we're making changes, let's move it to its own namespace.
Kubernetes 1.24 introduced a new taint for Control Plane nodes that must
be tolerated in addition to the original taint in order for pods to be
scheduled to run on such nodes.
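A minimal sketch of the tolerations a pod spec would need to cover both taints. The keys are the upstream taint names: `node-role.kubernetes.io/master` is the original and `node-role.kubernetes.io/control-plane` is the newer one. The rest of the pod spec is omitted.

```yaml
# Pod spec fragment: tolerate both the original and the newer taint
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```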
When cloning/fetching a Git repository in a Jenkins pipeline, the Git
Client plugin uses the configured *Host Key Verification Strategy* to
verify the SSH host key of the remote Git server. Unfortunately, there
does not seem to be any way to use the configured strategy from the
`git` command line in a Pipeline job, so e.g. `git push` does not
respect it. This causes jobs to fail to push changes to the remote if
the container they're using does not already have the SSH host key for
the remote in its known hosts database.
This commit adds a ConfigMap to the *jenkins-jobs* namespace that can be
mounted in containers to populate the SSH host key database.
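A rough sketch of what such a ConfigMap and its mount might look like. The object name, the data key, the container name, the mount path, and the host key entry are all placeholders for illustration, not the values used in this commit.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ssh-known-hosts          # hypothetical name
  namespace: jenkins-jobs
data:
  ssh_known_hosts: |
    # Placeholder entry; replace with the real host key of the Git server
    git.example.org ssh-ed25519 AAAA...
---
# Pod spec fragment for a job container (names are assumptions)
volumes:
  - name: ssh-known-hosts
    configMap:
      name: ssh-known-hosts
containers:
  - name: build
    volumeMounts:
      - name: ssh-known-hosts
        mountPath: /etc/ssh/ssh_known_hosts
        subPath: ssh_known_hosts
        readOnly: true
```

Mounting the file at the system-wide known hosts path means a plain `git push` over SSH can find the key without any per-user setup.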
I don't want Jenkins updating itself whenever the pod restarts, so I'm
going to pin it to a specific version. This way, I can be sure to take
a snapshot of the data volume before upgrading.
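For illustration, pinning amounts to replacing a floating tag with an explicit one in the container spec. The tag below is only an example, not necessarily the version pinned here.

```yaml
# Container fragment of the Jenkins pod template
containers:
  - name: jenkins
    # Example tag only; use whichever release is actually deployed
    image: docker.io/jenkins/jenkins:2.375.1-lts
    imagePullPolicy: IfNotPresent
```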
Setting a static SELinux level for the container allows CRI-O to skip
relabeling all the files in the persistent volume each time the
container starts. For this to work, the pod needs a special annotation,
and CRI-O itself has to be configured to respect it:
```toml
[crio.runtime.runtimes.runc]
allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]
```
This *dramatically* improves the start time of the Jenkins container.
Instead of taking 5+ minutes, it now starts instantly.
https://github.com/cri-o/cri-o/issues/6185#issuecomment-1334719982
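On the pod side, the change might look roughly like the fragment below. The annotation key matches the one allowed in the CRI-O configuration above; the MCS level is an arbitrary example value, assumed here only to show the shape of the setting.

```yaml
# Pod template fragment
metadata:
  annotations:
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"
spec:
  securityContext:
    seLinuxOptions:
      # Example level only; the point is that it stays the same across restarts
      level: "s0:c525,c600"
```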
Running Jenkins in Kubernetes is relatively straightforward. The
Kubernetes plugin automatically discovers all the connection and
authentication configuration, so a `kubeconfig` file is no longer
necessary. I did set the *Jenkins tunnel* option, though, so that
agents will connect directly to the Jenkins JNLP port instead of going
through the ingress controller.
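If the cloud were defined through the Configuration as Code plugin (an assumption; the option can just as well be set in the web UI), the tunnel setting might look something like this. The service host names and ports are hypothetical.

```yaml
# JCasC fragment; service names and ports are assumptions
jenkins:
  clouds:
    - kubernetes:
        name: kubernetes
        namespace: jenkins-jobs
        jenkinsUrl: "http://jenkins.jenkins.svc.cluster.local:8080"
        # Agents connect here directly, bypassing the ingress controller
        jenkinsTunnel: "jenkins-jnlp.jenkins.svc.cluster.local:50000"
```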
Jobs now run in pods in the *jenkins-jobs* namespace instead of the
*jenkins* namespace. The latter is now where the Jenkins controller
runs, and the controller should not have permission to modify its own
resources.
I guess I thought `defaultBackend` was scoped to the TLS host, but it
appears to be global, across all Ingress resources in the cluster.
Thus, it really doesn't make any sense for any Ingress to have a default
backend, and certainly not the dynk8s provisioner.
Jenkins doesn't really need full control of all resources in its
namespace. Rather, it only needs to be able to manage Pod and
PersistentVolumeClaim resources.
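A narrowed Role along these lines could look like the sketch below. The Role name, the namespace, the exact verb list, and the `pods/exec` and `pods/log` subresources are assumptions about what the Kubernetes plugin typically needs, not a copy of the manifest in this commit.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins                # hypothetical name
  namespace: jenkins-jobs      # assumption; use the namespace where agents run
rules:
  - apiGroups: [""]
    resources:
      - pods
      - pods/exec              # assumed: needed to run steps inside agent pods
      - pods/log               # assumed: needed to stream agent logs
      - persistentvolumeclaims
    verbs: [get, list, watch, create, delete]
```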
Jenkins is now allowed to restart the Deployment named *kitchen* in the
*kitchen* namespace. It will do this after pushing a new container
image from a build of the *master* branch.
I decided to run the kitchen screen service in Kubernetes rather than on
the Raspberry Pi in the kitchen. This will hopefully make it a bit more
reliable and easier to update. It will also make it easier to rebuild
the OS on the Pi, if it ever becomes necessary, since it really only
needs Firefox (and MQTTDPMS) now.
By default, the Kubernetes metrics endpoints are restricted. I don't
think they're worth protecting with authentication, so I've added a
cluster role/binding to allow anonymous access to them.
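One way to express this, assuming the endpoints in question are the `/metrics` non-resource URLs, is a ClusterRole bound to the built-in `system:anonymous` user. The object names are made up for the example.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: anonymous-metrics          # hypothetical name
rules:
  - nonResourceURLs: ["/metrics"]
    verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: anonymous-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: anonymous-metrics
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: "system:anonymous"       # unauthenticated requests
```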
I originally added the `du5t1n.me/storage` label to the x86_64 nodes and
configured Longhorn to only run on nodes with those labels because I
thought that was the correct way to control where volume replicas are
stored. It turns out that this was incorrect, as it prevented Longhorn
from running on non-matching nodes entirely. Thus, any machine that was
not so labeled could not access any Longhorn storage volumes.
The correct way to limit where Longhorn stores volume replicas is to
enable the `create-default-disk-labeled-nodes` setting. With this
setting enabled, Longhorn will run on all nodes, but will not create
"disks" on them unless they have the
`node.longhorn.io/create-default-disk` label set to `true`. Nodes that
do not have "disks" will not store volume replicas, but will run the
other Longhorn components and can therefore access Longhorn volumes.
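Roughly, the two pieces involved are the default settings ConfigMap and the node label. The fragment below assumes the stock ConfigMap name and key from the upstream Longhorn manifest and shows only the relevant setting.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-default-setting   # assumed: name used by the upstream manifest
  namespace: longhorn-system
data:
  default-setting.yaml: |-
    create-default-disk-labeled-nodes: true
---
# Node metadata fragment for a machine that should store replicas
metadata:
  labels:
    node.longhorn.io/create-default-disk: "true"
```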
Note that changing the "default settings" ConfigMap does not change the
setting once Longhorn has been deployed. To update the setting on an
existing installation, the setting has to be changed explicitly:
```sh
kubectl get setting -n longhorn-system -o json \
create-default-disk-labeled-nodes \
| jq '.value="true"' \
| kubectl apply -f -
```
The iSCSI initiator needs a unique name. It will generate one the first
time it starts if one does not already exist. However, it tries to write
the name to a file under `/etc`, which fails because the root filesystem
is read-only. As such, we need to generate the name during installation,
when the filesystem is still writable.
Originally, I decided to use *btrfs* subvolumes to create writable
directories inside otherwise immutable locations, such as for
`/etc/cni/net.d`, etc. I figured this would be cleaner than
bind-mounting directories from `/var`, and would avoid the trouble of
determining the appropriate volume sizes needed to make each one its own
filesystem.
Unfortunately, it turns out that *cri-o* may still have some issues with
its *btrfs* storage driver. One [blog post][0] hints at performance
issues in *containerd*, and it seems they may apply to *cri-o* as well.
I certainly encountered performance issues when attempting to run `npm`
in a Jenkins job running in a Kubernetes pod. There is definitely a
[performance issue with `npm`][1] when running in a container, which may
or may not have been exacerbated by the *btrfs* storage driver.
In any case, upstream [does not recommend][2] using the *btrfs* driver,
performance notwithstanding. The *overlay* driver is much more widely
used and tested. Plus, it's easier to filter out container layers from
filesystem usage statistics simply by ignoring *overlay* filesystems.
[0]: https://blog.cubieserver.de/2022/dont-use-containerd-with-the-btrfs-snapshotter/
[1]: https://github.com/npm/cli/issues/3208#issuecomment-1002990902
[2]: https://github.com/containers/storage/issues/929
I was originally going to use GlusterFS to provide persistent storage
for pods, but [Heketi][0], the component that provides the API for
the Kubernetes StorageClass, is in "deep maintenance" status and looks
to be practically dead. I was a bit afraid to try to use it because of
that, and went looking for guidance on Reddit, which is how I discovered
Longhorn.
This manifest deploys the *ingress-nginx* controller, which is
responsible for handling traffic from clients outside the cluster and
routing it to the proper pods. I am using host network mode here to
avoid having to have another proxy in front of the ingress controller,
which would be required in NodePort mode.
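In manifest terms, host network mode comes down to two fields on the controller's pod template. This is only a fragment; the surrounding workload definition is omitted.

```yaml
# Pod template fragment of the ingress-nginx controller workload
spec:
  template:
    spec:
      # Bind ports 80/443 directly on the node's network interfaces
      hostNetwork: true
      # Keep cluster DNS resolution working while on the host network
      dnsPolicy: ClusterFirstWithHostNet
```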
I looked at MetalLB briefly, but decided to avoid it for now. As with
everything else in the Kubernetes world, it seems massively complex.
We're going to be using Longhorn for persistent storage. Longhorn
allocates space on worker nodes and exposes iSCSI LUNs to other worker
nodes. It creates sparse filesystem images under `/var/lib/longhorn`
for each volume. Thus, we need to mount a large filesystem at that
path on each worker node for Longhorn to use.
Using two different kickstart scripts, one for the control plane nodes,
and one for the worker nodes, we can properly mount the Longhorn data
directory only on machines that will be running the Longhorn manager.
Longhorn only supports *ext4* and *XFS* filesystem types.
* Correct example hostname
* Apply `base.yml` and `hostname.yml` separately, without
`bootstrap.yml`, to avoid deploying *firewalld*
* Correct host IP address
Kubernetes, or rather mostly Calico, does not play well on a machine
with an immutable root filesystem. Specifically, Calico needs write
access to a couple of paths on the root filesystem, such as
`/etc/cni/net.d`, `/opt/cni/bin`, and
`/usr/libexec/kubernetes/kubelet-plugins/volume`. Some of those paths
can be configured, but doing so is quite cumbersome. While these paths
could be made writable, e.g. using symlinks or bind mounts, it would add
a lot of complexity to the *kubelet* Ansible role. After considering
the options for a while, I decided that the best approach was probably
to mount specific filesystems at these paths. Instead of using small
LVM logical volumes for each one, I thought it would be better to use a
single *btrfs* filesystem for all the mutable storage locations. This
way, if I discover more paths that need to be writable, I can create
subvolumes for them, without having to try to move or resize the
existing volumes.
Now that the Kubernetes nodes need their own special kickstart file for
the disk layout, it also makes sense to handle the rest of the machine
setup there, too. This eliminates the need for the *kubelet* Ansible
role altogether. Any machine provisioned with this kickstart
configuration is immediately ready to become a Kubernetes control plane
or worker node.