+++
title = 'Speed Up Jenkins Startup Time in Kubernetes'
date = 2022-12-01T21:40:17-06:00
+++

I recently migrated my Jenkins server at home to run inside my Kubernetes cluster. I am very happy with it overall; upgrades are a lot simpler, and Longhorn volume snapshots make rolling back bad plugin updates a breeze. One issue that troubled me for a while, though, was that it took a *really* long time for the Jenkins server container to start. Kubernetes would list the pod in `ContainerCreating` state for several minutes, and then in `CreateContainerError` for a while, before finally starting the process.

It turns out this was because of the huge number of files in the Jenkins home directory. When the container starts up, the container runtime has to go through every file in the persistent volume and fix its permissions. My Jenkins instance has over 1.5 million files, so scanning and modifying them all takes a very long time.

I was finally able to fix this issue today, after messing with it for a week or so.

There are two changes the container runtime has to make to every file in the persistent volume:

1. The group ownership/GID
2. The SELinux label

Fixing the first problem is straightforward: set `securityContext.fsGroupChangePolicy` on the pod to `OnRootMismatch`. The container runtime will check the GID of the root directory of the persistent volume and, if it is correct, skip checking the rest of the files and directories.
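To make this concrete, here is a minimal sketch of the relevant part of the pod spec. The `fsGroup` value of 1000 is an assumption based on the GID the official Jenkins image runs as; use whatever group actually owns your volume.

```yaml
spec:
  securityContext:
    # Assumed GID for the official Jenkins image; match it to your volume's owner.
    fsGroup: 1000
    # Only chown/chmod the volume contents if the GID on its root directory is wrong.
    fsGroupChangePolicy: OnRootMismatch
```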
The second problem was quite a bit trickier, but still fixable. It took me a bit longer to get the solution right, but with the help of a [cri-o GitHub issue][0], I finally managed. The key is to configure the container to have a static SELinux context; by default, the container runtime assigns a random category each time the container starts. That means the context labels of all the files in the persistent volume have to be changed every time to match the new category. Fortunately, the `securityContext.seLinuxOptions.level` setting on the pod or container can pin the level to a fixed value. I looked at the category of the current Jenkins process and set `level` to match:

```sh
ps Z -p $(pgrep -f 'jenkins\.war')
```

```
LABEL                                         PID TTY      STAT   TIME COMMAND
system_u:system_r:container_t:s0:c525,c600 196790 ?        Sl     0:50 java -Duser.home=/var/jenkins_home -Djenkins.model.Jenkins.slaveAgentPort=50000 -Dhudson.lifecycle=hudson.lifecycle.ExitLifecycle -jar /usr/share/jenkins/jenkins.war
```

The *level* field is the final part of the process's label, after the last colon.

```yaml
spec:
  containers:
    - securityContext:
        seLinuxOptions:
          level: s0:c525,c600
```

With this setting in place, the container will start with the same SELinux context every time, so if the files are already labelled correctly, they do not have to be changed. Unfortunately, by default, CRI-O still walks the whole directory tree to make sure. It can be configured to skip that step, though, similar to the `fsGroupChangePolicy` optimization. The pod needs a special annotation:

```yaml
metadata:
  annotations:
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: 'true'
```

CRI-O itself also has to be configured to respect that annotation. CRI-O's configuration is not well documented, but I was able to determine that these two lines need to be added to `/etc/crio/crio.conf`:

```toml
[crio.runtime.runtimes.runc]
allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]
```

In summary, there were four steps to configure the container runtime not to scan and touch every file in the persistent volume when starting the Jenkins container:

1. Set `securityContext.fsGroupChangePolicy` to `OnRootMismatch`
2. Set `securityContext.seLinuxOptions.level` to a static value
3. Add the `io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel` annotation
4. Configure CRI-O to respect said annotation

After completing all four steps, the Jenkins container starts up in seconds instead of minutes.

[0]: https://github.com/cri-o/cri-o/issues/6185
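For reference, here is a minimal sketch of how the Kubernetes-side settings fit together in a single pod spec. It is an illustration rather than my actual manifest: the pod name, image, `fsGroup`, volume names, and SELinux categories are example values and will differ on your cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jenkins
  annotations:
    # Requires CRI-O with allowed_annotations configured as shown above.
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: 'true'
spec:
  securityContext:
    fsGroup: 1000                        # example GID; match your volume's owner
    fsGroupChangePolicy: OnRootMismatch  # skip the chown walk if the root GID already matches
  containers:
    - name: jenkins
      image: jenkins/jenkins:lts         # example image
      securityContext:
        seLinuxOptions:
          level: s0:c525,c600            # the static category observed with `ps Z`
      volumeMounts:
        - name: jenkins-home
          mountPath: /var/jenkins_home
  volumes:
    - name: jenkins-home
      persistentVolumeClaim:
        claimName: jenkins-home          # example claim name
```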