blog: Speed Up Jenkins Startup Time in Kubernetes
+++
title = 'Speed Up Jenkins Startup Time in Kubernetes'
date = 2022-12-01T21:40:17-06:00
+++

I recently migrated my Jenkins server at home to run inside my Kubernetes
cluster. I am very happy with it overall; upgrades are a lot simpler, and
Longhorn volume snapshots make rolling back bad plugin updates a breeze. One
issue that troubled me for a while, though, was that it took a *really* long
time for the Jenkins server container to start. Kubernetes would list the pod
in `ContainerCreating` state for several minutes, and then in
`CreateContainerError` for a while, before finally starting the process. It
turns out this was because of the huge number of files in the Jenkins home
directory. When the container starts up, the container runtime has to go
through every file in the persistent volume and fix its permissions. My
Jenkins instance has over 1.5 million files, so scanning and modifying them all
takes a very long time.

I was finally able to fix this issue today, after messing with it for a week or
so. There are two things that have to be set on every file in the persistent
volume:

1. The group ownership/GID
2. The SELinux label

Fixing the first problem is straightforward: set
`securityContext.fsGroupChangePolicy` on the pod (it is a pod-level field, not
a container-level one) to `OnRootMismatch`. The kubelet will then check the
ownership and permissions of the root directory of the persistent volume, and
if they are already correct, skip checking any of the rest of the files and
directories.
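
As a rough sketch of where that setting lives, the fragment below shows a
pod-level `securityContext`. The `fsGroup` value of `1000` is an assumption for
illustration (use whatever GID your Jenkins image actually runs with);
`fsGroupChangePolicy` only has an effect when `fsGroup` is set.

```yaml
spec:
  securityContext:
    # Assumed GID for illustration; substitute the group your image uses.
    fsGroup: 1000
    # Skip the recursive ownership change when the volume's root directory
    # already has the expected owner and permissions.
    fsGroupChangePolicy: OnRootMismatch
```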

The second problem was quite a bit trickier, but still fixable. With the help
of a [cri-o GitHub issue][0], I finally got the solution right. The key is to
configure the container to have a static SELinux context; by default, the
container runtime will assign a random category when the container starts.
Naturally, this means the context labels of all the files in the persistent
volume have to be changed every time, to match the new category. Fortunately,
the `securityContext.seLinuxOptions.level` setting is available on the
pod/container. I looked at the category of the current Jenkins process and set
`level` to that:

```sh
ps Z -p $(pgrep -f 'jenkins\.war')
```

```
LABEL PID TTY STAT TIME COMMAND
system_u:system_r:container_t:s0:c525,c600 196790 ? Sl 0:50 java -Duser.home=/var/jenkins_home -Djenkins.model.Jenkins.slaveAgentPort=50000 -Dhudson.lifecycle=hudson.lifecycle.ExitLifecycle -jar /usr/share/jenkins/jenkins.war
```

The *level* is the last two colon-separated fields of the process's label: the
sensitivity (`s0`) followed by the context's categories (`c525,c600`).

```yaml
spec:
  containers:
  - securityContext:
      seLinuxOptions:
        level: s0:c525,c600
```

With this setting in place, the container will start with the same SELinux
context every time, so if the files are already labelled correctly, they do not
have to be changed. Unfortunately, by default, CRI-O still walks the whole
directory tree to make sure. It can be configured to skip that step, though,
similar to the `fsGroupChangePolicy`. The pod needs a special annotation:

```yaml
metadata:
  annotations:
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: 'true'
```

CRI-O itself also has to be configured to respect that annotation. CRI-O's
configuration is not well documented, but I was able to determine that these
two lines need to be added to `/etc/crio/crio.conf`:

```toml
[crio.runtime.runtimes.runc]
allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]
```

In summary, there were four steps to configure the container runtime not to
scan and touch every file in the persistent volume when starting the Jenkins
container:

1. Set `securityContext.fsGroupChangePolicy` to `OnRootMismatch`
2. Set `securityContext.seLinuxOptions.level` to a static value
3. Add the `io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel` annotation
4. Configure CRI-O to respect said annotation

After completing all four steps, the Jenkins container starts up in seconds
instead of minutes.
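
Putting the three pod-side steps together, a minimal sketch of a manifest might
look like the following. The pod name, image, `fsGroup` value, volume name, and
claim name are illustrative placeholders, not taken from the real deployment;
only the annotation, `fsGroupChangePolicy`, and the `level` value come from the
steps above. Step 4 is not shown because it lives in the CRI-O configuration on
the node rather than in the manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jenkins                      # hypothetical name
  annotations:
    # Step 3: ask CRI-O to skip relabeling when the volume root already matches
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: 'true'
spec:
  securityContext:
    fsGroup: 1000                    # assumed GID; adjust for your image
    # Step 1: only change ownership if the volume root does not match
    fsGroupChangePolicy: OnRootMismatch
  containers:
  - name: jenkins
    image: jenkins/jenkins:lts       # assumed image for illustration
    securityContext:
      seLinuxOptions:
        # Step 2: static level, copied from the running process's label
        level: s0:c525,c600
    volumeMounts:
    - name: jenkins-home
      mountPath: /var/jenkins_home
  volumes:
  - name: jenkins-home
    persistentVolumeClaim:
      claimName: jenkins-home        # hypothetical claim name
```

If Jenkins runs from a Deployment or StatefulSet instead of a bare pod, the
annotation and both `securityContext` blocks would go under the pod template
(`spec.template`) rather than at the top level.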

[0]: https://github.com/cri-o/cri-o/issues/6185