Compare commits
2 Commits
5d4206e1a2...fab714379c
| Author | SHA1 | Date |
|---|---|---|
| | fab714379c | |
| | 12c99fb5f0 | |
@@ -1,15 +0,0 @@
FROM alpine

RUN echo jenkins:*:3000018:3000017::/var/lib/jenkins:/bin/bash >> /etc/passwd

RUN apk update && \
    apk add zola --repository http://dl-cdn.alpinelinux.org/alpine/edge/community/ && \
    apk add \
        openssh-client-default \
        python3 \
        py3-ruamel.yaml \
        rsync \
        && \
    rm -rf /var/cache/apk/*

COPY ssh_known_hosts /etc/ssh/ssh_known_hosts
@@ -2,9 +2,8 @@
 
 pipeline {
     agent {
-        dockerfile {
-            dir 'ci'
-            filename 'Containerfile'
+        kubernetes {
+            yamlFile 'ci/podTemplate.yaml'
         }
     }
 
@@ -12,22 +11,29 @@ pipeline {
         disableConcurrentBuilds()
     }
 
-    triggers {
-        pollSCM ''
+    environment {
+        HOME = "${env.WORKSPACE}"
     }
 
     stages {
 
         stage('Build') {
             steps {
-                sh '. ci/build.sh'
+                container('zola') {
+                    sh 'zola build --base-url /'
+                }
+                container('python') {
+                    sh '. ci/build.sh'
+                }
             }
         }
 
         stage('Publish') {
             steps {
-                sshagent(['jenkins-web']) {
-                    sh '. ci/publish.sh'
+                container('rsync') {
+                    sshagent(['jenkins-web']) {
+                        sh '. ci/publish.sh'
+                    }
                 }
             }
         }
@@ -1,5 +1,4 @@
-zola build --base-url /
-
+python3 -m pip install --user ruamel.yaml
 python3 /dev/fd/3 < songquotes.yml > public/songquotes.json 3<<EOF
 from ruamel.yaml import safe_load as load
 from json import dump
@@ -0,0 +1,20 @@
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - name: zola
    image: git.pyrocufflink.net/containerimages/zola
  - name: python
    image: docker.io/python:3.10
    command:
    - python
    args:
    - -c
    - import signal; signal.pause()
  - name: rsync
    image: git.pyrocufflink.net/containerimages/rsync
    command:
    - python3
    args:
    - -c
    - import signal; signal.pause()
@@ -0,0 +1,96 @@
+++
title = 'Speed Up Jenkins Startup Time in Kubernetes'
date = 2022-12-01T21:40:17-06:00
+++

I recently migrated my Jenkins server at home to run inside my Kubernetes
cluster. I am very happy with it overall; upgrades are a lot simpler, and
Longhorn volume snapshots make rolling back bad plugin updates a breeze. One
issue that troubled me for a while, though, was that it took a *really* long
time for the Jenkins server container to start. Kubernetes would list the pod
in `ContainerCreating` state for several minutes, and then in
`CreateContainerError` for a while, before finally starting the process. It
turns out this was because of the huge number of files in the Jenkins home
directory. When the container starts up, the container runtime has to go
through every file in the persistent volume and fix its permissions. My
Jenkins instance has over 1.5 million files, so scanning and modifying them all
takes a very long time.

I was finally able to fix this issue today, after messing with it for a week or
so. There are two changes the container runtime has to make to every file in
the persistent volume:

1. The group ownership/GID
2. The SELinux label

Fixing the first problem is straightforward: set
`securityContext.fsGroupChangePolicy` on the pod (it is a pod-level setting,
not a container-level one) to `OnRootMismatch`. The container runtime will
check the GID of the root directory of the persistent volume, and if it is
correct, skip checking any of the rest of the files and directories.
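
As a minimal sketch of this first fix, the fragment below sets the policy in
the pod-level `securityContext`. The `fsGroup` value of 1000 is an assumption
for illustration (the post does not give one); the policy only has an effect
when `fsGroup` is set.

```yaml
spec:
  securityContext:
    # Assumed GID, for illustration only: the group the volume's files
    # should belong to
    fsGroup: 1000
    # Skip the recursive ownership change when the volume's root directory
    # already has the expected group and permissions
    fsGroupChangePolicy: OnRootMismatch
```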

The second problem was quite a bit trickier, but still fixable. It took me a
bit longer to get the solution right, but with the help of a [cri-o GitHub
issue][0], I finally managed. The key is to configure the container to have a
static SELinux context; by default, the container runtime will assign a random
category when the container starts. Naturally, this means the context labels
of all the files in the persistent volume have to be changed every time, to
match the new category. Fortunately, the
`securityContext.seLinuxOptions.level` setting is available on both the pod
and the container. I looked at the category of the current Jenkins process and
set `level` to that:

```sh
ps Z -p $(pgrep -f 'jenkins\.war')
```

```
LABEL PID TTY STAT TIME COMMAND
system_u:system_r:container_t:s0:c525,c600 196790 ? Sl 0:50 java -Duser.home=/var/jenkins_home -Djenkins.model.Jenkins.slaveAgentPort=50000 -Dhudson.lifecycle=hudson.lifecycle.ExitLifecycle -jar /usr/share/jenkins/jenkins.war
```

The *level* field is the final two parts of the process's label (the
sensitivity, `s0`, followed by the categories, `c525,c600`).

```yaml
spec:
  containers:
  - securityContext:
      seLinuxOptions:
        level: s0:c525,c600
```

With this setting in place, the container will start with the same SELinux
context every time, so if the files are already labelled correctly, they do not
have to be changed. Unfortunately, by default, CRI-O still walks the whole
directory tree to make sure. It can be configured to skip that step, though,
similar to the `fsGroupChangePolicy`. The pod needs a special annotation:

```yaml
metadata:
  annotations:
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: 'true'
```

CRI-O itself also has to be configured to respect that annotation. CRI-O's
configuration is not well documented, but I was able to determine that these
two lines need to be added to `/etc/crio/crio.conf`:

```toml
[crio.runtime.runtimes.runc]
allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]
```

In summary, there were four steps to configure the container runtime not to
scan and touch every file in the persistent volume when starting the Jenkins
container:

1. Set `securityContext.fsGroupChangePolicy` to `OnRootMismatch`
2. Set `securityContext.seLinuxOptions.level` to a static value
3. Add the `io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel` annotation
4. Configure CRI-O to respect said annotation

After completing all four steps, the Jenkins container starts up in seconds
instead of minutes.
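
For reference, the three pod-side settings might be combined roughly as
follows. This is a sketch, not my complete manifest: the container name
`jenkins` and the `fsGroup` value are assumptions for illustration, and the
`level` must be replaced with the value observed on your own system. The CRI-O
change from step 4 lives in the node's configuration, not in the pod.

```yaml
metadata:
  annotations:
    # Step 3: ask CRI-O to skip relabeling when the volume root already matches
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: 'true'
spec:
  securityContext:
    fsGroup: 1000                        # assumed GID, for illustration only
    fsGroupChangePolicy: OnRootMismatch  # step 1
  containers:
  - name: jenkins                        # hypothetical container name
    securityContext:
      seLinuxOptions:
        level: s0:c525,c600              # step 2: static level from `ps Z`
```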

[0]: https://github.com/cri-o/cri-o/issues/6185