projects: Add dynk8s page
parent
97a5cf4ac3
commit
62c4477478
@@ -0,0 +1,164 @@
+++
title = "Dynamic Cloud Worker Nodes for On-Premises Kubernetes"
description = """\
Automatically launch EC2 instances as worker nodes in an on-premises Kubernetes
cluster when they are needed, and remove them when they are not
"""

[extra]
image = "projects/dynk8s/cloudcontainer.jpg"
+++

One of the first things I wanted to do with my Kubernetes cluster at home was
start using it for Jenkins jobs. With the [Kubernetes][0] plugin, Jenkins can
create ephemeral Kubernetes pods to use as worker nodes to execute builds.
Migrating all of my jobs to use this mechanism would allow me to get rid of the
static agents running on VMs and Raspberry Pis.

Getting the plugin installed and configured was relatively straightforward, and
defining pod templates for CI pipelines was simple enough. It did not take
long to migrate the majority of the jobs that can run on x86_64 machines. The
aarch64 jobs, though, needed some more attention.

It's no secret that Raspberry Pis are *slow*. They are fine for very light
use, or for dedicated single-application purposes, but trying to compile code,
especially Rust, on one is a nightmare. So, while I was redoing my Jenkins
jobs, I took the opportunity to try to find a better, faster solution.

Jenkins has an [Amazon EC2][1] plugin, which dynamically launches EC2 instances
to execute builds and terminates them when they are no longer needed. We use
this plugin at work, and it is a decent solution. I could configure Jenkins to
launch Graviton instances to build aarch64 code. Unfortunately, I would either
need to pre-create AMIs with all of the necessary build dependencies and run
the jobs directly on the worker nodes, or use the [Docker Pipeline][2] plugin
to run them in Docker containers. What I really wanted, though, was to be able
to use Kubernetes for all of the jobs, so I set out to find a way to
dynamically add cloud machines to my local Kubernetes cluster.

The [Cluster Autoscaler][3] is a component for Kubernetes that integrates with
cloud providers to automatically launch and terminate instances in response to
demand in the Kubernetes cluster. That is all it does, though; it does not
perform TLS bootstrapping for the new machine or register it as a node in the
cluster. The [Autoscaler FAQ][4] hints at how to handle this limitation:

> Example: If you use `kubeadm` to provision your cluster, it is up to you to
> automatically execute `kubeadm join` at boot time via some script.

With that in mind, I set out to build a solution that uses the Cluster
Autoscaler, WireGuard, and `kubeadm` to automatically provision nodes in the
cloud to run Jenkins jobs on pods created by the Jenkins Kubernetes plugin.

[0]: https://plugins.jenkins.io/kubernetes
[1]: https://plugins.jenkins.io/ec2
[2]: https://plugins.jenkins.io/docker-workflow
[3]: https://github.com/kubernetes/autoscaler
[4]: https://github.com/kubernetes/autoscaler/blob/de560600991a5039fd9157b0eeeb39ec59247779/cluster-autoscaler/FAQ.md#how-does-scale-up-work


## Process

<div style="text-align: center;">

[](sequence.svg)

</div>

1. When Jenkins starts running a job that is configured to run in a Kubernetes
   Pod, it uses the job's pod template to create the Pod resource. It also
   creates an agent node in Jenkins and waits for the JNLP agent in the pod to
   attach itself to that node.
2. Kubernetes attempts to schedule the pod Jenkins created. If there is not a
   node available, the scheduling fails.
3. The Cluster Autoscaler detects that scheduling the pod failed. It checks
   the requirements for the pod, matches them to an EC2 Autoscaling Group, and
   determines that scheduling would succeed if it increased the capacity of
   the group.
4. The Cluster Autoscaler increases the desired capacity of the EC2 Autoscaling
   Group, launching a new EC2 instance.
5. Amazon EventBridge sends a notification, via Amazon Simple Notification
   Service, to the provisioning service, indicating that a new EC2 instance
   has started.
6. The provisioning service generates a `kubeadm` bootstrap token for the new
   instance and stores it as a Secret resource in Kubernetes (see the sketch
   after this list).
7. The provisioning service looks for an available Secret resource in
   Kubernetes containing WireGuard configuration and marks it as assigned to
   the new EC2 instance.
8. The EC2 instance, via a script executed by *cloud-init*, fetches the
   WireGuard configuration assigned to it from the provisioning service.
9. The provisioning service searches for the Secret resource in Kubernetes
   containing the WireGuard configuration assigned to the EC2 instance and
   returns it in the HTTP response.
10. The *cloud-init* script on the EC2 instance uses the returned WireGuard
    configuration to configure a WireGuard interface and connect to the VPN.
11. The *cloud-init* script on the EC2 instance generates a
    [`JoinConfiguration`][7] document with cluster discovery configuration
    pointing to the provisioning service and passes it to `kubeadm join`.
12. The provisioning service looks up the Secret resource in Kubernetes
    containing the bootstrap token assigned to the EC2 instance and generates
    a *kubeconfig* file containing the cluster configuration information and
    that token. The *kubeconfig* file is returned in the HTTP response.
13. `kubeadm join`, running on the EC2 instance, communicates with the
    Kubernetes API server, over the WireGuard tunnel, to perform TLS
    bootstrapping and configure the Kubelet as a worker node in the cluster.
14. When the Kubelet on the new EC2 instance is ready, Kubernetes detects that
    the pod created by Jenkins can now be scheduled to run on it and instructs
    the Kubelet to start the containers in the pod.
15. The Kubelet on the new EC2 instance starts the pod's containers. The JNLP
    agent, running as one of the containers in the pod, connects to the
    Jenkins controller.
16. Jenkins assigns the job run to the new agent, which executes the job.

[7]: https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-JoinConfiguration

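Steps 5 through 7 are where the provisioning service does most of its work. As
a rough illustration only (not the actual implementation), here is a minimal
sketch of that handler using the Python Kubernetes client; the Secret names,
labels, and namespaces are placeholders:

```python
"""Rough sketch of the provisioning service's "instance started" handler.

The Secret names, labels, and namespaces are hypothetical, and error
handling is omitted.
"""
import datetime
import secrets
import string

from kubernetes import client, config

ALPHABET = string.ascii_lowercase + string.digits


def new_bootstrap_token() -> str:
    """Generate a kubeadm bootstrap token ([a-z0-9]{6}.[a-z0-9]{16})."""
    token_id = "".join(secrets.choice(ALPHABET) for _ in range(6))
    token_secret = "".join(secrets.choice(ALPHABET) for _ in range(16))
    return f"{token_id}.{token_secret}"


def handle_instance_started(instance_id: str) -> None:
    config.load_incluster_config()
    core = client.CoreV1Api()

    # Step 6: create a bootstrap token Secret in the standard
    # bootstrap.kubernetes.io/token format, tagged with the instance it
    # belongs to.
    token_id, token_secret = new_bootstrap_token().split(".")
    expiration = (datetime.datetime.now(datetime.timezone.utc)
                  + datetime.timedelta(hours=1))
    core.create_namespaced_secret(
        "kube-system",
        client.V1Secret(
            metadata=client.V1ObjectMeta(
                name=f"bootstrap-token-{token_id}",
                labels={"dynk8s.example.com/instance-id": instance_id},
            ),
            type="bootstrap.kubernetes.io/token",
            string_data={
                "token-id": token_id,
                "token-secret": token_secret,
                "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "usage-bootstrap-authentication": "true",
                "usage-bootstrap-signing": "true",
            },
        ),
    )

    # Step 7: find a pre-generated WireGuard configuration that has not been
    # handed out yet and mark it as assigned to this instance.
    available = core.list_namespaced_secret(
        "dynk8s", label_selector="dynk8s.example.com/wireguard-config=true"
    )
    for secret in available.items:
        labels = secret.metadata.labels or {}
        if "dynk8s.example.com/assigned-to" not in labels:
            core.patch_namespaced_secret(
                secret.metadata.name,
                "dynk8s",
                {"metadata": {"labels": {
                    "dynk8s.example.com/assigned-to": instance_id,
                }}},
            )
            break
```

Because the bootstrap token Secret uses the standard
`bootstrap.kubernetes.io/token` format, the API server and `kubeadm` can
consume it without any additional machinery.
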
## Components

### Jenkins Kubernetes Plugin

The [Kubernetes plugin][0] for Jenkins is responsible for dynamically creating
Kubernetes pods from templates associated with pipeline jobs. Jobs provide a
pod template that describes the containers and configuration they require in
order to run. Jenkins creates the corresponding resources using the Kubernetes
API.

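To make that concrete, the snippet below shows roughly the kind of Pod a job's
template turns into, expressed as a direct call with the Python Kubernetes
client rather than anything the plugin itself emits; the names, namespace,
images, and labels are placeholders:

```python
"""Illustration of the kind of Pod the Kubernetes plugin creates for a job.

This is a hand-written sketch of the equivalent API call, not output from
the plugin; the names, namespace, images, and labels are assumptions.
"""
from kubernetes import client, config

config.load_kube_config()
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        generate_name="rust-build-",
        namespace="jenkins",
        labels={"jenkins/label": "rust-aarch64"},
    ),
    spec=client.V1PodSpec(
        containers=[
            # The plugin injects a "jnlp" container that connects back to
            # the Jenkins controller; the other containers come from the
            # job's pod template.
            client.V1Container(name="jnlp", image="jenkins/inbound-agent"),
            client.V1Container(
                name="rust",
                image="rust:latest",
                command=["sleep"],
                args=["infinity"],
            ),
        ],
        node_selector={"kubernetes.io/arch": "arm64"},
        restart_policy="Never",
    ),
)
client.CoreV1Api().create_namespaced_pod("jenkins", pod)
```

The notable detail is the `jnlp` container, which the plugin injects so that
the pod can connect back to the Jenkins controller as an agent.
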
### Autoscaler
The [Cluster Autoscaler][3] is an optional Kubernetes component that integrates
with cloud provider APIs to create or destroy worker nodes. It does not handle
any configuration on the machines themselves (i.e. running `kubeadm join`), but
it does watch the cluster state and determine when to create or destroy nodes
based on pod requests.

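Conceptually, the scale-up side of that behaviour looks something like the
sketch below. It is a drastic simplification (the real autoscaler simulates
the scheduler against a template node for each node group instead of
hard-coding one), and the Auto Scaling Group name is a placeholder:

```python
"""Greatly simplified illustration of the Cluster Autoscaler's scale-up loop.

Not the real implementation: the ASG name is hypothetical and the node-group
matching is reduced to a single hard-coded group.
"""
import boto3
from kubernetes import client, config

config.load_incluster_config()
core = client.CoreV1Api()
autoscaling = boto3.client("autoscaling")

# Look for pods the scheduler could not place anywhere.
pending = core.list_pod_for_all_namespaces(
    field_selector="status.phase=Pending"
)
unschedulable = [
    pod for pod in pending.items
    if any(
        cond.reason == "Unschedulable"
        for cond in (pod.status.conditions or [])
        if cond.type == "PodScheduled"
    )
]

if unschedulable:
    # Increase the desired capacity of the matching Auto Scaling Group,
    # which causes AWS to launch a new EC2 instance.
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=["dynk8s-workers"]
    )["AutoScalingGroups"][0]
    desired = group["DesiredCapacity"] + 1
    if desired <= group["MaxSize"]:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName="dynk8s-workers",
            DesiredCapacity=desired,
        )
```
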
### cloud-init
[cloud-init][5] is a tool that comes pre-installed on most cloud machine images
(including the official Fedora AMIs) and can be used to automatically provision
machines when they are first launched. It can install packages, create
configuration files, run commands, etc.

[5]: https://cloud-init.io/
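Here, *cloud-init*'s job is to run a first-boot script that performs the node
side of the process above (steps 8 through 13). The sketch below shows what
such a script could look like; the provisioning service URL, endpoint paths,
and file locations are assumptions for illustration:

```python
#!/usr/bin/env python3
"""Sketch of a first-boot script run by cloud-init on a new EC2 instance.

The provisioning service URL, endpoint paths, and file locations are
assumptions; the real script may differ.
"""
import pathlib
import subprocess
import urllib.request

PROVISIONER = "https://provisioner.example.com"

# Ask the EC2 instance metadata service who we are, so the provisioner can
# look up the resources assigned to this instance.
# (IMDSv1 for brevity; IMDSv2 would require fetching a session token first.)
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id"
).read().decode()

# Steps 8-10: fetch the assigned WireGuard configuration and bring up the
# tunnel back to the internal network.
wg_conf = urllib.request.urlopen(
    f"{PROVISIONER}/v1/wireguard/{instance_id}"
).read()
pathlib.Path("/etc/wireguard/wg0.conf").write_bytes(wg_conf)
subprocess.run(["wg-quick", "up", "wg0"], check=True)

# Steps 11-13: point kubeadm's cluster discovery at the provisioning
# service, which returns a kubeconfig containing the cluster information and
# the bootstrap token generated for this instance.
join_config = f"""\
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  file:
    kubeConfigPath: {PROVISIONER}/v1/cluster/{instance_id}
"""
pathlib.Path("/run/kubeadm-join.yaml").write_text(join_config)
subprocess.run(["kubeadm", "join", "--config", "/run/kubeadm-join.yaml"],
               check=True)
```

Pointing the `JoinConfiguration`'s file discovery at a URL lets `kubeadm`
fetch the discovery *kubeconfig*, with the bootstrap token inside it, directly
from the provisioning service.
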
### WireGuard
[WireGuard][6] is a simple and high-performance VPN protocol. It will provide
the cloud instances with connectivity back to the private network, and
therefore access to internal resources including the Kubernetes API.

Unfortunately, WireGuard is not particularly amenable to "dynamic" clients
(i.e. peers that come and go). This means either building custom tooling to
configure WireGuard peers on the fly, or pre-generating configuration for a
set number of peers and ensuring that no more than that number of instances
are ever online simultaneously.

[6]: https://www.wireguard.com/
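The process above takes the latter approach: WireGuard configurations are
pre-generated and stored as Secret resources for the provisioning service to
hand out (step 7). The sketch below shows one way such a pool could be
generated; the subnet, server endpoint and public key, Secret names, and
labels are all placeholders:

```python
"""Sketch: pre-generate a fixed pool of WireGuard peer configurations.

Each peer's config is stored as a Kubernetes Secret for the provisioning
service to hand out later.  The subnet, server endpoint/public key, Secret
names, and labels are placeholders.
"""
import subprocess

from kubernetes import client, config

SERVER_PUBLIC_KEY = "<server public key>"
SERVER_ENDPOINT = "vpn.example.com:51820"
POOL_SIZE = 4

config.load_kube_config()
core = client.CoreV1Api()

for i in range(POOL_SIZE):
    # wg(8) generates the peer's private key.
    private_key = subprocess.run(
        ["wg", "genkey"], capture_output=True, text=True, check=True
    ).stdout.strip()
    address = f"10.77.0.{10 + i}/32"

    peer_conf = f"""\
[Interface]
PrivateKey = {private_key}
Address = {address}

[Peer]
PublicKey = {SERVER_PUBLIC_KEY}
Endpoint = {SERVER_ENDPOINT}
AllowedIPs = 10.0.0.0/8
PersistentKeepalive = 25
"""
    core.create_namespaced_secret(
        "dynk8s",
        client.V1Secret(
            metadata=client.V1ObjectMeta(
                name=f"wireguard-peer-{i}",
                labels={"dynk8s.example.com/wireguard-config": "true"},
            ),
            string_data={"wg0.conf": peer_conf},
        ),
    )
```

Each peer's public key (derived with `wg pubkey`) still has to be added to the
server's peer list, which is not shown here.
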
### Provisioning Service
This is a custom piece of software that is responsible for provisioning
secrets (bootstrap tokens, WireGuard configuration, etc.) for the dynamic
nodes. Since it hands out WireGuard keys, it will have to be accessible
directly over the Internet. It will therefore have to authenticate requests
somehow to ensure that they are from authorized clients (i.e. EC2 nodes
created by the k8s Autoscaler) before generating any keys/tokens.

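One possible check, sketched below, is to verify via the EC2 API that the
claimed instance ID refers to a real, running instance that was launched by
the expected Auto Scaling Group; the group name is a placeholder, and a
stronger scheme could validate the signed EC2 instance identity document
instead:

```python
"""One possible authentication check for the provisioning service.

Not necessarily how the real service does it: the Auto Scaling Group name
is a placeholder.
"""
import boto3
from botocore.exceptions import ClientError

EXPECTED_ASG = "dynk8s-workers"


def is_authorized(instance_id: str) -> bool:
    ec2 = boto3.client("ec2")
    try:
        reservations = ec2.describe_instances(InstanceIds=[instance_id])
    except ClientError:
        # Unknown or malformed instance ID.
        return False

    for reservation in reservations["Reservations"]:
        for instance in reservation["Instances"]:
            if instance["State"]["Name"] != "running":
                continue
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            # Instances launched by an Auto Scaling Group are tagged with
            # the group's name.
            if tags.get("aws:autoscaling:groupName") == EXPECTED_ASG:
                return True
    return False
```
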
@@ -0,0 +1,36 @@
@startuml
box Internal Network
participant Jenkins
participant Pod
participant Kubernetes
participant Autoscaler
participant Provisioner
Jenkins -> Kubernetes : Create Pod
Kubernetes -> Autoscaler : Scale Up
end box
Autoscaler -> AWS : Launch Instance
create "EC2 Instance"
AWS -> "EC2 Instance" : Start
AWS --> Provisioner : Instance Started
Provisioner -> Provisioner : Generate Bootstrap Token
Provisioner -> Kubernetes : Store Bootstrap Token
Provisioner -> Kubernetes : Allocate WireGuard Config
"EC2 Instance" -> Provisioner : Request WireGuard Config
Provisioner -> Kubernetes : Request WireGuard Config
Kubernetes -> Provisioner : Return WireGuard Config
Provisioner -> "EC2 Instance" : Return WireGuard Config
"EC2 Instance" -> "EC2 Instance" : Configure WireGuard
"EC2 Instance" -> Provisioner : Request Cluster Config
Provisioner -> "EC2 Instance" : Return Cluster Config
group WireGuard Tunnel
"EC2 Instance" -> Kubernetes : Request Certificate
Kubernetes -> "EC2 Instance" : Return Certificate
"EC2 Instance" -> Kubernetes : Join Cluster
Kubernetes -> "EC2 Instance" : Acknowledge Join
Kubernetes -> "EC2 Instance" : Schedule Pod
"EC2 Instance" -> Kubernetes : Pod Started
end
Kubernetes -> Jenkins : Pod Started
create Pod
Jenkins -> Pod : Execute job
@enduml