projects: Add dynk8s page

projects
Dustin 2024-08-18 08:59:57 -05:00
parent 97a5cf4ac3
commit 62c4477478
4 changed files with 563 additions and 0 deletions

@@ -0,0 +1,164 @@
+++
title = "Dynamic Cloud Worker Nodes for On-Premises Kubernetes"
description = """\
Automatically launch EC2 instances as worker nodes in an on-premises Kubernetes
cluster when they are needed, and remove them when they are not
"""
[extra]
image = "projects/dynk8s/cloudcontainer.jpg"
+++
One of the first things I wanted to do with my Kubernetes cluster at home was
start using it for Jenkins jobs. With the [Kubernetes][0] plugin, Jenkins can
create ephemeral Kubernetes pods to use as worker nodes to execute builds.
Migrating all of my jobs to use this mechanism would allow me to get rid of the
static agents running on VMs and Raspberry Pis.

Getting the plugin installed and configured was relatively straightforward, and
defining pod templates for CI pipelines was simple enough. It did not take
long to migrate the majority of the jobs that can run on x86_64 machines. The
aarch64 jobs, though, needed some more attention.

It's no secret that Raspberry Pis are *slow*. They are fine for very light
use, or for dedicated single-application purposes, but trying to compile code,
especially Rust, on one is a nightmare. So, while I was redoing my Jenkins
jobs, I took the opportunity to try to find a better, faster solution.

Jenkins has an [Amazon EC2][1] plugin, which dynamically launches EC2 instances
to execute builds and terminates them when they are no longer needed. We use
this plugin at work, and it is a decent solution. I could configure Jenkins to
launch Graviton instances to build aarch64 code. Unfortunately, I would either
need to pre-create AMIs with all of the necessary build dependencies and run
the jobs directly on the worker nodes, or use the [Docker Pipeline][2] plugin
to run them in Docker containers. What I really wanted, though, was to be able
to use Kubernetes for all of the jobs, so I set out to find a way to
dynamically add cloud machines to my local Kubernetes cluster.

The [Cluster Autoscaler][3] is a component for Kubernetes that integrates with
cloud providers to automatically launch and terminate instances in response to
demand in the Kubernetes cluster. That is all it does, though; it does not
integrate with the Kubernetes API to perform TLS bootstrapping or register the
node in the cluster. The [Autoscaler FAQ][4] hints at how to handle this
limitation, though:

> Example: If you use `kubeadm` to provision your cluster, it is up to you to
> automatically execute `kubeadm join` at boot time via some script.

With that in mind, I set out to build a solution that uses the Cluster
Autoscaler, WireGuard, and `kubeadm` to automatically provision nodes in the
cloud to run Jenkins jobs on pods created by the Jenkins Kubernetes plugin.

[0]: https://plugins.jenkins.io/kubernetes
[1]: https://plugins.jenkins.io/ec2
[2]: https://plugins.jenkins.io/docker-workflow
[3]: https://github.com/kubernetes/autoscaler
[4]: https://github.com/kubernetes/autoscaler/blob/de560600991a5039fd9157b0eeeb39ec59247779/cluster-autoscaler/FAQ.md#how-does-scale-up-work

## Process
<div style="text-align: center;">
[![Sequence Diagram](sequence.svg)](sequence.svg)
</div>

1. When Jenkins starts running a job that is configured to run in a Kubernetes
Pod, it uses the job's pod template to create the Pod resource. It also
creates a worker node and waits for the JNLP agent in the pod to attach
itself to that node.
2. Kubernetes attempts to schedule the pod Jenkins created. If there is not a
node available, the scheduling fails.
3. The Cluster Autoscaler detects that scheduling the pod failed. It checks
the requirements for the pod, matches them to an EC2 Autoscaling Group, and
determines that scheduling would succeed if it increased the capacity of the
group.
4. The Cluster Autoscaler increases the desired capacity of the EC2 Autoscaling
Group, launching a new EC2 instance.
5. Amazon EventBridge sends a notification, via Amazon Simple Notification
Service, to the provisioning service, indicating that a new EC2 instance has
started.
6. The provisioning service generates a `kubeadm` bootstrap token for the new
instance and stores it as a Secret resource in Kubernetes.
7. The provisioning service looks for an available Secret resource in
Kubernetes containing WireGuard configuration and marks it as assigned to
the new EC2 instance.
8. The EC2 instance, via a script executed by *cloud-init*, fetches the
WireGuard configuration assigned to it from the provisioning service.
9. The provisioning service searches for the Secret resource in Kubernetes
containing the WireGuard configuration assigned to the EC2 instance and
returns it in the HTTP response.
10. The *cloud-init* script on the EC2 instance uses the returned WireGuard
configuration to configure a WireGuard interface and connect to the VPN.
11. The *cloud-init* script on the EC2 instance generates a
[`JoinConfiguration`][7] document (sketched after this list) with cluster
discovery configuration pointing to the provisioning service and passes it to
`kubeadm join`.
12. The provisioning service looks up the Secret resource in Kubernetes
containing the bootstrap token assigned to the EC2 instance and generates a
*kubeconfig* file containing the cluster configuration information and that
token. The *kubeconfig* file is returned in the HTTP response.
13. `kubeadm join`, running on the EC2 instance, communicates with the
Kubernetes API server, over the WireGuard tunnel, to perform TLS
bootstrapping and configure the Kubelet as a worker node in the cluster.
14. When the Kubelet on the new EC2 instance is ready, Kubernetes detects that
the pod created by Jenkins can now be scheduled to run on it and instructs
the Kubelet to start the containers in the pod.
15. The Kubelet on the new EC2 instance starts the pod's containers. The JNLP
agent, running as one of the containers in the pod, connects to the Jenkins
controller.
16. Jenkins assigns the job run to the new agent, which executes the job.

[7]: https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-JoinConfiguration
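
As referenced in step 11, here is a minimal sketch of the kind of
`JoinConfiguration` the *cloud-init* script might generate; the provisioner
hostname and URL path are hypothetical:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  file:
    # kubeadm accepts a URL here; the provisioning service returns a
    # kubeconfig containing the cluster details and the node's bootstrap
    # token (hypothetical endpoint)
    kubeConfigPath: https://provisioner.example.com/v1/cluster-config
```
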
## Components
### Jenkins Kubernetes Plugin
The [Kubernetes plugin][0] for Jenkins is responsible for dynamically creating
Kubernetes pods from templates associated with pipeline jobs. Jobs provide a
pod template that describes the containers and configuration they require in
order to run. Jenkins creates the corresponding resources using the Kubernetes
API.
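
As a rough illustration, a job's pod template is essentially an ordinary Pod
spec; something like the following, where the image, selector, and resource
numbers are made-up values:

```yaml
# Hypothetical pod template for an aarch64 build job. The Kubernetes plugin
# injects and manages the jnlp agent container itself.
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: rust
      image: rust:1.80            # hypothetical build image
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: "2"
          memory: 2Gi
  nodeSelector:
    kubernetes.io/arch: arm64     # steer the pod to an aarch64 node
```

The resource requests and node selector matter here: they are what the
Cluster Autoscaler later uses to decide whether, and in which group, a new
node is needed.
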
### Autoscaler
The [Cluster Autoscaler][3] is an optional Kubernetes component that integrates
with cloud provider APIs to create or destroy worker nodes. It does not handle
any configuration on the machines themselves (e.g. running `kubeadm join`),
but it does watch the cluster state and determines when to create new nodes
or destroy existing ones based on pending pod requests.
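
For the AWS provider, the autoscaler maps node groups onto EC2 Auto Scaling
groups via its command-line flags; a sketch of the relevant excerpt, with a
hypothetical group name and bounds:

```yaml
# Excerpt from a hypothetical cluster-autoscaler Deployment spec
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # min:max:name of the EC2 Auto Scaling group (hypothetical group)
      - --nodes=0:4:jenkins-arm64-workers
      # scale back down once a node has been unneeded for this long
      - --scale-down-unneeded-time=10m
```
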
### cloud-init
[cloud-init][5] is a tool that comes pre-installed on most cloud machine images
(including the official Fedora AMIs) that can be used to automatically
provision machines when they are first launched. It can install packages,
create configuration files, run commands, etc.

[5]: https://cloud-init.io/
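
A sketch of the user-data that could drive the provisioning steps above;
the package list and provisioner URL are assumptions:

```yaml
#cloud-config
packages:
  - wireguard-tools        # package names vary by distribution
runcmd:
  # Fetch the WireGuard config assigned to this instance (hypothetical API)
  - curl -fsSL -o /etc/wireguard/wg0.conf https://provisioner.example.com/v1/wireguard-config
  - systemctl enable --now wg-quick@wg0
  # Join the cluster over the tunnel; join.yaml is the JoinConfiguration
  # sketched earlier, written out via write_files (omitted here)
  - kubeadm join --config /etc/kubeadm/join.yaml
```
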
### WireGuard
[WireGuard][6] is a simple and high-performance VPN protocol. It will provide
the cloud instances with connectivity back to the private network, and
therefore access to internal resources including the Kubernetes API.

Unfortunately, WireGuard is not particularly amenable to "dynamic" clients
(i.e. peers that come and go). This means either building custom tooling to
configure WireGuard peers on the fly, or pre-generating configuration for a
fixed number of peers and ensuring that no more than that number of instances
are ever online simultaneously.

[6]: https://www.wireguard.com/
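
With the pre-generated approach, each allocation is just a standard
`wg-quick` configuration file held in a Secret until an instance claims it;
all values below are placeholders:

```ini
# /etc/wireguard/wg0.conf on the EC2 instance (placeholder keys/addresses)
[Interface]
PrivateKey = <client-private-key>
Address = 10.90.0.11/32

[Peer]
# The on-premises WireGuard endpoint
PublicKey = <server-public-key>
Endpoint = vpn.example.com:51820
# Route the internal network (and thus the Kubernetes API) via the tunnel
AllowedIPs = 10.0.0.0/16
PersistentKeepalive = 25
```
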
### Provisioning Service
This is a custom piece of software that is responsible for provisioning
secrets, etc. for the dynamic nodes. Since it will be responsible for handing
out WireGuard keys, it will have to be accessible directly over the Internet.
It will have to authenticate requests somehow to ensure that they are from
authorized clients (i.e. EC2 nodes created by the k8s Autoscaler) before
generating any keys/tokens.
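
The bootstrap tokens it creates are ordinary kubeadm bootstrap-token Secrets;
roughly the following, with placeholder token values and a hypothetical label
recording which instance the token was issued for:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # kubeadm requires the name to be bootstrap-token-<token-id>
  name: bootstrap-token-abcdef
  namespace: kube-system
  labels:
    # hypothetical label used by the provisioning service
    dynk8s.example.com/instance-id: i-0123456789abcdef0
type: bootstrap.kubernetes.io/token
stringData:
  token-id: abcdef
  token-secret: 0123456789abcdef
  expiration: "2024-08-19T00:00:00Z"
  usage-bootstrap-authentication: "true"
  usage-bootstrap-signing: "true"
```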


@@ -0,0 +1,36 @@
@startuml
' Provisioning flow: a pending Jenkins pod triggers an autoscaler scale-up;
' the new EC2 instance joins the cluster over WireGuard and runs the pod.
box Internal Network
participant Jenkins
participant Pod
participant Kubernetes
participant Autoscaler
participant Provisioner
Jenkins -> Kubernetes : Create Pod
Kubernetes -> Autoscaler : Scale Up
end box
Autoscaler -> AWS : Launch Instance
create "EC2 Instance"
AWS -> "EC2 Instance" : Start
AWS --> Provisioner : Instance Started
Provisioner -> Provisioner : Generate Bootstrap Token
Provisioner -> Kubernetes : Store Bootstrap Token
Provisioner -> Kubernetes : Allocate WireGuard Config
"EC2 Instance" -> Provisioner : Request WireGuard Config
Provisioner -> Kubernetes : Request WireGuard Config
Kubernetes -> Provisioner : Return WireGuard Config
Provisioner -> "EC2 Instance" : Return WireGuard Config
"EC2 Instance" -> "EC2 Instance" : Configure WireGuard
"EC2 Instance" -> Provisioner : Request Cluster Config
Provisioner -> "EC2 Instance" : Return Cluster Config
group WireGuard Tunnel
"EC2 Instance" -> Kubernetes : Request Certificate
Kubernetes -> "EC2 Instance" : Return Certificate
"EC2 Instance" -> Kubernetes : Join Cluster
Kubernetes -> "EC2 Instance" : Acknowledge Join
Kubernetes -> "EC2 Instance" : Schedule Pod
"EC2 Instance" -> Kubernetes : Pod Started
end
Kubernetes -> Jenkins : Pod Started
create Pod
Jenkins -> Pod : Execute job
@enduml
