Commit Graph

38 Commits (master)

Dustin ff67ddf8bf tf/asg: Update node template to Fedora 41
2025-07-05 11:06:32 -05:00
Dustin e30b03dad5 tf/userdata: Remove all CNI conflist files
CRI-O now installs more `.conflist` files in `/etc/cni/net.d`.  Their
presence interferes with Calico, so they need to be deleted in order to
have fully working Pod networking, especially for pods that start very
early (before Calico is completely ready).
2024-11-19 11:55:29 -06:00
Dustin dbcda4a8ca tf/userdata: Configure CRI-O to use crun
By default, CRI-O uses `runc` as the container runtime.  `runc` does not
support user namespaces, though, so we have to use `crun`, which does.
2024-11-03 12:34:40 -06:00
Dustin f531b03e7c tf/userdata: Use IMDSv2 tokens
The Fedora 40 AMIs require IMDSv2.  Our `kubeadm-join` script therefore
needs to fetch the auth token and include it with metadata requests.
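The change itself lives in the shell userdata script; purely as an illustration of the IMDSv2 token flow it describes, here is a minimal Rust sketch using the `reqwest` blocking client (crate choice and function name are assumptions, not code from this repository):

```rust
use std::error::Error;

/// Fetch an IMDSv2 session token, then use it to read instance metadata.
fn fetch_instance_id() -> Result<String, Box<dyn Error>> {
    let http = reqwest::blocking::Client::new();

    // Step 1: request a short-lived session token.
    let token = http
        .put("http://169.254.169.254/latest/api/token")
        .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
        .send()?
        .text()?;

    // Step 2: include the token with every metadata request.
    let instance_id = http
        .get("http://169.254.169.254/latest/meta-data/instance-id")
        .header("X-aws-ec2-metadata-token", &token)
        .send()?
        .text()?;

    Ok(instance_id)
}
```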
2024-11-03 12:31:27 -06:00
Dustin 0ec109b088 tf/asg: Update to Fedora 40
Upstream changed the naming convention for Fedora AMIs.  It also seems
they've stopped publishing "release" artifacts; all the AMIs are now
date-stamped.  We should probably consider running `terraform apply`
periodically to keep up-to-date.
2024-11-03 12:31:11 -06:00
Dustin c63c4d9e8c tf/userdata: Taint node for Jenkins only
If a Jenkins job runs for a while, Kubernetes may eventually schedule
other Pods on the node running it.  If a long-running Pod gets assigned to the ephemeral
node, the Cluster Autoscaler won't be able to scale down the ASG.  To
prevent this, we apply a taint to the node so normal Pods will not get
assigned to it.  We have to apply the corresponding toleration to Pods
for Jenkins jobs.
2024-02-13 07:52:54 -06:00
Dustin 925d22b9d2 tf/userdata: Provision instance storage
The *c7gd.xlarge* instance type has a directly-attached NVMe disk.
Let's use it for Kubernetes Pod storage to increase performance a bit.
2024-02-13 07:50:43 -06:00
Dustin 6f279430c2 tf/asg: Use larger instance type
I'd rather spend a few extra pennies on beefier ephemeral worker nodes
to speed up builds.
2024-02-13 07:41:05 -06:00
Dustin 3c4f84e039 tf/userdata: Remove default CRI-O CNI config
Fedora AMIs have the default locale set to en_US.UTF-8, which sorts
`100-crio-bridge.conflist` before `10-calico.conflist`.  As a result,
Pods end up with incorrect network configuration, and cannot be reached
from other Pods on the container network.  Since we do not need the
default configuration, the easiest way to resolve this is to just delete
it.
2024-02-05 20:58:31 -06:00
Dustin c4f73073dc tf/asg: Increase root block device size
The default root block device for Fedora EC2 instances is only 10 GiB.
This is insufficient for many jobs, especially those that build large
container images.
2024-02-05 20:53:38 -06:00
Dustin f6910f04df tf/asg: Add CA resource tag for FUSE device plugin
Jenkins jobs that build container images in user namespaces need access
to `/dev/fuse`, which is provided by the [fuse-device-plugin][0].  This
plugin runs as a DaemonSet; when it starts, it updates the status of the
node it is running on to indicate that the FUSE device is available.
When scaling up from zero nodes, Cluster Autoscaler has no way to know
that this will occur, and therefore cannot determine that scaling up the
ASG will create a node with the required resources.  Thus, the ASG needs
a tag to inform CA that the nodes it creates will indeed have the
resources and scaling it up will allow the pod to be scheduled.

Although this feature of CA was added in 1.14, it apparently got broken
at some point and no longer works in 1.22.  It works again in 1.26,
though.

[0]: https://github.com/kuberenetes-learning-group/fuse-device-plugin/tree/master
2024-01-14 11:42:46 -06:00
Dustin 5a79680b22 tf/userdata: Install CRI-O from Fedora base
The *cri-o* package has moved from its own module into the base Fedora
repository, as Fedora is [eliminating modules][0].  The last modular
version was 1.25, which is too old to run pods with user namespaces.
Version 1.26 is available in the base repository, which does support
user namespaces.

[0]: https://fedoraproject.org/wiki/Changes/RetireModularity
2024-01-13 10:10:46 -06:00
Dustin 02772f17dd tf/asg: Look up Fedora AMI by attributes
Instead of hard-coding the AMI ID of the Fedora build we want, we can
use the `aws_ami` data source to search for it.  The Fedora release team
has a consistent naming scheme for AMIs, so finding the correct one is
straightforward.
2023-11-13 20:27:50 -06:00
Dustin 473e279a18 tf/userdata: Remove default DNS configuration
Lately, cloud nodes seem to be failing to come up more frequently.  I
traced this down to the fact that `/etc/resolv.conf` in the `kube-proxy`
container contains both the AWS-provided DNS server and the on-premises
server set by Wireguard.  This evidently "works" correctly sometimes,
but not always.  When it doesn't, the `kube-proxy` cannot resolve the
Kubernetes API server address, and thus cannot create the necessary
netfilter rules to forward traffic correctly.  This causes pods to be
unable to communicate.

I am not entirely sure what the "correct" solution to this problem would
be, since there are various issues in play here.  Fortunately, cloud
nodes are only ever around for a short time, and never need to be
rebooted.  As such, we can use a "quick fix" and simply remove the
AWS-provided DNS configuration.
2023-11-13 19:52:57 -06:00
Dustin 4a2a376409 terraform: Update node template to Fedora 38 2023-11-13 19:52:47 -06:00
Dustin 83b8c4a7cc userdata: Set kubelet config path
The default configuration for the *kubelet.service* unit does not
specify the path to the `config.yml` generated by `kubeadm`.  Thus, any
settings defined in the `kubelet-config` ConfigMap do not take effect.
To resolve this, we have to explicitly set the path in the `config`
property of the `kubeletExtraArgs` object in the join configuration.
2023-11-13 19:49:32 -06:00
Dustin c4cabfcdbc terraform: Update node template to Fedora 37
2023-06-11 20:22:44 -05:00
Dustin 2f0f134223 terraform: userdata: Add Longhorn issue workaround
There's apparently a bug in open-iscsi (see
[issue #4988](https://github.com/longhorn/longhorn/issues/4988)) that
prevents Longhorn from working on Fedora 36+.  We need a SELinux policy
patch to work around it.
2023-01-10 21:09:46 -06:00
Dustin b01841ab72 terraform: Update node template to Fedora 36
2023-01-10 17:19:20 -06:00
Dustin 37cbcba662 examples: Add Kubernetes manifest
The `dynk8s-provisioner.yaml` file contains an example of how to deploy
the *dynk8s-provisioner* in Kubernetes using `kubectl`.
2022-10-11 21:52:05 -05:00
Dustin e11f98b430 terraform: Add config for auto-scaling group
The Cluster Autoscaler uses EC2 Auto-Scaling Groups to configure the
instances it launches when it determines additional worker nodes are
necessary.  Auto-Scaling Groups have an associated Launch Template,
which describes the properties of the instances, such as AMI ID,
instance type, security groups, etc.

When instances are first launched, they need to be configured to join
the on-premises Kubernetes cluster.  This is handled by *cloud-init*
using the configuration in the instance user data.  The configuration
supplied here specifies the Fedora packages that need to be installed on
a Kubernetes worker node, plus some additional configuration required by
`kubeadm`, `kubelet`, and/or `cri-o`.  It also includes a script that
fetches the WireGuard client configuration and connects to the VPN,
finalizes the setup process, and joins the cluster.
2022-10-11 21:40:42 -05:00
Dustin c48076b8f0 test: Adjust k8s roles for integration tests
Initially, I thought it was necessary to use a ClusterRole in order to
assign permissions in one namespace to a service account in another.  It
turns out, this is not necessary, as RoleBinding rules can refer to
subjects in any namespace.  Thus, we can limit the privileges of the
*dynk8s-provisioner* service account by only allowing it access to the
Secret and ConfigMap resources in the *kube-system* and *kube-public*
namespaces, respectively, plus the Secret resources in its own
namespace.
2022-10-11 21:08:49 -05:00
Dustin cd920418aa events: Delete Node on instance termination
The Cluster Autoscaler does not delete the Node resource in Kubernetes
after it terminates an instance:

> It does not delete the Node object from Kubernetes. Cleaning up Node
> objects corresponding to terminated instances is the responsibility of
> the cloud node controller, which can run as part of
> kube-controller-manager or cloud-controller-manager.

On-premises clusters are probably not running the Cloud Controller
Manager, so Node resources are liable to be left behind after a
scale-down event.

To keep unused Node resources from accumulating, the
*dynk8s-provisioner* will now delete the Node resource associated with
an EC2 instance when it receives a state-change event indicating the
instance has been terminated.  To identify the correct Node, it compares
the value of the `providerID` field of each existing node with the
instance ID mentioned in the event.  An exact match is not possible,
since the provider ID includes the availability zone of the instance,
which is not included in the event; however, instance IDs are unique
enough that this "should" never be an issue.
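A minimal sketch of this matching logic, assuming the `kube` and `k8s-openapi` crates with `anyhow` for errors (illustrative only; the repository's actual implementation may differ):

```rust
use k8s_openapi::api::core::v1::Node;
use kube::{api::{Api, DeleteParams, ListParams}, Client};

/// Delete the Node whose providerID refers to the terminated instance.
async fn delete_node_for_instance(client: Client, instance_id: &str) -> anyhow::Result<()> {
    let nodes: Api<Node> = Api::all(client);
    for node in nodes.list(&ListParams::default()).await? {
        let provider_id = node.spec.as_ref().and_then(|s| s.provider_id.clone());
        // A providerID looks like "aws:///us-east-2a/i-0123456789abcdef0";
        // the availability zone is not in the event, so match on the suffix.
        if provider_id.map_or(false, |id| id.ends_with(instance_id)) {
            if let Some(name) = &node.metadata.name {
                nodes.delete(name, &DeleteParams::default()).await?;
            }
        }
    }
    Ok(())
}
```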
2022-10-11 20:00:24 -05:00
Dustin d85f314a8b tests: Begin integration tests
Cargo uses the sources in the `tests` directory to build and run
integration tests.  For each `tests/foo.rs` or `tests/foo/main.rs`, it
creates an executable that runs the test functions therein.  These
executables are separate crates from the main package, and thus do not
have access to its private members.  Integration tests are expected to
test only the public functionality of the package.

Application crates do not have any public members; their public
interface is the command line.  Integration tests would typically run
the command (e.g. using `std::process::Command`) and test its output.

Since *dynk8s-provisioner* is not really a command-line tool, testing it
this way would be difficult; each test would need to start the server,
make requests to it, and then stop it.  This would be slow and
cumbersome.

In order to avoid this tedium and be able to use Rocket's built-in test
client, I have converted *dynk8s-provisioner* into a library crate that
also includes an executable.  The library makes the `rocket` function
public, which allows the integration tests to import it and pass it to
the Rocket test client.

The point of integration tests, of course, is to validate the
functionality of the application as a whole.  This necessarily requires
allowing it to communicate with the Kubernetes API.  In the Jenkins CI
environment, the application will need the appropriate credentials, and
will need to use a separate Kubernetes namespace from the production
deployment.  The `setup.yaml` manifest in the `tests` directory defines
the resources necessary to run integration tests, and the
`genkubeconfig.sh` script can be used to create the appropriate
kubeconfig file containing the credentials.  The kubeconfig is exposed
to the tests via the `KUBECONFIG` environment variable, which is
populated from a Jenkins secret file credential.

Note: The `data` directory moved from `test` to `tests` to avoid
duplication and confusing names.
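A sketch of what such a test could look like with Rocket's blocking test client, assuming the crate is importable as `dynk8s_provisioner`, that `rocket()` returns a buildable Rocket instance, and that `KUBECONFIG` points at the test namespace (the route and expected status are illustrative):

```rust
// tests/http.rs
use rocket::http::Status;
use rocket::local::blocking::Client;

#[test]
fn unknown_instance_returns_404() {
    // Build the Rocket instance from the library's public `rocket()` function.
    let client = Client::tracked(dynk8s_provisioner::rocket())
        .expect("valid rocket instance");
    let response = client
        .get("/wireguard/config/i-0123456789abcdef0")
        .dispatch();
    assert_eq!(response.status(), Status::NotFound);
}
```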
2022-10-07 07:37:20 -05:00
Dustin 3e3904cd4f events: Delete bootstrap tokens on termination
When an instance is terminated, any bootstrap tokens assigned to it are
now deleted.  Though these would expire anyway, deleting them ensures
that they cannot be used again if they happened to be leaked while the
instance was running.  Further, it ensures that attempting to fetch the
`kubeadm` configuration for the instance will return an HTTP 404 Not
Found response once the instance has terminated.
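A minimal sketch of that cleanup with the `kube` crate; the label name tying a token Secret to its instance is illustrative:

```rust
use k8s_openapi::api::core::v1::Secret;
use kube::{api::{Api, DeleteParams, ListParams}, Client};

/// Remove every bootstrap token Secret labelled with the terminated instance.
async fn delete_bootstrap_tokens(client: Client, instance_id: &str) -> anyhow::Result<()> {
    // Bootstrap tokens live in kube-system as Secrets of type
    // "bootstrap.kubernetes.io/token"; the label name here is an assumption.
    let secrets: Api<Secret> = Api::namespaced(client, "kube-system");
    let lp = ListParams::default()
        .labels(&format!("dynk8s.du5t1n.me/ec2-instance-id={instance_id}"));
    secrets.delete_collection(&DeleteParams::default(), &lp).await?;
    Ok(())
}
```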
2022-10-07 06:52:06 -05:00
Dustin df39fe46eb routes: Add kubeadm kubeconfig resource
The *GET /kubeadm/kubeconfig/<instance-id>* operation returns a
configuration document for `kubeadm` to add the node to the cluster as a
worker.  The document is derived from the kubeconfig stored in the
`cluster-info` ConfigMap, which includes the external URL of the
Kubernetes API server and the root CA certificate used in the cluster.
The bootstrap token assigned to the specified instance is added to the
document for `kubeadm` to use for authentication.  The kubeconfig is
stored in the ConfigMap as a string, so extracting data from it requires
deserializing the YAML document first.

In order to access the cluster information ConfigMap, the service
account bound to the pod running the provisioner service must have the
appropriate permissions.
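A rough sketch of the ConfigMap lookup and YAML deserialization step, assuming the `kube` crate and `serde_yaml` (token injection and error handling are elided):

```rust
use k8s_openapi::api::core::v1::ConfigMap;
use kube::{Api, Client};

/// Extract the API server URL and CA data from the cluster-info ConfigMap.
async fn cluster_endpoint(client: Client) -> anyhow::Result<(String, String)> {
    let cms: Api<ConfigMap> = Api::namespaced(client, "kube-public");
    let cm = cms.get("cluster-info").await?;
    // The ConfigMap stores a whole kubeconfig document as a string...
    let raw = cm.data.unwrap_or_default()
        .get("kubeconfig").cloned().unwrap_or_default();
    // ...so it has to be deserialized as YAML before fields can be read.
    let doc: serde_yaml::Value = serde_yaml::from_str(&raw)?;
    let cluster = &doc["clusters"][0]["cluster"];
    let server = cluster["server"].as_str().unwrap_or_default().to_string();
    let ca = cluster["certificate-authority-data"]
        .as_str().unwrap_or_default().to_string();
    Ok((server, ca))
}
```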
2022-10-07 06:52:06 -05:00
Dustin 25524d5290 routes: Add WireGuard configuration resource
The *GET /wireguard/config/<instance-id>* resource returns the
WireGuard client configuration assigned to the specified instance ID.
The resource contents are stored in the Kubernetes Secret, in a data
field named `wireguard-config`.  The contents of this field are returned
directly as a string, without any transformation.  Thus, the value must
be a complete, valid WireGuard configuration document.  Instances will
fetch and save this configuration when they first launch, to configure
their access to the VPN.
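A sketch of such a Rocket route using the `kube` crate; the namespace and label selector shown are assumptions:

```rust
use k8s_openapi::api::core::v1::Secret;
use kube::{api::{Api, ListParams}, Client};
use rocket::{get, State};

/// GET /wireguard/config/<instance-id>: return the assigned WireGuard config.
#[get("/wireguard/config/<instance_id>")]
async fn wireguard_config(client: &State<Client>, instance_id: &str) -> Option<String> {
    let secrets: Api<Secret> = Api::namespaced(client.inner().clone(), "dynk8s");
    let lp = ListParams::default()
        .labels(&format!("dynk8s.du5t1n.me/ec2-instance-id={instance_id}"));
    let secret = secrets.list(&lp).await.ok()?.items.into_iter().next()?;
    // The `wireguard-config` field holds a complete client configuration;
    // it is returned verbatim, without any transformation.
    let mut data = secret.data?;
    let value = data.remove("wireguard-config")?;
    String::from_utf8(value.0).ok()
}
```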
2022-10-03 18:29:47 -05:00
Dustin 3f17373624 Change WireGuard keys -> configs
Setting up the WireGuard client requires several pieces of information
beyond the node's private key and the peer's public key: the peer
endpoint address/port and the node's IP address are also required.
As such, naming the resource a "key" is somewhat misleading.
2022-10-03 18:20:46 -05:00
Dustin 3916e0eac9 Assign WireGuard keys to EC2 instances
In order to join the on-premises Kubernetes cluster, EC2 instances will
need to first connect to the WireGuard VPN.  The *dynk8s* provisioner
will provide keys to instances to configure their WireGuard clients.

WireGuard keys must be pre-configured on the server and stored in
Kubernetes as *dynk8s.du5t1n.me/wireguard-key* Secret resources.  They
must also have a `dynk8s.du5t1n.me/ec2-instance-id` label.  If this
label is empty, the key is available to be assigned to an instance.

When an EventBridge event is received indicating an instance is now
running, a WireGuard key is assigned to that instance (by setting the
`dynk8s.du5t1n.me/ec2-instance-id` label).  Conversely, when an event is
received indicating that the instance is terminated, any WireGuard keys
assigned to that instance are freed.
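A minimal sketch of the assignment step, assuming the `kube` crate; the namespace is an assumption, and the empty-label selector is one way to express "unassigned":

```rust
use k8s_openapi::api::core::v1::Secret;
use kube::{api::{Api, ListParams, Patch, PatchParams}, Client};
use serde_json::json;

/// Claim a free WireGuard key Secret for a newly running instance.
async fn assign_wireguard_key(client: Client, instance_id: &str) -> anyhow::Result<()> {
    let secrets: Api<Secret> = Api::namespaced(client, "dynk8s");
    // A key is free when its instance-id label is empty.
    let lp = ListParams::default().labels("dynk8s.du5t1n.me/ec2-instance-id=");
    if let Some(secret) = secrets.list(&lp).await?.items.into_iter().next() {
        let name = secret.metadata.name.unwrap_or_default();
        // Claim it by writing the instance ID into the label.
        let patch = json!({"metadata": {"labels": {
            "dynk8s.du5t1n.me/ec2-instance-id": instance_id
        }}});
        secrets
            .patch(&name, &PatchParams::default(), &Patch::Merge(&patch))
            .await?;
    }
    Ok(())
}
```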
2022-10-01 12:17:32 -05:00
Dustin 25d7be004c Begin EC2 instance state event handler
The lifecycle of ephemeral Kubernetes worker nodes is driven by events
emitted by Amazon EventBridge and delivered via Amazon Simple
Notification Service.  These events trigger the *dynk8s* provisioner to
take the appropriate action based on the state of an EC2 instance.

In order to add a node to the cluster using `kubeadm`, a "bootstrap
token" needs to be created.  When manually adding a node, this would be
done e.g. using `kubeadm token create`.  Since bootstrap tokens are just
a special type of Secret, they can be easily created programmatically as
well.  When a new EC2 instance enters the "running" state, the
provisioner creates a new bootstrap token and associates it with the
instance by storing the instance ID in a label in the Secret resource's
metadata.

The initial implementation of the event handler is rather naïve.  It
generates a token for every instance, though some instances may not be
intended to be used as Kubernetes workers.  Ideally, the provisioner
would only allocate tokens for instances matching some configurable
criteria, such as AWS tags.  Further, a token is allocated every time
the instance enters the running state, even if a token already exists or
is not needed.
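A sketch of the token-creation step with the `kube` crate, following the standard bootstrap token Secret layout (the label name is illustrative, and token generation is left out):

```rust
use std::collections::BTreeMap;

use k8s_openapi::api::core::v1::Secret;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::{api::{Api, PostParams}, Client};

/// Create a bootstrap token Secret tied to a newly running EC2 instance.
async fn create_bootstrap_token(
    client: Client,
    instance_id: &str,
    token_id: &str,      // 6 lowercase alphanumeric characters
    token_secret: &str,  // 16 lowercase alphanumeric characters
) -> anyhow::Result<()> {
    let secrets: Api<Secret> = Api::namespaced(client, "kube-system");
    let secret = Secret {
        metadata: ObjectMeta {
            // kubeadm expects the name "bootstrap-token-<token-id>".
            name: Some(format!("bootstrap-token-{token_id}")),
            labels: Some(BTreeMap::from([(
                "dynk8s.du5t1n.me/ec2-instance-id".to_string(),
                instance_id.to_string(),
            )])),
            ..Default::default()
        },
        type_: Some("bootstrap.kubernetes.io/token".to_string()),
        string_data: Some(BTreeMap::from([
            ("token-id".to_string(), token_id.to_string()),
            ("token-secret".to_string(), token_secret.to_string()),
            ("usage-bootstrap-authentication".to_string(), "true".to_string()),
            ("usage-bootstrap-signing".to_string(), "true".to_string()),
        ])),
        ..Default::default()
    };
    secrets.create(&PostParams::default(), &secret).await?;
    Ok(())
}
```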
2022-10-01 10:34:03 -05:00
Dustin 8e1165eb95 terraform: Begin AWS configuration
The `terraform` directory contains the resource descriptions for all AWS
services that need to be configured in order for the dynamic K8s
provisioner to work.  Specifically, it defines the EventBridge rule and
SNS topic/subscriptions that instruct AWS to send EC2 instance state
change notifications to the *dynk8s-provisioner*'s HTTP interface.
2022-09-27 12:58:51 -05:00
Dustin c721571043 container: Rebase on Fedora 35
Fedora 36 has OpenSSL 3, while the *rust* container image has OpenSSL
1.1.  Since Fedora 35 is still supported, and it includes OpenSSL 1.1,
we can use it as our base for the runtime image.
2022-09-11 13:17:54 -05:00
Dustin c8e0fe1256 ci: Begin Jenkins build pipeline
2022-09-10 10:30:54 -05:00
Dustin ac1b20d910 sns: Save messages to disk
Upon receipt of a notification or unsubscribe confirmation message from
SNS, after the message signature has been verified, the receiver will
now write the re-serialized contents of the message out to the
filesystem.  This will allow the messages to be inspected later in order
to develop additional functionality for this service.

The messages are saved in a `messages` directory within the current
working directory.  This directory contains a subdirectory for each SNS
topic.  Within each topic subdirectory, each message is saved in a
file named with the message timestamp and ID.
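A minimal sketch of the layout described above; the exact file-name separator and extension are assumptions:

```rust
use std::{fs, path::PathBuf};

/// Persist a verified SNS message under messages/<topic>/<timestamp>_<id>.json.
fn save_message(topic: &str, timestamp: &str, message_id: &str, body: &str) -> std::io::Result<()> {
    let dir = PathBuf::from("messages").join(topic);
    fs::create_dir_all(&dir)?;
    let path = dir.join(format!("{timestamp}_{message_id}.json"));
    fs::write(path, body)
}
```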
2022-09-05 09:45:44 -05:00
Dustin ab45823654 Begin HTTP server, SNS message receiver
This commit introduces the HTTP interface for the dynamic K8s node
provisioner.  It will serve as the main point of communication with the
ephemeral nodes in the cloud, sharing the keys and tokens they require
in order to join the Kubernetes cluster.

The initial functionality is simply an Amazon SNS notification receiver.
SNS notifications will be used to manage the lifecycle of the dynamic
nodes.

For now, the notification receiver handles subscription confirmation
messages by following the link provided to confirm the subscription.
All other messages are simply written to the filesystem; these will be
used to implement and test future functionality.
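A rough Rocket sketch of such a receiver (route path, module path, and the use of `reqwest` are assumptions; signature verification and persistence are omitted):

```rust
use rocket::{post, routes, Build, Rocket};

// `Message` is the model::sns union type sketched under the next entry;
// the module path is an assumption.
use dynk8s_provisioner::model::sns::Message;

/// POST /sns: receiver for messages pushed by an SNS HTTPS subscription.
/// SNS posts JSON but not always with a JSON content type, so the body is
/// read as a plain string and parsed explicitly.
#[post("/sns", data = "<body>")]
async fn receive(body: String) -> &'static str {
    match serde_json::from_str::<Message>(&body) {
        Ok(Message::SubscriptionConfirmation { subscribe_url, .. }) => {
            // Confirm the subscription by following the provided link.
            let _ = reqwest::get(&subscribe_url).await;
        }
        Ok(_) => {
            // Notifications and unsubscribe confirmations are written to the
            // filesystem for later inspection (not shown here).
        }
        Err(_) => {}
    }
    "OK"
}

fn rocket() -> Rocket<Build> {
    rocket::build().mount("/", routes![receive])
}
```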
2022-09-03 22:58:23 -05:00
Dustin 3ce72623e6 model: sns: Add union type
The `model::sns::Message` enumeration provides a mechanism for
deserializing a JSON document into the correct type.  It will be used by
the HTTP operation that receives messages from SNS in order to determine
the correct action to take in response to the message.
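One way to model this is a serde internally-tagged enum keyed on the SNS `Type` field; a trimmed sketch (real messages carry more fields, such as `Timestamp`, `Signature`, and `SigningCertURL`):

```rust
use serde::Deserialize;

/// The SNS "Type" field selects which variant a document deserializes into.
#[derive(Debug, Deserialize)]
#[serde(tag = "Type")]
pub enum Message {
    SubscriptionConfirmation {
        #[serde(rename = "SubscribeURL")]
        subscribe_url: String,
        #[serde(rename = "TopicArn")]
        topic_arn: String,
    },
    Notification {
        #[serde(rename = "MessageId")]
        message_id: String,
        #[serde(rename = "TopicArn")]
        topic_arn: String,
        #[serde(rename = "Message")]
        message: String,
    },
    UnsubscribeConfirmation {
        #[serde(rename = "TopicArn")]
        topic_arn: String,
    },
}
```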
2022-09-03 22:57:07 -05:00
Dustin 196a43c49c sns: Begin work on Amazon SNS message handling
In order to prevent arbitrary clients from using the provisioner to
retrieve WireGuard keys and Kubernetes bootstrap tokens, access to those
resources *must* be restricted to the EC2 machines created by the
Kubernetes Cluster Autoscaler.  The key to the authentication process will
be SNS notifications from AWS to indicate when new EC2 instances are
created; everything that the provisioner does will be associated with an
instance it discovered through an SNS notification.

SNS messages are signed using PKCS#1 v1.5 RSA-SHA1, with a public key
distributed in an X.509 certificate.  To ensure that messages received
are indeed from AWS, the provisioner will need to verify those
signatures.  Messages with missing or invalid signatures will be
considered unsafe and ignored.

The `model::sns` module includes the data structures that represent SNS
messages.  The `sns::sig` module includes the primitive operations for
implementing signature verification.
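A minimal sketch of the verification primitive using the `openssl` crate (an assumption); building the canonical string-to-sign and fetching the signing certificate are elided:

```rust
use openssl::{base64, hash::MessageDigest, sign::Verifier, x509::X509};

/// Verify an SNS message signature (SignatureVersion 1: SHA1 with RSA).
/// `string_to_sign` must be built from the message fields in the canonical
/// order defined by AWS; `cert_pem` is the certificate from SigningCertURL.
fn verify_signature(
    string_to_sign: &str,
    signature_b64: &str,
    cert_pem: &[u8],
) -> Result<bool, openssl::error::ErrorStack> {
    let signature = base64::decode_block(signature_b64)?;
    let public_key = X509::from_pem(cert_pem)?.public_key()?;
    let mut verifier = Verifier::new(MessageDigest::sha1(), &public_key)?;
    verifier.update(string_to_sign.as_bytes())?;
    verifier.verify(&signature)
}
```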
2022-09-01 18:22:22 -05:00
Dustin 90e5bd65ca Initial commit 2022-08-31 21:02:17 -05:00