The `cert-exporter` is no longer needed. All websites manage their own
certificates with _mod_md_ now, and all internal applications that use
the wildcard certificate fetch it directly from the Kubernetes Secret.
Ansible playbooks running as Jenkins jobs need to be able to access the
Secret resources containing certificates issued by _cert-manager_ in
order to install them on managed nodes. Although not all jobs do this
yet, eventually, the _cert-exporter_ will no longer be necessary, as the
_certs.git_ repository will not be used anymore.
I've completely blocked all outgoing unencrypted DNS traffic at the
firewall now, which prevents _cert-manager_ from following its default
behavior of querying the authoritative name servers for its managed
domains to poll for ACME challenge DNS TXT record availability.
Fortunately, it has an option to use a recursive resolver (i.e. the
network-provided DNS server) instead.
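
As a rough sketch, the option comes down to two controller arguments. The two `--dns01-recursive-nameservers*` flags are the documented _cert-manager_ options; the helper name `cert_manager_dns_args` and the resolver address `192.0.2.1:53` are placeholders of mine, not values from this deployment.

```sh
# Print the extra cert-manager controller arguments that make DNS-01
# propagation checks go through a recursive resolver only.
# $1: resolver as host:port (hypothetical default; substitute the real one)
cert_manager_dns_args() {
    resolver="${1:-192.0.2.1:53}"
    printf '%s\n' \
        '--dns01-recursive-nameservers-only' \
        "--dns01-recursive-nameservers=${resolver}"
}
```

The printed lines would be appended to the `args` list of the controller container in the _cert-manager_ Deployment.
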
Since transitioning to externalIPs for TCP services, it is no longer
possible to use the HTTP-01 ACME challenge to issue certificates for
services hosted in the cluster, because the ingress controller does not
listen on those addresses. Thus, we have to switch to using the DNS-01
challenge. I had avoided using it before because of the complexity of
managing dynamic DNS records with the Samba AD server, but this was
actually pretty easy to work around. I created a new DNS zone on the
firewall specifically for ACME challenges. Names in the AD-managed zone
have CNAME records for their corresponding *_acme-challenge* labels
pointing to this new zone. The new zone has dynamic updates enabled,
which _cert-manager_ supports using the RFC2136 plugin.
For now, this is only enabled for _rabbitmq.pyrocufflink.blue_. I will
transition the other names soon.
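
To illustrate the CNAME delegation described above, here is a small helper that prints the record to add in the AD-managed zone. The function name, the challenge zone `acme.example.net`, and the exact shape of the CNAME target are all assumptions; the real zone name is not given here. Note that _cert-manager_ only follows such a CNAME when the DNS-01 solver sets `cnameStrategy: Follow`.

```sh
# Print the CNAME record that points the ACME challenge label for a host
# at a name in a separate, dynamically updated zone.
# $1: fully qualified host name, without trailing dot
# $2: challenge zone name (hypothetical)
acme_challenge_cname() {
    printf '_acme-challenge.%s. IN CNAME %s.%s.\n' "$1" "$1" "$2"
}
```

For example, `acme_challenge_cname rabbitmq.pyrocufflink.blue acme.example.net` prints the record for the one name that currently uses this setup.
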
Now that the reverse proxy for Internet-facing sites uses TLS
passthrough, the certificate for the _darkchestofwonders.us_ Ingress
needs to be correct. Since Ingress resources can only use either the
default certificate (_*.pyrocufflink.blue_) or a certificate from their
same namespace, we have to move the Certificate and its corresponding
Secret into the _websites_ namespace. Fortunately, this is easy enough
to do, by setting the appropriate annotations on the Ingress.
To keep the existing certificate (until it expires), I moved the Secret
manually:
```sh
kubectl get secret dcow-cert -o yaml | grep -v namespace | kubectl create -n websites -f -
```
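
The Ingress annotations mentioned above might look roughly like the fragment this helper prints. The `cert-manager.io/cluster-issuer` annotation is the standard _cert-manager_ one; the helper name and the issuer name passed to it are hypothetical, while the host, namespace, and Secret name come from the text.

```sh
# Print the parts of an Ingress manifest that make cert-manager create
# the Certificate and Secret in the Ingress's own namespace.
# $1: name of the ClusterIssuer to use (hypothetical)
render_ingress_tls() {
    cat <<EOF
metadata:
  namespace: websites
  annotations:
    cert-manager.io/cluster-issuer: $1
spec:
  tls:
  - hosts:
    - darkchestofwonders.us
    secretName: dcow-cert
EOF
}
```

Because the `secretName` matches the Secret that was moved by hand, _cert-manager_ should be able to adopt the existing certificate rather than request a new one right away, though I have not verified that here.
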
Having name overrides for in-cluster services breaks ACME challenges,
because the server tries to connect to the Service instead of the
Ingress. To fix this, we need to configure both _cert-manager_ and
_step-ca_ to *only* resolve names using the network-wide DNS server.
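
One way to do this, sketched under the assumption that both workloads take a plain pod-level DNS override, is to set `dnsPolicy: "None"` with an explicit `dnsConfig`. Those two fields are standard Kubernetes pod spec fields; the helper name and the resolver address are placeholders.

```sh
# Print a patch fragment that makes a workload's pods bypass the cluster
# DNS and use only the given name server.
# $1: IP address of the network-wide DNS server (hypothetical value)
render_dns_override() {
    cat <<EOF
spec:
  template:
    spec:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
        - $1
EOF
}
```

The output could be passed to `kubectl patch` via `--patch "$(render_dns_override 192.0.2.1)"` for each workload.
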
In-cluster services can now get certificates signed by the DCH CA via
`step-ca`. This issuer uses ACME with the HTTP-01 challenge, so it
can only issue certificates for names in the _pyrocufflink.blue_ zone
that point to the ingress controllers.
The *cert-exporter* script really only needs the SSH host key for Gitea,
so the dynamic host key fetch is overkill. Since it frequently breaks
for various reasons, it's probably better to just have a static list of
trusted keys.
Hatch Learning Center has several domains; Tabitha couldn't decide which
she liked best :) At present, they all resolve to the same website, with
_hatchlearningcenter.org_ as the canonical name.
If I remember to add the *_acme-challenge* CNAME record *before* applying
the Certificate resource, it takes a little under 5 minutes to issue a
new certificate.
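
For reference, a minimal Certificate resource can be rendered like this. The `cert-manager.io/v1` fields are the standard ones; the helper name and every argument value (resource name, DNS name, issuer name) are supplied by the caller and are hypothetical.

```sh
# Print a minimal cert-manager Certificate manifest.
# $1: name for the Certificate and its Secret (hypothetical)
# $2: DNS name to include in the certificate
# $3: name of the ClusterIssuer (hypothetical)
render_certificate() {
    cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: $1
spec:
  secretName: $1
  dnsNames:
  - $2
  issuerRef:
    kind: ClusterIssuer
    name: $3
EOF
}
```

Piping the output to `kubectl apply -n <namespace> -f -` only after the CNAME record exists avoids waiting for a cached negative DNS answer to expire.
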
Using *acme-dns.io* is incredibly cumbersome. Since each unique
subdomain requires its own set of credentials, the `acme-dns.json` file
has to be updated every time a new certificate is added. This
effectively precludes creating certificates via Ingress annotations.
As Cloudflare's DNS service is free and anonymous as well, I thought I
would try it out as an alternative to *acme-dns.io*. It seems to work
well so far. One potential issue, though, is that Cloudflare seems to have
several nameservers, with multiple IP addresses each. This may require
adding quite a few exceptions to the no-outbound-DNS rule on the
firewall. I tried using the "recursive servers only" mode of
*cert-manager*; however, as expected, the recursive servers all cache
too aggressively. Since the negative cache TTL value in the SOA record
for Cloudflare DNS zones is set to 1 hour and cannot be configured, ACME
challenges can take at least that long in this mode. Thus, querying the
authoritative servers directly is indeed the best option, even though it
violates the no-outbound-DNS rule.
Using the local name server as the authoritative server for ACME
challenge records turned out to be quite problematic. For some reason,
both Google and Cloudflare kept returning SERVFAIL responses for the
*_acme-challenge* TXT queries. I suspect this may have had something to
do with how BIND was configured to be the authoritative server for the
*o-ak4p9kqlmt5uuc.com* zone while also being a recursive resolver for
clients on the local network.
Using *acme-dns.io* resolves these issues, but it does bring a few of
its own. Notably, each unique domain and subdomain must have its own
set of credentials (specified in the `acme-dns.json` file). This makes
adding new certificates rather cumbersome.
The `cert-exporter` tool fetches certificates from Kubernetes Secret
resources and commits them to a Git repository. This allows
certificates managed by *cert-manager* to be used outside the Kubernetes
cluster, e.g. for services running on other virtual machines.
The wildcard certificate for the *pyrocufflink.net* and
*pyrocufflink.blue* domains is now handled by *cert-manager* and saved
to *certs.git* by `cert-exporter`.
*cert-manager* manages certificates. More specifically, it is an ACME
client, which generates certificate-signing requests, submits them to a
certificate authority, and stores the signed certificate in Kubernetes
secrets. The certificates it manages are defined by Kubernetes
Custom Resources, either defined manually or automatically for Ingress
resources with particular annotations.
The *cert-manager* deployment consists primarily of two services:
*cert-manager* itself, which monitors Kubernetes resources and manages
certificate requests, and the *cert-manager-webhook*, which validates
Kubernetes resources for *cert-manager*. There is also a third
component, *cainjector*, but we do not need it.
The primary configuration for *cert-manager* is done through Issuer and
ClusterIssuer resources. These define how certificates are issued: the
certificate authority to use and how to handle ACME challenges. For our
purposes, we will be using ZeroSSL to issue certificates, verified via
the DNS-01 challenge through BIND running on the gateway firewall.
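
A sketch of what such a ClusterIssuer could look like follows. The ACME server URL is ZeroSSL's published endpoint, and ZeroSSL requires external account binding; the `rfc2136` solver fields are the standard *cert-manager* ones. The helper name, the resource and Secret names, the TSIG key name, and the algorithm choice are all assumptions for illustration.

```sh
# Print a ClusterIssuer that uses ZeroSSL with the DNS-01 challenge,
# publishing TXT records via RFC 2136 dynamic updates.
# $1: ACME account email (hypothetical)
# $2: ZeroSSL EAB key ID (hypothetical)
# $3: BIND server as host:port (hypothetical)
render_cluster_issuer() {
    cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: zerossl
spec:
  acme:
    server: https://acme.zerossl.com/v2/DV90
    email: $1
    privateKeySecretRef:
      name: zerossl-account-key
    externalAccountBinding:
      keyID: $2
      keySecretRef:
        name: zerossl-eab
        key: secret
    solvers:
    - dns01:
        rfc2136:
          nameserver: $3
          tsigKeyName: cert-manager
          tsigAlgorithm: HMACSHA256
          tsigSecretSecretRef:
            name: cert-manager-tsig
            key: key
EOF
}
```

The two referenced Secrets (EAB HMAC key and TSIG key) would need to exist in the *cert-manager* namespace before the issuer becomes ready.
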