So, the root of my VAF woes was that Let’s Encrypt, upon my request, had previously issued an untrusted cert for https://vaf.grinnell.edu (because I used the staging environment during development of this blog), and I was unable to find it or override it with a trusted cert. I was under the impression that in my workflow the cert was being stored inside one of my Docker containers… and it was. But I couldn’t fathom why the untrusted cert seemed to “persist”, even though I had deleted and regenerated those containers many times. Hmmm…
All of this was compounded by the fact that in my configuration the Traefik container is built from “scratch” so that it is very “lean”. That essentially means there’s no shell in the container, so I can’t just do docker exec -it traefik_proxy sh to open a terminal and look inside.
Finally, this morning, I took a hard look at the files/docker-compose.yml in my docker-bootstrap project and it dawned on me… that workflow explicitly saves certs into a protected /root/acme.json file as part of the traefik_proxy service configuration, like so:
That last line is the key! It tells Docker that /root/acme.json should persist on the SERVER and map to /root/acme.json inside the traefik_proxy container. Duh, that’s where the certs live… and persist.
The steps I took to fix this, from a terminal on static.grinnell.edu, were:
The first four lines above made a backup of acme.json, just in case, then removed and replaced it with a pristine, empty file with proper permissions. The last line stopped all of the Docker containers running on the server, removed them and their associated volumes, then pruned away all remnants (images, networks, etc.) of those containers.
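For anyone repeating this, the reset can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not necessarily the exact commands I ran: the function names reset_acme_file and remove_all_containers and the .backup suffix are made up for illustration, and the mode 600 reflects my understanding that Traefik expects acme.json to be readable and writable only by its owner.

```shell
#!/bin/sh
# Sketch only: function names and the ".backup" suffix are illustrative
# assumptions, not taken from the docker-bootstrap project.

# Back up the ACME store, then replace it with an empty file that only
# its owner can read and write.
reset_acme_file() {
    acme_file="${1:-/root/acme.json}"
    # Keep a copy, just in case; stop here if the copy fails.
    cp -p "$acme_file" "$acme_file.backup" || return 1
    rm -f "$acme_file"
    touch "$acme_file"
    chmod 600 "$acme_file"
}

# Stop and remove every container on the host (with its anonymous
# volumes), then prune unused images, networks, and build cache.
# WARNING: this affects ALL containers on the server, not just Traefik.
remove_all_containers() {
    containers=$(docker ps -aq)
    if [ -n "$containers" ]; then
        # Intentionally unquoted so each container ID is a separate argument.
        docker stop $containers
        docker rm -v $containers
    fi
    docker system prune --all --force
}
```

Run as root on the server, usage would be reset_acme_file followed by remove_all_containers. Passing a different path to reset_acme_file makes it easy to rehearse on a scratch copy first.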
After this, I started rebuilding the server and services using the process documented in docker-bootstrap Workflow. Problem solved! Woot! Along the way I watched the process create fresh, trusted certs in the server’s new /root/acme.json file.