This article extends the setup explained in the previous article.
Briefly, the setup consists of a load balancer, an HTTP server, and a PHP-FPM backend, all running in a Docker Swarm environment as explained here.
Previously, the load balancer was bound to the manager node in the Docker Swarm because it needed access to the Let’s Encrypt certificate files. To prepare for a fully replicated and fault-tolerant design, this needs to be fixed so it can run from any node.
Because of the mesh network in Docker Swarm, the load balancer does not need to run on the manager node where the external IP is bound. It can run on any host; the mesh network will route the request to the right container. But that requires us to replicate the Let’s Encrypt certificates and make sure they can be renewed and reloaded regardless of which host the load balancer is running on. This article explains how I changed that and moved renewal into Docker.
To recap, the setup looks like this (explained in more detail here):
The two squares with the Docker icon are virtual machines running Docker, and the blue computer icons are Docker containers. The HTTP and PHP-FPM services are replicated across two nodes. But the load balancer is locked to one node because the Let’s Encrypt certificates are located in a local volume on the host. If the load balancer were to start on the other node, the files would be missing and it would fail with errors.
Let’s Encrypt certificates expire after 90 days, so an automated process is needed to renew them. Renewal is handled by a cron job located on the Docker host; it looks like this:
docker run --rm --name letsencrypt -v "/etc/:/etc/letsencrypt" -v "/var/lib/letsencrypt/:/var/lib/letsencrypt" certbot/certbot:latest renew --quiet --no-self-upgrade
It starts a Docker container from the certbot image, mounts the directory with the certificates, and renews them.
Since this cron job runs on the Docker host, we need to remember it if the setup ever changes. It would be much better if it ran inside Docker like the rest of the services.
All files used by the HTTP and PHP-FPM service are already replicated with GlusterFS, so they are accessible across the hosts.
To replicate the certificates, we move the files inside the GlusterFS-managed partition, and they become accessible across the Docker Swarm. Very easy :-) The GlusterFS setup is explained in detail here.
From Docker’s point of view, the replicated folder is like any other folder, so the only change needed in the Docker Swarm setup is to change which folders are mounted.
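As a rough illustration of that mount change, the sketch below swaps the load balancer's certificate mount for a bind mount that points into the replicated folder. The GlusterFS path `/mnt/gluster/letsencrypt` and the in-container target `/etc/letsencrypt` are assumptions made for this example; `patch_loadbalancer` is the service name used later in this article. A bind mount like this only works if the GlusterFS volume is mounted at the same path on every node.

```shell
# Sketch only: paths below are assumptions, not taken from the real setup.
DOCKER="${DOCKER:-docker}"
GLUSTER_CERTS="${GLUSTER_CERTS:-/mnt/gluster/letsencrypt}"  # hypothetical GlusterFS path
TARGET="${TARGET:-/etc/letsencrypt}"                        # assumed path inside the container
SERVICE="${SERVICE:-patch_loadbalancer}"

# Remove the old mount (identified by its target path) and add a bind
# mount that points at the replicated directory instead.
remount_certs() {
    "$DOCKER" service update \
        --mount-rm "$TARGET" \
        --mount-add "type=bind,source=$GLUSTER_CERTS,target=$TARGET" \
        "$SERVICE"
}

# Usage (on a manager node): remount_certs
```

If the services are deployed from a stack file, the same change would go into that file's volume definitions instead of being applied by hand.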
As mentioned above, the certbot renew script runs as a cron task on the Docker host.
To move the renewal into Docker I see two options
I went with option 2, to explore how to trigger Docker images to run on a schedule and to avoid any problems with certbot running for extended periods of time.
As you saw above, the command to renew the certificates runs a one-off Docker container: it boots, checks if the certificates need to be renewed, renews them, and quits.
This means that “something” needs to drive the scheduling to make sure the command is run.
Out of the box, it is not possible to run docker commands from within a Docker container. Both the Docker tooling and access to the Docker socket are needed inside the container. This adds a small dependency to the setup: the Docker versions installed on the host and inside the container need to be compatible. Docker’s API seems to be very stable, so I do not expect that to break.
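To make the socket requirement concrete, here is a minimal sketch of how a scheduler container could be given access to the host's Docker socket when started as a Swarm service. The image name `example/certbot-scheduler:latest` and the service name `certbot-scheduler` are hypothetical placeholders, and the image is assumed to already contain the docker CLI. The manager constraint is there because Swarm service commands are only accepted by manager nodes.

```shell
# Sketch only: image and service names are hypothetical placeholders.
DOCKER="${DOCKER:-docker}"
SCHEDULER_IMAGE="${SCHEDULER_IMAGE:-example/certbot-scheduler:latest}"

# Start the scheduler as a Swarm service with the host's Docker socket
# bind-mounted, so the docker CLI inside the container talks to the host daemon.
create_scheduler_service() {
    "$DOCKER" service create \
        --name certbot-scheduler \
        --constraint node.role==manager \
        --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
        "$SCHEDULER_IMAGE"
}

# Usage (on a manager node): create_scheduler_service
```

Anything with access to the socket can control the whole Docker daemon, so an image like this should be kept small and trusted.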
After certbot has updated the certificates, GlusterFS will replicate the files to all nodes. After that, the load balancer service needs to be notified to reload the certificates. It can be handled by running the command:
docker kill -s HUP nginx
The problem is that it needs to be run on every node, and Docker does not provide a way to send arbitrary signals to processes across a swarm (there is a lengthy discussion about it). If signals could be sent, nginx could continue serving requests and just reload the certificates, with no need to restart the containers.
Docker Swarm does support restarting services across a swarm. This can be done using the service update command like this:
docker service update --force --update-parallelism 1 --update-delay 30s patch_loadbalancer
The command forces an update of the load balancer service, one container at a time, with a 30-second delay between each restart. 30 seconds between the restarts should allow serving requests with no downtime.
The image consists of 3 files: the Dockerfile, a cron scheduler file, and the renew script.
You can find them here.
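For readers who only want the general shape, a renew script could look roughly like the sketch below. It is not the script from the linked files. It assumes the certificates live in a hypothetical GlusterFS-backed directory (`/mnt/gluster/letsencrypt`), and it adds one idea that is not described in this article: fingerprinting the certificate directory before and after the certbot run, so the load balancer is only restarted when files actually changed.

```shell
#!/bin/sh
# Sketch of a renew-and-reload script; NOT the script from the linked files.
DOCKER="${DOCKER:-docker}"
CERT_DIR="${CERT_DIR:-/mnt/gluster/letsencrypt}"   # hypothetical replicated path
LIB_DIR="${LIB_DIR:-/var/lib/letsencrypt}"
SERVICE="${SERVICE:-patch_loadbalancer}"

# Checksum over every regular file under CERT_DIR; the value changes
# whenever certbot writes new or updated files.
cert_fingerprint() {
    find "$CERT_DIR" -type f -exec cksum {} + 2>/dev/null | sort | cksum
}

renew_and_reload() {
    before=$(cert_fingerprint)
    # One-off certbot container: renews only certificates close to expiry.
    "$DOCKER" run --rm --name letsencrypt \
        -v "$CERT_DIR:/etc/letsencrypt" \
        -v "$LIB_DIR:/var/lib/letsencrypt" \
        certbot/certbot:latest renew --quiet --no-self-upgrade || return 1
    after=$(cert_fingerprint)
    # Rolling restart of the load balancer, only if something changed.
    if [ "$before" != "$after" ]; then
        "$DOCKER" service update --force \
            --update-parallelism 1 --update-delay 30s "$SERVICE"
    fi
}

# Usage, e.g. from the cron scheduler file: renew_and_reload
```

Skipping the restart when nothing changed keeps the scheduled run cheap; restarting unconditionally after every run would also work, at the cost of a needless rolling restart each time.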
Certbot’s Docker image can be run as many times as you want; it only renews the certificates if there are fewer than 30 days until they expire.
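To see where a certificate stands relative to that 30-day window, it can be inspected with openssl. The helper below is a small sketch and is not part of the setup described here; the function name and the example path in the usage comment are made up for illustration.

```shell
# Sketch: succeed (exit 0) if the certificate expires within the given
# number of days, mirroring certbot's 30-day renewal window.
expires_within_days() {
    cert="$1"
    days="$2"
    # "openssl x509 -checkend N" exits 0 if the certificate is still
    # valid N seconds from now, and non-zero otherwise.
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
        return 1   # not expiring within the window
    fi
    return 0       # expires within the window (or could not be read)
}

# Usage (path is hypothetical):
#   expires_within_days /mnt/gluster/letsencrypt/live/example.com/cert.pem 30 \
#       && echo "due for renewal"
```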
With these improvements, the load balancer is no longer tied to a single host, which prepares the setup to become more fault tolerant in the future.