My experience implementing graceful shutdown for celery workers spawned by supervisord inside a docker container.
Supervisord part
supervisord.conf
...
[supervisord]
...
nodaemon=true # run supervisord in the foreground
[include]
files=celery.conf # path to the celery config file
Set nodaemon=true so that we can start it as a background process from the entrypoint script later.
celery.conf
[group:celery_workers]
programs=one, two
[program:one]
...
command=celery -A backend --config=celery.py worker -n worker_one --pidfile=/var/log/celery/worker_one.pid --pool=gevent --concurrency=10 --loglevel=INFO
killasgroup=true
stopasgroup=true
stopsignal=TERM
stopwaitsecs=600
[program:two]
...
# similar to the previous one
The configuration file above is responsible for starting a group of workers each running in a separate process within a group. I'd like to stop on a stopwaitsecs section value. Let's see what the documentation tells us about it:
This parameter sets the number of seconds to wait for the OS to return
a SIGCHLD to supervisord after the program has been sent a
stopsignal. If this number of seconds elapses before supervisord
receives a SIGCHLD from the process, supervisord will attempt to kill
it with a final SIGKILL.
If stopwaitsecs>stop_grace_period specified for your service in a docker-compose file then you'll be getting SIGKILL from your docker. Make sure
stopwaitsecs<stop_grace_period, otherwise all running tasks get interrupted by docker.
Entrypoint script part
entrypoint.sh
#!/bin/bash
# safety switch, exit script if there's error.
set -e
on_close(){
echo "Signal caught..."
echo "Supervisor is stopping processes gracefully..."
# cleanup all pid files
rm worker_one.pid
rm worker_two.pid
supervisorctl stop celery_workers:
echo "All processes have been stopped. Exiting..."
exit 1
}
start_supervisord(){
supervisord -c /etc/supervisor/supervisord.conf
}
# start trapping signals (docker sends `SIGTERM` for shutdown)
trap on_close SIGINT SIGTERM SIGKILL
start_supervisord & # start supervisord in a background
SUPERVISORD_PID=$! # PID of the last background process started
wait $SUPERVISORD_PID
EXIT_STATUS=$? # the exit status of the last command executed
The script above consists of:
- registering a cleanup function
on_close
- starting
supervisord's process group in a background
- registering the last background process's
PID and waiting for it to finish
Docker part
docker-compose.yml
...
services:
celery:
...
stop_grace_period: 15m30s
entrypoint: [/entrypoints/entrypoint.sh]
The only setting worth mentioning here is entrypoint form declaration. In our case better to use exec form. It starts an executable script in a process with PID 1 and doesn't create any subprocesses as shell form does. SIGTERM from docker stop <container> gets propagated to an executable which traps it and performs all cleaning and closing logic.