Klassmann | Blog

Handling SIGTERM in Gunicorn

Creating a custom worker to clean up the application before termination

Updated on 2024-03-05 · First published on 2024-03-05 · By Lucas Klassmann

The problem

I was working on a service that runs on Google Cloud Run. It is written in Python, exposes a basic API with Flask, and uses Gunicorn to start the application.

It is a simple application, but it needed a way to react when memory was exceeded and send a notification to another application.

Cloud Run, in its container runtime contract, specifies that applications should handle graceful shutdowns caused by normal or forced terminations (for example, when the container exceeds its memory limit).

Before Cloud Run terminates an instance, it sends signals to all of its containers. The first signal is SIGTERM, which starts the first stage of termination and comes with a timeout of 10 seconds. When that time expires, SIGKILL is sent and the container is terminated completely.

You can find a basic implementation of how to handle signals in the Google Cloud Run documentation.
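As a minimal illustration of the idea (the handler name and cleanup steps here are placeholders, not the documentation's exact code), a plain Python process can register a SIGTERM handler like this:

```python
import signal
import sys
from types import FrameType
from typing import Optional

def shutdown_handler(signum: int, frame: Optional[FrameType]) -> None:
    # Perform any cleanup here: flush logs, notify other services,
    # close open connections, then exit before SIGKILL arrives.
    print(f"received signal {signum}, shutting down")
    sys.exit(0)

# Register the handler for SIGTERM; Cloud Run sends it before SIGKILL.
signal.signal(signal.SIGTERM, shutdown_handler)
```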

The strategy

For the application I was working with, the following strategy was required:

A job request was sent by Service A (third-party) to Service B (my application).

If jobs were still incomplete at termination time, I needed to finish the API request with a success response and tell another service, Service C (a monitoring agent), that the jobs were incomplete.

So, I had to:

  • Handle signals and notify Service C of pending/incomplete jobs.
  • Close current HTTP requests from Service A to the API (Service B) with a success code (HTTP 200).

The implementation

All of this was accomplished by creating a custom Gunicorn worker class for the Service B application. With the custom class, I was able to handle SIGTERM and also gain access to the underlying network sockets.

Just make sure that the routine finishes before the SIGTERM timeout, which for Cloud Run is just 10 seconds.

The basic idea is to inherit from a worker, in this case, SyncWorker. Here is the implementation:

import sys
import secrets
from types import FrameType
from gunicorn.workers.sync import SyncWorker
import gunicorn.util as util

class CustomWorker(SyncWorker):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.instance_id = secrets.token_hex(6)
        self.clients = []

    def _log(self, msg):
        print(f'[CUSTOM WORKER][{self.instance_id}]: {msg}')

    def accept(self, listener):
        client, addr = listener.accept()
        # Keep a reference to the socket so it can be answered on SIGTERM.
        self.clients.append(client)
        client.setblocking(True)
        util.close_on_exec(client)
        try:
            self.handle(listener, client, addr)
        finally:
            # handle() closes the socket when the request completes, so it
            # no longer needs a graceful reply.
            self.clients.remove(client)

    def close_clients_gracefully(self, status=200, msg='OK FROM WORKER'):
        self._log(f'closing {len(self.clients)} clients gracefully')
        size = len(msg)
        response_data = f"HTTP/1.1 {status} {msg}\r\nContent-Length: {size}\r\n\r\n{msg}"
        response_data = response_data.encode('utf-8')
        for _socket in self.clients:
            try:
                # Answer the pending request with a minimal HTTP response.
                _socket.sendall(response_data)
            except Exception as e:
                self._log(f'failed to respond to client: {e}')
            finally:
                util.close(_socket)

    def notify_error(self):
        self._log('notifying incomplete jobs')
        if hasattr(self, 'wsgi') and self.wsgi:
            if (hasattr(self.wsgi, "incomplete_jobs") and
                    isinstance(self.wsgi.incomplete_jobs, dict)):
                jobs = self.wsgi.incomplete_jobs

                for key, payload in jobs.items():
                    # The actual call to Service C goes here; logging
                    # stands in for the real notification.
                    self._log(f'sending notification for job {key}')

    def handle_exit(self, sig: int, frame: FrameType):
        self._log(f'handling signal {sig}')
        self.notify_error()
        self.close_clients_gracefully()
        sys.exit(0)

After implementing the custom worker, start Gunicorn with the -k parameter set to the dotted path of the new worker class (the bind address and the app:app module below are placeholders; adjust them to your project):

gunicorn --bind=0.0.0.0:8080 -k app.worker.CustomWorker -w 2 app:app
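Equivalently, the same options can live in a Gunicorn configuration file. A sketch, assuming the module path above and Cloud Run's default port; lowering graceful_timeout is worth considering, since Gunicorn's default of 30 seconds is longer than Cloud Run's 10-second SIGTERM window:

```python
# gunicorn.conf.py -- equivalent to the command-line flags above
bind = "0.0.0.0:8080"                      # Cloud Run injects the port via $PORT
workers = 2
worker_class = "app.worker.CustomWorker"   # dotted path to the custom worker
graceful_timeout = 8                       # finish cleanup well before SIGKILL
```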


  • The implementation is not failure-proof, but it can be a starting point for distributed systems with a fallback mechanism.
  • This post talks about Cloud Run, but the approach can be applied anywhere a Gunicorn application runs; Cloud Run just happens to be the case where I needed such a solution.


When the CustomWorker, instantiated by Gunicorn, accepts a new client connection, it saves the client socket in a list. This client instance is used later to send the HTTP 200 response.

On every request to Service B, an id identifying the job is stored inside the WSGI application. It is removed when the job finishes normally. When the worker receives SIGTERM, we do two things:

  • Check for incomplete jobs, stored inside the WSGI application, and notify Service C.
  • Try to close any open client sockets after sending them a successful response.
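On the application side, that job bookkeeping can be sketched like this; the route, payload handling, and helper names are assumptions, chosen only to match the incomplete_jobs attribute that the worker's notify_error checks for:

```python
import secrets
from flask import Flask, request

app = Flask(__name__)
# The custom worker reads this dict via self.wsgi.incomplete_jobs.
app.incomplete_jobs = {}

@app.route('/jobs', methods=['POST'])
def run_job():
    job_id = secrets.token_hex(8)
    # Register the job as pending before doing any work.
    app.incomplete_jobs[job_id] = request.get_json(silent=True)
    try:
        # ... run the actual job here ...
        return {'job_id': job_id}
    finally:
        # Remove the entry once the job finishes normally, so a later
        # SIGTERM only reports jobs that were genuinely interrupted.
        app.incomplete_jobs.pop(job_id, None)
```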

For more details and a complete example, check the repository:

Thank you.