Nginx prematurely closing connections when reloaded

Maxim Dounin mdounin at mdounin.ru
Mon Mar 25 15:20:09 UTC 2024


Hello!

On Mon, Mar 25, 2024 at 01:31:26PM +0100, Sébastien Rebecchi wrote:

> I have an issue with nginx closing prematurely connections when reload is
> performed.
> 
> I have some nginx servers configured to proxy_pass requests to an upstream
> group. This group itself is composed of several servers which are nginx
> themselves, and is configured to use keepalive connections.
> 
> When I trigger a reload (-s reload) on an nginx of one of the servers which
> is target of the upstream, I see in error logs of all servers in front that
> connection was reset by the nginx which was reloaded.

[...]

> And here the kind of error messages I get when I reload nginx of "IP_1":
> 
> --- BEGIN ---
> 
> 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
> Connection reset by peer) while reading response header from upstream,
> client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
> /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream: "
> http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN", referrer:
> "REFERRER_HIDDEN"
> 
> --- END ---
> 
> 
> I thought -s reload was doing graceful shutdown of connections. Is it due
> to the fact that nginx can not handle that when using keepalive
> connections? Is it a bug?
> 
> I am using nginx 1.24.0 everywhere, no particular

This looks like a well-known race condition when closing HTTP 
connections.  RFC 2616 documents it as follows
(https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4):

   A client, server, or proxy MAY close the transport connection at any
   time. For example, a client might have started to send a new request
   at the same time that the server has decided to close the "idle"
   connection. From the server's point of view, the connection is being
   closed while it was idle, but from the client's point of view, a
   request is in progress.

   This means that clients, servers, and proxies MUST be able to recover
   from asynchronous close events. Client software SHOULD reopen the
   transport connection and retransmit the aborted sequence of requests
   without user interaction so long as the request sequence is
   idempotent (see section 9.1.2). Non-idempotent methods or sequences
   MUST NOT be automatically retried, although user agents MAY offer a
   human operator the choice of retrying the request(s). Confirmation by
   user-agent software with semantic understanding of the application
   MAY substitute for user confirmation. The automatic retry SHOULD NOT
   be repeated if the second sequence of requests fails.

That is, when you shut down your backend server, it closes its 
keepalive connections - which is expected to be perfectly safe 
from the server's point of view.  But if, at the same moment, a 
request is being sent on one of these connections by the client 
(the frontend nginx server in your case), this might result in an 
error.

Note that the race is generally unavoidable, and such errors can 
happen any time a connection is closed by the server.  Closing 
multiple keepalive connections during shutdown makes such errors 
more likely though, since connections are closed right away rather 
than after the keepalive timeout expires.  Further, since in your 
case there are just a few heavily loaded keepalive connections, a 
request is likely to be in flight on a connection being closed, 
which also makes errors during shutdown more likely.

The typical solution is to retry such requests, as RFC 2616 
recommends.  In particular, nginx does so based on the 
"proxy_next_upstream" setting.  Note that to retry POST requests 
you will need "proxy_next_upstream ... non_idempotent;", which 
implies that non-idempotent requests will be retried on errors - 
this might not be the desired behaviour.
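For illustration, a minimal sketch of the frontend side; the 
upstream name and addresses are hypothetical placeholders, not 
taken from the original report:

```nginx
# Hypothetical frontend configuration sketch.
upstream backend {
    server 192.0.2.1:80;
    server 192.0.2.2:80;
    keepalive 16;              # cache up to 16 idle connections per worker
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Required for keepalive connections to the upstream:
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Retry the next server on connection errors; "non_idempotent"
        # additionally allows retrying POST and other non-idempotent
        # requests (use with care).
        proxy_next_upstream error timeout non_idempotent;
    }
}
```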

Another possible approach is to minimize the race window by 
waiting some time after the shutdown signal before closing 
keepalive connections.  There were several attempts in the past to 
implement this; the most recent one can be found here:

https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html

While there are some open questions about that particular patch, 
something like this should probably be implemented.

This is on my TODO list, so a proper solution should eventually be 
available out of the box in upcoming freenginx releases.

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/


