Nginx prematurely closing connections when reloaded
Maxim Dounin
mdounin at mdounin.ru
Mon Mar 25 15:20:09 UTC 2024
Hello!
On Mon, Mar 25, 2024 at 01:31:26PM +0100, Sébastien Rebecchi wrote:
> I have an issue with nginx closing prematurely connections when reload is
> performed.
>
> I have some nginx servers configured to proxy_pass requests to an upstream
> group. This group itself is composed of several servers which are nginx
> themselves, and is configured to use keepalive connections.
>
> When I trigger a reload (-s reload) on an nginx of one of the servers which
> is target of the upstream, I see in error logs of all servers in front that
> connection was reset by the nginx which was reloaded.
[...]
> And here the kind of error messages I get when I reload nginx of "IP_1":
>
> --- BEGIN ---
>
> 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
> Connection reset by peer) while reading response header from upstream,
> client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
> /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream: "
> http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN", referrer:
> "REFERRER_HIDDEN"
>
> --- END ---
>
>
> I thought -s reload was doing graceful shutdown of connections. Is it due
> to the fact that nginx can not handle that when using keepalive
> connections? Is it a bug?
>
> I am using nginx 1.24.0 everywhere, no particular
This looks like a well-known race condition when closing HTTP
connections. In RFC 2616, it is documented as follows
(https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4):
A client, server, or proxy MAY close the transport connection at any
time. For example, a client might have started to send a new request
at the same time that the server has decided to close the "idle"
connection. From the server's point of view, the connection is being
closed while it was idle, but from the client's point of view, a
request is in progress.
This means that clients, servers, and proxies MUST be able to recover
from asynchronous close events. Client software SHOULD reopen the
transport connection and retransmit the aborted sequence of requests
without user interaction so long as the request sequence is
idempotent (see section 9.1.2). Non-idempotent methods or sequences
MUST NOT be automatically retried, although user agents MAY offer a
human operator the choice of retrying the request(s). Confirmation by
user-agent software with semantic understanding of the application
MAY substitute for user confirmation. The automatic retry SHOULD NOT
be repeated if the second sequence of requests fails.
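The recovery rule quoted above can be condensed into a small client-side
retry policy. This is a minimal illustrative sketch (the function name and
shape are mine, not from any particular client library):

```python
# Methods RFC 2616 / RFC 9110 define as idempotent (POST is notably absent).
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def should_retry(method: str, attempt: int, max_retries: int = 1) -> bool:
    """RFC 2616 section 8.1.4 retry policy after an asynchronous close:
    retry only idempotent requests, and do not keep retrying if the
    retried request fails as well."""
    return method in IDEMPOTENT and attempt < max_retries
```

A client would call this after catching a connection-reset error: a failed
GET is retried once without user interaction, while a failed POST is
surfaced to the caller.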
That is, when you shut down your backend server, it closes the
keepalive connection - which is expected to be perfectly safe from
the server's point of view. But if at the same time a request is
being sent over this connection by the client (the frontend nginx
server in your case) - this might result in an error.
Note that the race is generally unavoidable, and such errors can
happen any time the server closes a connection. Closing multiple
keepalive connections during shutdown makes such errors more likely
though, since the connections are closed right away, and not after
the keepalive timeout expires. Further, since in your case there
are just a few loaded keepalive connections, a request is likely to
be in flight on a connection when it is closed, which also makes
errors during shutdown more likely.
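The race can be reproduced outside nginx with plain sockets. This is a
self-contained sketch (addresses, timings, and the toy HTTP exchange are
illustrative, not nginx internals): the "server" closes a connection it
considers idle just before the "client" sends another request on it.

```python
import socket
import threading
import time

def backend(listener):
    conn, _ = listener.accept()
    conn.recv(1024)                                  # first request arrives
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    time.sleep(0.1)                                  # connection now looks idle...
    conn.close()                                     # ...so the server closes it

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                      # ephemeral port
listener.listen(1)
threading.Thread(target=backend, args=(listener,), daemon=True).start()

client = socket.socket()
client.connect(listener.getsockname())
client.sendall(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")
client.recv(1024)                                    # first response is fine
time.sleep(0.3)                                      # meanwhile the server closed
try:
    # From the client's point of view, a request is now "in progress"...
    client.sendall(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")
    data = client.recv(1024)
    outcome = "closed" if not data else "unexpected response"
except (ConnectionResetError, BrokenPipeError):
    outcome = "reset by peer"
print(outcome)  # typically the reset the frontend logs, sometimes a plain close
```

Whether the client sees an RST or a bare EOF depends on kernel timing, but
either way the request it believed was in flight is lost.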
The typical solution is to retry such requests, as RFC 2616
recommends. In particular, nginx does so based on the
"proxy_next_upstream" setting. Note that to retry POST requests
you will need "proxy_next_upstream ... non_idempotent;" (which
implies that non-idempotent requests will be retried on errors,
and might not be the desired behaviour).
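On the frontend nginx, the retry behaviour might be configured along these
lines (upstream name, addresses, and location are placeholders standing in
for your actual setup):

```nginx
upstream backends {
    server IP_1:80;
    server IP_2:80;
    keepalive 32;                    # pool of keepalive connections
}

server {
    location / {
        proxy_pass http://backends;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Retry the next upstream on connection errors; "non_idempotent"
        # additionally allows retrying POST and other non-idempotent
        # requests, which may not be desirable for all applications.
        proxy_next_upstream error timeout non_idempotent;
    }
}
```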
Another possible approach is to try to minimize the race window by
waiting some time after the shutdown before closing keepalive
connections. There were several attempts in the past to implement
this, the last one can be found here:
https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html
While there are some open questions about that particular patch,
something like this should probably be implemented.
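The idea behind such a patch can be illustrated with a generic sketch (not
the actual nginx code; the function, its parameters, and the fake
connection objects are mine): on shutdown, keep keepalive connections
serviceable for a short grace window so requests already racing in can
still be answered, and only then close them.

```python
import time

def delayed_close(conns, serve_pending, grace=0.5, tick=0.05):
    """Hypothetical 'delayed close' shutdown: instead of closing idle
    keepalive connections immediately, keep serving them for a grace
    window, shrinking the window for the close-vs-request race."""
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        for conn in conns:
            serve_pending(conn)      # answer any request that raced in
        time.sleep(tick)
    for conn in conns:
        conn.close()                 # now far less likely to cut off a request
```

The trade-off is a slower shutdown in exchange for fewer client-visible
resets, which is essentially what the patch linked above negotiates.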
This is on my TODO list, so a proper solution should eventually be
available out of the box in upcoming freenginx releases.
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/