Nginx prematurely closing connections when reloaded

Mon Mar 25 15:40:29 UTC 2024

Thank you, Maxim, for that comprehensive explanation.
I will think about non_idempotent then, and wait for a future release of
freenginx that natively solves that issue :)
Have a great day,
Sébastien.

On Mon, Mar 25, 2024 at 16:20, Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
>
> On Mon, Mar 25, 2024 at 01:31:26PM +0100, Sébastien Rebecchi wrote:
>
> > I have an issue with nginx prematurely closing connections when a reload
> > is performed.
> >
> > I have some nginx servers configured to proxy_pass requests to an upstream
> > group. This group is itself composed of several servers, which are also
> > nginx, and is configured to use keepalive connections.
> >
> > When I trigger a reload (-s reload) on the nginx of one of the servers
> > targeted by the upstream, I see in the error logs of all the front servers
> > that the connection was reset by the nginx that was reloaded.
>
> [...]
>
> > And here is the kind of error message I get when I reload nginx of "IP_1":
> >
> > --- BEGIN ---
> >
> > 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
> > Connection reset by peer) while reading response header from upstream,
> > client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
> > /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream: "
> > http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN", referrer:
> > "REFERRER_HIDDEN"
> >
> > --- END ---
> >
> >
> > I thought -s reload was doing graceful shutdown of connections. Is it due
> > to the fact that nginx can not handle that when using keepalive
> > connections? Is it a bug?
> >
> > I am using nginx 1.24.0 everywhere, no particular
>
> This looks like a well-known race condition when closing HTTP
> connections.  In RFC 2616, it is documented as follows
> (https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4):
>
>    A client, server, or proxy MAY close the transport connection at any
>    time. For example, a client might have started to send a new request
>    at the same time that the server has decided to close the "idle"
>    connection. From the server's point of view, the connection is being
>    closed while it was idle, but from the client's point of view, a
>    request is in progress.
>
>    This means that clients, servers, and proxies MUST be able to recover
>    from asynchronous close events. Client software SHOULD reopen the
>    transport connection and retransmit the aborted sequence of requests
>    without user interaction so long as the request sequence is
>    idempotent (see section 9.1.2). Non-idempotent methods or sequences
>    MUST NOT be automatically retried, although user agents MAY offer a
>    human operator the choice of retrying the request(s). Confirmation by
>    user-agent software with semantic understanding of the application
>    MAY substitute for user confirmation. The automatic retry SHOULD NOT
>    be repeated if the second sequence of requests fails.
>
> That is, when you shut down your backend server, it closes the
> keepalive connection - which is expected to be perfectly safe from
> the server's point of view.  But if at the same time a request is
> being sent on this connection by the client (the frontend nginx server
> in your case) - this might result in an error.
>
> Note that the race is generally unavoidable and such errors can
> happen at any time, during any connection close by the server.
> Closing multiple keepalive connections during shutdown makes such
> errors more likely though, since connections are closed right
> away, and not after the keepalive timeout expires.  Further, since in
> your case there are just a few busy keepalive connections, this
> also makes errors during shutdown more likely.
>
> The typical solution is to retry such requests, as RFC 2616
> recommends.  In particular, nginx does so based on the
> "proxy_next_upstream" setting.  Note that to retry POST requests
> you will need "proxy_next_upstream ... non_idempotent;" (which
> implies that non-idempotent requests will be retried on errors,
> and might not be the desired behaviour).
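>
> A minimal sketch of such a frontend configuration follows, as an
> illustration only.  The upstream name "backend_group", the 192.0.2.x
> addresses, the keepalive pool size and the tries limit are placeholder
> assumptions, not values from this thread; the fragment is meant to
> live inside the http{} context.
>
> ```nginx
> # Hypothetical upstream group; names and addresses are placeholders.
> upstream backend_group {
>     server 192.0.2.1:80;
>     server 192.0.2.2:80;
>
>     # Cache idle connections to the backends (size is an assumption).
>     keepalive 16;
> }
>
> server {
>     listen 80;
>
>     location / {
>         # Both settings are required for upstream keepalive to work.
>         proxy_http_version 1.1;
>         proxy_set_header Connection "";
>
>         proxy_pass http://backend_group;
>
>         # "error timeout" is the default; "non_idempotent" additionally
>         # allows POST, LOCK and PATCH to be passed to the next server.
>         # A retried POST may be processed twice by the application.
>         proxy_next_upstream error timeout non_idempotent;
>
>         # Cap the number of attempts per request (value is an assumption).
>         proxy_next_upstream_tries 2;
>     }
> }
> ```
>
> Whether "non_idempotent" is acceptable depends on the application:
> it is only safe if a duplicated request does no harm.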
>
> Another possible approach is to try to minimize the race window by
> waiting some time after the shutdown before closing keepalive
> connections.  There were several attempts in the past to implement
> this, the last one can be found here:
>
>
> https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html
>
> While there are some questions about this particular patch, something
> like this should probably be implemented.
>
> This is on my TODO list, so a proper solution should eventually be
> available out of the box in upcoming freenginx releases.
>
> Hope this helps.
>
> --
> Maxim Dounin
> http://mdounin.ru/