Nginx prematurely closing connections when reloaded

Tue Mar 26 14:03:19 UTC 2024

Hi Maxim,

I finally decided to enable retries of non-idempotent requests, since I
already handle data deduplication.

Now I have:

    proxy_next_upstream error timeout invalid_header http_502 http_504 non_idempotent;
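For context, here is a minimal sketch of where such a directive sits in a frontend configuration that uses keepalive connections to an upstream group of backend nginx servers. It goes in the http {} context; the upstream name `backend_nginx`, the addresses, `keepalive 32` and `proxy_next_upstream_tries 3` are hypothetical placeholders, not the actual setup from this thread.

```nginx
# Hypothetical upstream group of backend nginx servers (addresses are placeholders).
upstream backend_nginx {
    server 192.0.2.1:80;
    server 192.0.2.2:80;

    # Keep up to 32 idle connections per worker open for reuse.
    keepalive 32;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_nginx;

        # Upstream keepalive needs HTTP/1.1 and an empty Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Pass the request to the next server on errors and timeouts,
        # including non-idempotent requests such as POST.
        proxy_next_upstream error timeout invalid_header http_502 http_504 non_idempotent;

        # Hypothetical cap on the total number of attempts per request.
        proxy_next_upstream_tries 3;
    }
}
```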

Nevertheless, I still see the same error messages when I trigger a reload of
one of the upstream nginx servers.
Does that mean the problem is the same, or just that the nginx servers in
front still log such error messages for information, but have actually
retried the request on another server in the upstream group?
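One way to check whether a logged error was followed by a retry is to log the `$upstream_addr` and `$upstream_status` variables: when nginx contacts more than one upstream server while processing a request, these variables list every attempt, separated by commas. A sketch, assuming a hypothetical format name `upstream_retries` and a hypothetical log path:

```nginx
# Goes in the http {} context; "upstream_retries" is a hypothetical format name.
log_format upstream_retries '$remote_addr [$time_local] "$request" $status '
                            'upstream_addr="$upstream_addr" '
                            'upstream_status="$upstream_status"';

# Hypothetical log path; enable it where requests are proxied to the upstream group.
access_log /var/log/nginx/upstream_retries.log upstream_retries;
```

A request that was retried after the connection reset should then show more than one address in `upstream_addr` for a single log entry.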

Kind regards,

Sébastien.

On Mon, Mar 25, 2024 at 16:40, Sébastien Rebecchi <srebecchi at kameleoon.com>
wrote:

> Thank you, Maxim, for that comprehensive explanation.
> I will think about non_idempotent then, and wait for a possible future
> release of freenginx that solves this issue natively :)
> Have a great day
> Sébastien.
>
> On Mon, Mar 25, 2024 at 16:20, Maxim Dounin <mdounin at mdounin.ru> wrote:
>
>> Hello!
>>
>> On Mon, Mar 25, 2024 at 01:31:26PM +0100, Sébastien Rebecchi wrote:
>>
>> > I have an issue with nginx prematurely closing connections when a reload
>> > is performed.
>> >
>> > I have some nginx servers configured to proxy_pass requests to an
>> > upstream group. This group itself is composed of several servers which
>> > are nginx themselves, and is configured to use keepalive connections.
>> >
>> > When I trigger a reload (-s reload) on the nginx of one of the servers
>> > targeted by the upstream group, I see in the error logs of all the front
>> > servers that the connection was reset by the nginx that was reloaded.
>>
>> [...]
>>
>> > And here is the kind of error message I get when I reload nginx on "IP_1":
>> >
>> > --- BEGIN ---
>> >
>> > 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
>> > Connection reset by peer) while reading response header from upstream,
>> > client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
>> > /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream:
>> > "http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN",
>> > referrer: "REFERRER_HIDDEN"
>> >
>> > --- END ---
>> >
>> >
>> > I thought -s reload performed a graceful shutdown of connections. Is it
>> > because nginx cannot handle that when using keepalive connections? Is it
>> > a bug?
>> >
>> > I am using nginx 1.24.0 everywhere, no particular
>>
>> This looks like a well known race condition when closing HTTP
>> connections.  In RFC 2616, it is documented as follows
>> (https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4):
>>
>>    A client, server, or proxy MAY close the transport connection at any
>>    time. For example, a client might have started to send a new request
>>    at the same time that the server has decided to close the "idle"
>>    connection. From the server's point of view, the connection is being
>>    closed while it was idle, but from the client's point of view, a
>>    request is in progress.
>>
>>    This means that clients, servers, and proxies MUST be able to recover
>>    from asynchronous close events. Client software SHOULD reopen the
>>    transport connection and retransmit the aborted sequence of requests
>>    without user interaction so long as the request sequence is
>>    idempotent (see section 9.1.2). Non-idempotent methods or sequences
>>    MUST NOT be automatically retried, although user agents MAY offer a
>>    human operator the choice of retrying the request(s). Confirmation by
>>    user-agent software with semantic understanding of the application
>>    MAY substitute for user confirmation. The automatic retry SHOULD NOT
>>    be repeated if the second sequence of requests fails.
>>
>> That is, when you shut down your backend server, it closes the
>> keepalive connection - which is expected to be perfectly safe from
>> the server's point of view.  But if at the same time a request is
>> being sent over this connection by the client (the frontend nginx
>> server in your case) - this might result in an error.
>>
>> Note that the race is generally unavoidable and such errors can
>> happen at any time, during any connection close by the server.
>> Closing multiple keepalive connections during shutdown makes such
>> errors more likely though, since connections are closed right
>> away, and not after keepalive timeout expires.  Further, since in
>> your case there are just a few loaded keepalive connections, this
>> also makes errors during shutdown more likely.
>>
>> The typical solution is to retry such requests, as RFC 2616
>> recommends.  In particular, nginx does so based on the
>> "proxy_next_upstream" setting.  Note that to retry POST requests
>> you will need "proxy_next_upstream ... non_idempotent;" (which
>> implies that non-idempotent requests will be retried on errors,
>> and might not be the desired behaviour).
>>
>> Another possible approach is to try to minimize the race window by
>> waiting some time after the shutdown before closing keepalive
>> connections.  There were several attempts in the past to implement
>> this; the latest one can be found here:
>>
>>
>> https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html
>>
>> While there are some questions about this particular patch, something
>> like this should probably be implemented.
>>
>> This is on my TODO list, so a proper solution should eventually be
>> available out of the box in upcoming freenginx releases.
>>
>> Hope this helps.
>>
>> --
>> Maxim Dounin
>> http://mdounin.ru/
>> --
>> nginx mailing list
>> nginx at freenginx.org
>> https://freenginx.org/mailman/listinfo/nginx
>>
>