<div dir="ltr">Thank you, Maxim, for that comprehensive explanation.<div>I will think about non_idempotent then, and wait for a future freenginx release that solves this issue natively :)</div><div>Have a great day</div><div>Sébastien.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 25, 2024 at 16:20, Maxim Dounin <<a href="mailto:mdounin@mdounin.ru">mdounin@mdounin.ru</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello!<br>

<br>

On Mon, Mar 25, 2024 at 01:31:26PM +0100, Sébastien Rebecchi wrote:<br>

<br>

> I have an issue with nginx prematurely closing connections when a reload is<br>

> performed.<br>

> <br>

> I have some nginx servers configured to proxy_pass requests to an upstream<br>

> group. This group is itself composed of several servers, which are also nginx<br>

> servers, and is configured to use keepalive connections.<br>

> <br>

> When I trigger a reload (-s reload) of nginx on one of the servers that<br>

> is a target of the upstream, I see in the error logs of all the front servers that<br>

> the connection was reset by the nginx that was reloaded.<br>

<br>

[...]<br>

<br>

> And here is the kind of error message I get when I reload nginx on "IP_1":<br>

> <br>

> --- BEGIN ---<br>

> <br>

> 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:<br>

> Connection reset by peer) while reading response header from upstream,<br>

> client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST<br>

> /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream: "<br>

> <a href="http://IP_1:80/REQUEST_LOCATION_HIDDEN" rel="noreferrer" target="_blank">http://IP_1:80/REQUEST_LOCATION_HIDDEN</a>", host: "HOST_HIDDEN", referrer:<br>

> "REFERRER_HIDDEN"<br>

> <br>

> --- END ---<br>

> <br>

> <br>

> I thought -s reload performed a graceful shutdown of connections. Is this<br>

> because nginx cannot handle that when keepalive connections are used?<br>

> Is it a bug?<br>

> <br>

> I am using nginx 1.24.0 everywhere, no particular<br>

<br>

This looks like a well-known race condition when closing HTTP <br>

connections.  In RFC 2616, it is documented as follows<br>

(<a href="https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4" rel="noreferrer" target="_blank">https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4</a>):<br>

<br>

   A client, server, or proxy MAY close the transport connection at any<br>

   time. For example, a client might have started to send a new request<br>

   at the same time that the server has decided to close the "idle"<br>

   connection. From the server's point of view, the connection is being<br>

   closed while it was idle, but from the client's point of view, a<br>

   request is in progress.<br>

<br>

   This means that clients, servers, and proxies MUST be able to recover<br>

   from asynchronous close events. Client software SHOULD reopen the<br>

   transport connection and retransmit the aborted sequence of requests<br>

   without user interaction so long as the request sequence is<br>

   idempotent (see section 9.1.2). Non-idempotent methods or sequences<br>

   MUST NOT be automatically retried, although user agents MAY offer a<br>

   human operator the choice of retrying the request(s). Confirmation by<br>

   user-agent software with semantic understanding of the application<br>

   MAY substitute for user confirmation. The automatic retry SHOULD NOT<br>

   be repeated if the second sequence of requests fails.<br>

<br>

That is, when you shut down your backend server (a reload gracefully <br>

shuts down the old worker processes), it closes its keepalive <br>

connections, which is expected to be perfectly safe from the server's <br>

point of view.  But if at the same time the client (the frontend nginx <br>

server in your case) is sending a request over one of these <br>

connections, this might result in an error.<br>

<br>

Note that the race is generally unavoidable, and such errors can <br>

happen at any time, whenever the server closes a connection.  <br>

Closing multiple keepalive connections during shutdown makes such <br>

errors more likely, though, since the connections are closed right <br>

away rather than after the keepalive timeout expires.  Further, in <br>

your case there are just a few heavily loaded keepalive connections, <br>

which also makes errors during shutdown more likely.<br>

<br>

The typical solution is to retry such requests, as RFC 2616 <br>

recommends.  In particular, nginx does so based on the <br>

"proxy_next_upstream" setting.  Note that to retry POST requests <br>

you will need "proxy_next_upstream ... non_idempotent;", which <br>

means that all non-idempotent requests will be retried on errors; <br>

this might not be the desired behaviour.<br>
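<br>

A minimal sketch of the retry approach for the setup described above, placed in the http{} context. The upstream name "backend", the second server IP_2 and all numeric values are placeholders, not taken from the original configuration:<br>

```nginx
# Sketch only: "backend", IP_2 and the numbers are placeholders.
upstream backend {
    server IP_1:80;
    server IP_2:80;
    keepalive 16;                   # cache up to 16 idle upstream connections per worker
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;         # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection ""; # clear "Connection" so connections stay open

        # The default is "error timeout", which (since 1.9.13) does not
        # retry POST, LOCK or PATCH once the request was sent upstream.
        # Adding non_idempotent retries them too, so a POST may end up
        # being processed twice by the backends.
        proxy_next_upstream error timeout non_idempotent;
        proxy_next_upstream_tries 2;    # original attempt plus at most one retry
    }
}
```

Limiting proxy_next_upstream_tries to 2 keeps nginx from repeating the automatic retry if the second attempt also fails, which matches the RFC text quoted above.<br>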

<br>

Another possible approach is to minimize the race window by <br>

waiting for some time after the shutdown begins before closing <br>

keepalive connections.  There have been several attempts to <br>

implement this in the past; the latest one can be found here:<br>

<br>

<a href="https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html" rel="noreferrer" target="_blank">https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html</a><br>

<br>

While there are some open questions about this particular patch, <br>

something like this should probably be implemented.<br>

<br>

This is on my TODO list, so a proper solution should eventually be <br>

available out of the box in upcoming freenginx releases.<br>

<br>

Hope this helps.<br>

<br>

-- <br>

Maxim Dounin<br>

<a href="http://mdounin.ru/" rel="noreferrer" target="_blank">http://mdounin.ru/</a><br>

-- <br>

nginx mailing list<br>

<a href="mailto:nginx@freenginx.org" target="_blank">nginx@freenginx.org</a><br>

<a href="https://freenginx.org/mailman/listinfo/nginx" rel="noreferrer" target="_blank">https://freenginx.org/mailman/listinfo/nginx</a><br>

</blockquote></div>