[PATCH]HTTP/2 connection not properly closing during graceful shutdown
Kasei Wang
kasei at kasei.im
Fri May 17 08:35:57 UTC 2024
Hello!
On Fri, May 17, 2024 at 2:06 AM Maxim Dounin <mdounin at mdounin.ru> wrote:
>
> Hello!
>
> On Wed, May 15, 2024 at 11:46:29AM +0800, Kasei Wang wrote:
>
> > Hello,
> >
> > I found that there is a slight probability of HTTP/2 connections not
> > properly closing during graceful shutdown, leading to worker processes
> > in shutting down state remaining stuck for an extended period. After
> > investigation, the issue appears to stem from the following:
> >
> > 1. worker processes in shutting down state use
> > ngx_close_idle_connections to close all idle connections, including
> > HTTP/2 connections.
> > 2. For HTTP/2 connections, c->idle is set to true in ngx_http_v2_init.
> > According to the explanation in
> > <https://hg.nginx.org/nginx/rev/5e95b9fb33b7>, GOAWAY should be sent to
> > all HTTP/2 connections.
> > 3. There might be a time gap between ngx_event_accept and
> > ngx_http_v2_init. For TLS connections, ngx_http_v2_init will be
> > executed after ALPN received, and for plaintext http2 connections,
> > ngx_http_v2_init will be executed after parsing the http2 preface. If
> > ngx_close_idle_connections is executed between ngx_event_accept and
> > ngx_http_v2_init, there's a possibility that c->idle of some
> > connections is set to true AFTER ngx_close_idle_connections, causing
> > those connections to not enter the GOAWAY process and leading to the
> > aforementioned problem.
> >
> > To verify this, I've written a simple HTTP/2 client. This program will
> > wait 15 seconds after TCP connection establishment before starting to
> > send data. The purpose of sleep is to to raise the probability of
> > encountering the issue. You can reproduce the problem by executing
> > "nginx -s reload" during this 15-second wait. If you're interested, you
> > can try my test program
> > (<https://github.com/kaseiwang/http2-grace-shutdown-test>) to reproduce
> > the issue.
> >
> > The following patch would call ngx_http_v2_finalize_connection to close
> > http2 connections which is initialized after
> > ngx_close_idle_connections.
> >
> > And here are some previous discussion on the another maillist:
> > <https://mailman.nginx.org/pipermail/nginx-devel/2024-April/TSBIREWXEN7B4U7SRSR4LRJDPOR3XH4Q.html
> > >
> >
> > Please confirm if this issue exists, review my analysis and the patch
> > if possible. Thank you very much.
>
> I think your analysis is correct. Thanks for catching this.
>
> > # HG changeset patch
> > # User Kasei Wang <kasei at kasei.im>
> > # Date 1715744317 -28800
> > # Wed May 15 11:38:37 2024 +0800
> > # Node ID 7f18358cc012039f6bb4e502a6f39a7bd50d669b
> > # Parent 4cfd64a8330dd5bacd839796732fd8c49d87e86e
> > HTTP/2: close http2 connections initialized during graceful shutdown.
> >
> > In some rare cases, a HTTP/2 connections can be initialized during a
> > graceful shutdown. Now close such an connection to avoid unexcepted
> > delays in the graceful shutdown.
> >
> > diff -r 4cfd64a8330d -r 7f18358cc012 src/http/v2/ngx_http_v2.c
> > --- a/src/http/v2/ngx_http_v2.c Tue May 14 17:47:44 2024 +0300
> > +++ b/src/http/v2/ngx_http_v2.c Wed May 15 11:38:37 2024 +0800
> > @@ -304,6 +304,11 @@
> > c->idle = 1;
> > ngx_reusable_connection(c, 0);
> >
> > + if (ngx_exiting) {
> > + ngx_http_v2_finalize_connection(h2c, NGX_HTTP_V2_NO_ERROR);
> > + return;
> > + }
> > +
> > if (c->buffer) {
> > p = c->buffer->pos;
> > end = c->buffer->last;
>
> The patch looks working, though in the case in question the code
> will do various otherwise unneeded things, notably will send
> SETTINGS and WINDOW_UPDATE frames to the client. OTOH, avoiding
> these might be tricky and will require more complex changes, and
> probably don't worth the effort.
>
> Another approach might be to allow just one request if a
> connection is established just before the shutdown, much like
> HTTP/1.x code does. Something as simple as the following patch
> should do the trick:
>
> diff --git a/src/http/v2/ngx_http_v2.c b/src/http/v2/ngx_http_v2.c
> --- a/src/http/v2/ngx_http_v2.c
> +++ b/src/http/v2/ngx_http_v2.c
> @@ -1336,7 +1336,8 @@ ngx_http_v2_state_headers(ngx_http_v2_co
> clcf = ngx_http_get_module_loc_conf(h2c->http_connection->conf_ctx,
> ngx_http_core_module);
>
> - if (clcf->keepalive_timeout == 0
> + if (ngx_exiting
> + || clcf->keepalive_timeout == 0
> || h2c->connection->requests >= clcf->keepalive_requests
> || ngx_current_msec - h2c->connection->start_time
> > clcf->keepalive_time)
>
>
> What do you think?
>
> --
> Maxim Dounin
> http://mdounin.ru/
> --
> nginx-devel mailing list
> nginx-devel at freenginx.org
> https://freenginx.org/mailman/listinfo/nginx-devel
Thanks for your response.
Actually, I was worried that sending GOAWAY on stream 0 might cause
compatibility issues for some clients. There are so many client
implementations, and I can't be sure that all clients can correctly
retry the request in this situation.
The disadvantage of your approach is that the worker process may still
need to wait for the first request to arrive. This is not a very long
time, and it should be limited by the client_header_timeout, which is
60s by default. So I think the waiting is acceptable. The advantage of
this approach is that the process to send GOAWAY frame would be more
similar to the existing reload process, reducing the possibility of
affecting client compatibility.
Overall, I prefer your approach because it minimizes the change impact
for the client.
More information about the nginx-devel
mailing list