[PATCH]HTTP/2 connection not properly closing during graceful shutdown

Kasei Wang kasei at kasei.im
Tue May 28 02:16:15 UTC 2024


Hello.

I noticed that this patch has not been merged yet. Are there any other
issues? Can I help with anything?

On Tue, May 21, 2024 at 5:26 PM Maxim Dounin <mdounin at mdounin.ru> wrote:
>
> Hello!
>
> On Fri, May 17, 2024 at 04:35:57PM +0800, Kasei Wang wrote:
>
> > Hello!
> >
> > On Fri, May 17, 2024 at 2:06 AM Maxim Dounin <mdounin at mdounin.ru> wrote:
> > >
> > > Hello!
> > >
> > > On Wed, May 15, 2024 at 11:46:29AM +0800, Kasei Wang wrote:
> > >
> > > > Hello,
> > > >
> > > > I found that there is a slight probability of HTTP/2 connections not
> > > > properly closing during graceful shutdown, leading to worker processes
> > > > in shutting down state remaining stuck for an extended period. After
> > > > investigation, the issue appears to stem from the following:
> > > >
> > > > 1. worker processes in shutting down state use
> > > > ngx_close_idle_connections to close all idle connections, including
> > > > HTTP/2 connections.
> > > > 2. For HTTP/2 connections, c->idle is set to true in ngx_http_v2_init.
> > > > According to the explanation in
> > > > <https://hg.nginx.org/nginx/rev/5e95b9fb33b7>, GOAWAY should be sent to
> > > > all HTTP/2 connections.
> > > > 3. There might be a time gap between ngx_event_accept and
> > > > ngx_http_v2_init. For TLS connections, ngx_http_v2_init will be
> > > > executed after ALPN received, and for plaintext http2 connections,
> > > > ngx_http_v2_init will be executed after parsing the http2 preface. If
> > > > ngx_close_idle_connections is executed between ngx_event_accept and
> > > > ngx_http_v2_init, there's a possibility that c->idle of some
> > > > connections is set to true AFTER ngx_close_idle_connections, causing
> > > > those connections to not enter the GOAWAY process and leading to the
> > > > aforementioned problem.
> > > >
> > > > To verify this, I've written a simple HTTP/2 client. This program will
> > > > wait 15 seconds after TCP connection establishment before starting to
> > > > send data. The purpose of sleep is to to raise the probability of
> > > > encountering the issue. You can reproduce the problem by executing
> > > > "nginx -s reload" during this 15-second wait. If you're interested, you
> > > > can try my test program
> > > > (<https://github.com/kaseiwang/http2-grace-shutdown-test>) to reproduce
> > > > the issue.
> > > >
> > > > The following patch would call ngx_http_v2_finalize_connection to close
> > > > http2 connections which is initialized after
> > > > ngx_close_idle_connections.
> > > >
> > > > And here are some previous discussion on the another maillist:
> > > > <https://mailman.nginx.org/pipermail/nginx-devel/2024-April/TSBIREWXEN7B4U7SRSR4LRJDPOR3XH4Q.html
> > > > >
> > > >
> > > > Please confirm if this issue exists, review my analysis and the patch
> > > > if possible. Thank you very much.
> > >
> > > I think your analysis is correct.  Thanks for catching this.
> > >
> > > > # HG changeset patch
> > > > # User Kasei Wang <kasei at kasei.im>
> > > > # Date 1715744317 -28800
> > > > #      Wed May 15 11:38:37 2024 +0800
> > > > # Node ID 7f18358cc012039f6bb4e502a6f39a7bd50d669b
> > > > # Parent  4cfd64a8330dd5bacd839796732fd8c49d87e86e
> > > > HTTP/2: close http2 connections initialized during graceful shutdown.
> > > >
> > > > In some rare cases, a HTTP/2 connections can be initialized during a
> > > > graceful shutdown. Now close such an connection to avoid unexcepted
> > > > delays in the graceful shutdown.
> > > >
> > > > diff -r 4cfd64a8330d -r 7f18358cc012 src/http/v2/ngx_http_v2.c
> > > > --- a/src/http/v2/ngx_http_v2.c Tue May 14 17:47:44 2024 +0300
> > > > +++ b/src/http/v2/ngx_http_v2.c Wed May 15 11:38:37 2024 +0800
> > > > @@ -304,6 +304,11 @@
> > > >      c->idle = 1;
> > > >      ngx_reusable_connection(c, 0);
> > > >
> > > > +    if (ngx_exiting) {
> > > > +        ngx_http_v2_finalize_connection(h2c, NGX_HTTP_V2_NO_ERROR);
> > > > +        return;
> > > > +    }
> > > > +
> > > >      if (c->buffer) {
> > > >          p = c->buffer->pos;
> > > >          end = c->buffer->last;
> > >
> > > The patch looks working, though in the case in question the code
> > > will do various otherwise unneeded things, notably will send
> > > SETTINGS and WINDOW_UPDATE frames to the client.  OTOH, avoiding
> > > these might be tricky and will require more complex changes, and
> > > probably don't worth the effort.
> > >
> > > Another approach might be to allow just one request if a
> > > connection is established just before the shutdown, much like
> > > HTTP/1.x code does.  Something as simple as the following patch
> > > should do the trick:
> > >
> > > diff --git a/src/http/v2/ngx_http_v2.c b/src/http/v2/ngx_http_v2.c
> > > --- a/src/http/v2/ngx_http_v2.c
> > > +++ b/src/http/v2/ngx_http_v2.c
> > > @@ -1336,7 +1336,8 @@ ngx_http_v2_state_headers(ngx_http_v2_co
> > >      clcf = ngx_http_get_module_loc_conf(h2c->http_connection->conf_ctx,
> > >                                          ngx_http_core_module);
> > >
> > > -    if (clcf->keepalive_timeout == 0
> > > +    if (ngx_exiting
> > > +        || clcf->keepalive_timeout == 0
> > >          || h2c->connection->requests >= clcf->keepalive_requests
> > >          || ngx_current_msec - h2c->connection->start_time
> > >             > clcf->keepalive_time)
> > >
> > >
> > > What do you think?
>
> [...]
>
> > Thanks for your response.
> >
> > Actually, I was worried that sending GOAWAY on stream 0 might cause
> > compatibility issues for some clients. There are so many client
> > implementations, and I can't be sure that all clients can correctly
> > retry the request in this situation.
> >
> > The disadvantage of your approach is that the worker process may still
> > need to wait for the first request to arrive. This is not a very long
> > time, and it should be limited by the client_header_timeout, which is
> > 60s by default. So I think the waiting is acceptable. The advantage of
> > this approach is that the process to send GOAWAY frame would be more
> > similar to the existing reload process,  reducing the possibility of
> > affecting client compatibility.
> >
> > Overall, I prefer your approach because it minimizes the change impact
> > for the client.
>
> Thanks for sharing.
>
> For completeness, full patch with commit log below.
>
> # HG changeset patch
> # User Maxim Dounin <mdounin at mdounin.ru>
> # Date 1716283099 -10800
> #      Tue May 21 12:18:19 2024 +0300
> # Node ID 429d40e8275d2606da7cb710de5bc40a905fe52f
> # Parent  46ecad404a296042c0088e699f275a92758e5ab9
> HTTP/2: handling of connections initialized during shutdown.
>
> If an HTTP/2 connection opened before a graceful shutdown, but
> ngx_http_v2_init() is called after idle connections were closed, such
> a connection ended up being open till closed by the client (or up to
> keepalive_time), delaying shutdown.
>
> With this change, such connections are allowed to serve just one request,
> much like it happens in HTTP/1.x, and closed afterwards.
>
> Reported by Kasei Wang,
> https://freenginx.org/pipermail/nginx-devel/2024-May/000277.html
>
> diff --git a/src/http/v2/ngx_http_v2.c b/src/http/v2/ngx_http_v2.c
> --- a/src/http/v2/ngx_http_v2.c
> +++ b/src/http/v2/ngx_http_v2.c
> @@ -1336,7 +1336,8 @@ ngx_http_v2_state_headers(ngx_http_v2_co
>      clcf = ngx_http_get_module_loc_conf(h2c->http_connection->conf_ctx,
>                                          ngx_http_core_module);
>
> -    if (clcf->keepalive_timeout == 0
> +    if (ngx_exiting
> +        || clcf->keepalive_timeout == 0
>          || h2c->connection->requests >= clcf->keepalive_requests
>          || ngx_current_msec - h2c->connection->start_time
>             > clcf->keepalive_time)
>
> --
> Maxim Dounin
> http://mdounin.ru/
> --
> nginx-devel mailing list
> nginx-devel at freenginx.org
> https://freenginx.org/mailman/listinfo/nginx-devel



More information about the nginx-devel mailing list