[PATCH] Upstream: max_fails doesn't respect the next_upstream config

Fri Oct 31 06:31:38 UTC 2025

Thank you for your reply. It makes sense to me. I didn't intend to
change the behavior. I thought the implementation doesn't
match the docs and didn't notice the docs have clarified this
behavior.

Now I understand it better. For the cases of error, timeout, and
invalid_header, ngx_http_upstream_next is called directly. Since
they are always considered unsuccessful attempts, there is no
need to check the ft_type against u->conf->next_upstream. For other
cases, ngx_http_upstream_test_next has already checked the status
against  u->conf->next_upstream to determine whether to call
ngx_http_upstream_next, so there is also no need to check again.

I have another question though. When peer.free is called from
ngx_http_upstream_next, the state argument is correctly passed.
But for the last attempt that doesn't finally enter
ngx_http_upstream_next, peer.free is called from
ngx_http_upstream_finalize_request with the state being 0, in this case
the last attempt is always considered successful. But in fact it may
be unsuccessful. So I think the status should also be checked there
in order to pass the correct state. I have created a PR for this. Could
you have a look? Thanks.
https://github.com/nginx/nginx/pull/959

Maxim Dounin <mdounin at mdounin.ru> 于2025年10月31日周五 00:46写道：

> Hello!
>
> On Thu, Oct 30, 2025 at 02:48:54PM +0800, solomon wrote:
>
> > # HG changeset patch
> > # User Zhefeng Chen <logchen2009 at gmail.com>
> > # Date 1761731339 -28800
> > #      Wed Oct 29 17:48:59 2025 +0800
> > # Node ID b81b918263d445c32cb2cff5e2724e3cac62dab3
> > # Parent  c3be8460587196f376bccf29817c3a6523e18150
> > Upstream: max_fails doesn't respect the next_upstream config
> >
> > In the description of `max_fails`, it says:
> > What is considered an unsuccessful attempt is defined by the
> > `proxy_next_upstream`, `fastcgi_next_upstreami`,
> > `uwsgi_next_upstream`, `scgi_next_upstream`,
> > `memcached_next_upstream`, and `grpc_next_upstream` directives.
> >
> > But actually 403 and 404 are always considered an successful
> > attempt, while other cases are always considered an
> > unsuccessful attempt.
> >
> > The `ngx_http_upstream_free_round_robin_peer` function depends
> > on this to update the health check state.
>
> The documentation of the proxy_next_upstream directive (and other
> directives mentioned) further clarifies the behaviour
> (http://freenginx.org/r/proxy_next_upstream):
>
> : The directive also defines what is considered an unsuccessful
> : attempt of communication with a server. The cases of error,
> : timeout and invalid_header are always considered unsuccessful
> : attempts, even if they are not specified in the directive. The
> : cases of http_500, http_502, http_503, http_504, and http_429 are
> : considered unsuccessful attempts only if they are specified in the
> : directive. The cases of http_403 and http_404 are never considered
> : unsuccessful attempts.
>
> Note that error, timeout, and invalid_header cases are always
> considered unsuccessful (because they are, indeed, unsuccessful,
> and there isn't much to be done here).  This is what you probably
> mean by "other cases are always considered an unsuccessful
> attempt".  This doesn't apply to all other cases though -
> http_500, http_502, http_503, http_504, and http_429 are only
> considered unsuccessful if they are explicitly specified in the
> directive.  These are valid responses, but at the same time these
> are error responses, so depending on the particular use case
> [free]nginx can either accept them as is (and send to the client)
> or try to fallback to a different server.  See
> ngx_http_upstream_test_next() for details on where these codes are
> checked.
>
> Note well that 403 and 404 are explicitly excluded from being
> considered unsuccessful attempts.  This behaviour is intentional
> and designed to support some common use cases, notably:
>
> 1. When resources are distributed among multiple servers, and
> "proxy_next_upstream http_404;" is used to iterate over available
> servers till the requested resource is found.  Switching off a
> server if the resource is not found in such a setup is not desired
> for obvious reasons.
>
> 2. Similarly, but with resources rsynced to all servers, so
> "proxy_next_upstream http_404 http_403;" makes it possible to
> fallback to other servers if the resource is not yet synced to a
> particular server (or the new directory is being synced, see
> 5231:05c53652e7b4).
>
> Could you please elaborate a bit more on why do you suggest to
> change this behaviour?  And how the mentioned use cases are
> expected to be handled with the new behaviour?  Is it intentional
> that error, timeout, and invalid_header cases won't be considered
> unsuccessful unless explicitly specified in the
> proxy_next_upstream directive with the proposed change?
> And invalid_header won't be considered unsuccessful by default?
>
> [...]
>
> --
> Maxim Dounin
> http://mdounin.ru/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://freenginx.org/pipermail/nginx-devel/attachments/20251031/b34082a4/attachment.htm>