[PATCH] Upstream: max_fails doesn't respect the next_upstream config

Thu Oct 30 16:26:39 UTC 2025

Hello!

On Thu, Oct 30, 2025 at 02:48:54PM +0800, solomon wrote:

> # HG changeset patch
> # User Zhefeng Chen <logchen2009 at gmail.com>
> # Date 1761731339 -28800
> #      Wed Oct 29 17:48:59 2025 +0800
> # Node ID b81b918263d445c32cb2cff5e2724e3cac62dab3
> # Parent  c3be8460587196f376bccf29817c3a6523e18150
> Upstream: max_fails doesn't respect the next_upstream config
> 
> In the description of `max_fails`, it says:
> What is considered an unsuccessful attempt is defined by the
> `proxy_next_upstream`, `fastcgi_next_upstreami`,
> `uwsgi_next_upstream`, `scgi_next_upstream`,
> `memcached_next_upstream`, and `grpc_next_upstream` directives.
> 
> But actually 403 and 404 are always considered an successful
> attempt, while other cases are always considered an
> unsuccessful attempt.
> 
> The `ngx_http_upstream_free_round_robin_peer` function depends
> on this to update the health check state.

The documentation of the proxy_next_upstream directive (and other 
directives mentioned) further clarifies the behaviour 
(http://freenginx.org/r/proxy_next_upstream):

: The directive also defines what is considered an unsuccessful 
: attempt of communication with a server. The cases of error, 
: timeout and invalid_header are always considered unsuccessful 
: attempts, even if they are not specified in the directive. The 
: cases of http_500, http_502, http_503, http_504, and http_429 are 
: considered unsuccessful attempts only if they are specified in the 
: directive. The cases of http_403 and http_404 are never considered 
: unsuccessful attempts.

Note that error, timeout, and invalid_header cases are always 
considered unsuccessful (because they are, indeed, unsuccessful, 
and there isn't much to be done here).  This is what you probably 
mean by "other cases are always considered an unsuccessful 
attempt".  This doesn't apply to all other cases though - 
http_500, http_502, http_503, http_504, and http_429 are only 
considered unsuccessful if they are explicitly specified in the 
directive.  These are valid responses, but at the same time these 
are error responses, so depending on the particular use case 
[free]nginx can either accept them as is (and send to the client) 
or try to fallback to a different server.  See 
ngx_http_upstream_test_next() for details on where these codes are 
checked.

Note well that 403 and 404 are explicitly excluded from being 
considered unsuccessful attempts.  This behaviour is intentional 
and designed to support some common use cases, notably:

1. When resources are distributed among multiple servers, and 
"proxy_next_upstream http_404;" is used to iterate over available 
servers till the requested resource is found.  Switching off a 
server if the resource is not found in such a setup is not desired 
for obvious reasons.

2. Similarly, but with resources rsynced to all servers, so 
"proxy_next_upstream http_404 http_403;" makes it possible to 
fallback to other servers if the resource is not yet synced to a 
particular server (or the new directory is being synced, see 
5231:05c53652e7b4).

Could you please elaborate a bit more on why do you suggest to 
change this behaviour?  And how the mentioned use cases are 
expected to be handled with the new behaviour?  Is it intentional 
that error, timeout, and invalid_header cases won't be considered 
unsuccessful unless explicitly specified in the 
proxy_next_upstream directive with the proposed change?  
And invalid_header won't be considered unsuccessful by default?

[...]

-- 
Maxim Dounin
http://mdounin.ru/