[PATCH] Upstream: max_fails doesn't respect the next_upstream config

Sat Nov 1 06:49:24 UTC 2025

Hello!

On Fri, Oct 31, 2025 at 02:31:38PM +0800, solomon wrote:

> Thank you for your reply. It makes sense to me. I didn't intend to
> change the behavior. I thought the implementation doesn't
> match the docs and didn't notice the docs have clarified this
> behavior.
> 
> Now I understand it better. For the cases of error, timeout, and
> invalid_header, ngx_http_upstream_next is called directly. Since
> they are always considered unsuccessful attempts, there is no
> need to check the ft_type against u->conf->next_upstream. For other
> cases, ngx_http_upstream_test_next has already checked the status
> against  u->conf->next_upstream to determine whether to call
> ngx_http_upstream_next, so there is also no need to check again.

Ok, so you've just misread the docs, thanks for clarifying this.

Note that in most cases it is better to check the code as the 
source of truth, and not docs, since docs might fail to document 
things correctly.  In contrast, the code defines how things work, 
and often contains many additional informations, such as comments 
explaining why the code works this way and commit logs explaining 
details about various changes made in the past.

> I have another question though. When peer.free is called from
> ngx_http_upstream_next, the state argument is correctly passed.
> But for the last attempt that doesn't finally enter
> ngx_http_upstream_next, peer.free is called from
> ngx_http_upstream_finalize_request with the state being 0, in this case
> the last attempt is always considered successful. But in fact it may
> be unsuccessful. So I think the status should also be checked there
> in order to pass the correct state. I have created a PR for this. Could
> you have a look? Thanks.
> https://github.com/nginx/nginx/pull/959

That's a known edge case with existing peer.free() handling: for 
http_500, http_502, http_503, http_504, and http_429 cases, where 
proxy_next_upstream controls if the particular attempt is considered 
to be successful or not, peer state reported to peer.free() 
actually matches proxy_next_upstream status: if we've switched to 
the next upstream, peer is reported as failed, and if we've 
instead used the response, peer is reported as not failed.

>From formal point of view this can be seen as correct behaviour, 
as the reported peer state matches what proxy_next_upstream 
defines - with all the limitation it implies, such as not retrying 
non-idempotent requests by default, limited number of tries and 
total time.

On the other hand, it is more or less obvious that current 
behaviour might be suboptimal in some cases, and it might be 
desired to report peer as failed even if wasn't able to switch to 
the next upstream for some reason.

Further, in some cases it might be desired to report peer as failed 
even if we haven't even tried to switch to the next upstream, but, 
for example, have proxy_cache_use_stale configured for a 
particular error code.

At the same time, another limitation of proxy_next_upstream is 
that it cannot handle errors and timeouts which happen after the 
response header was sent.  But we still can report peer as failed 
due to this.  This needs to be done with care though, as this 
might be a major change for some setups (for example, if responses 
are expected to be incomplete and/or to time out at some point).

Summing the above, I think that current behaviour might be 
improved, but it needs to be carefully thought what exactly it 
needs to do, and how.

As for the pull request F5 NGINX you've referenced, I don't think 
that suggested change addresses the last attempt issue you've 
mentioned, as after the error response (got for the last attempt) 
will be sent to the client, the code will call 
ngx_http_upstream_finalize_request() with NGX_OK.  Note though 
that I'm not F5 and cannot do anything with the pull request in 
question.

[...]

-- 
Maxim Dounin
http://mdounin.ru/