[PATCH] Upstream: max_fails doesn't respect the next_upstream config
Maxim Dounin
mdounin at mdounin.ru
Sat Nov 1 06:49:24 UTC 2025
Hello!
On Fri, Oct 31, 2025 at 02:31:38PM +0800, solomon wrote:
> Thank you for your reply. It makes sense to me. I didn't intend to
> change the behavior. I thought the implementation doesn't
> match the docs and didn't notice the docs have clarified this
> behavior.
>
> Now I understand it better. For the cases of error, timeout, and
> invalid_header, ngx_http_upstream_next is called directly. Since
> they are always considered unsuccessful attempts, there is no
> need to check the ft_type against u->conf->next_upstream. For other
> cases, ngx_http_upstream_test_next has already checked the status
> against u->conf->next_upstream to determine whether to call
> ngx_http_upstream_next, so there is also no need to check again.
Ok, so you've just misread the docs, thanks for clarifying this.
Note that in most cases it is better to check the code as the
source of truth, and not docs, since docs might fail to document
things correctly. In contrast, the code defines how things work,
and often contains many additional informations, such as comments
explaining why the code works this way and commit logs explaining
details about various changes made in the past.
> I have another question though. When peer.free is called from
> ngx_http_upstream_next, the state argument is correctly passed.
> But for the last attempt that doesn't finally enter
> ngx_http_upstream_next, peer.free is called from
> ngx_http_upstream_finalize_request with the state being 0, in this case
> the last attempt is always considered successful. But in fact it may
> be unsuccessful. So I think the status should also be checked there
> in order to pass the correct state. I have created a PR for this. Could
> you have a look? Thanks.
> https://github.com/nginx/nginx/pull/959
That's a known edge case with existing peer.free() handling: for
http_500, http_502, http_503, http_504, and http_429 cases, where
proxy_next_upstream controls if the particular attempt is considered
to be successful or not, peer state reported to peer.free()
actually matches proxy_next_upstream status: if we've switched to
the next upstream, peer is reported as failed, and if we've
instead used the response, peer is reported as not failed.
>From formal point of view this can be seen as correct behaviour,
as the reported peer state matches what proxy_next_upstream
defines - with all the limitation it implies, such as not retrying
non-idempotent requests by default, limited number of tries and
total time.
On the other hand, it is more or less obvious that current
behaviour might be suboptimal in some cases, and it might be
desired to report peer as failed even if wasn't able to switch to
the next upstream for some reason.
Further, in some cases it might be desired to report peer as failed
even if we haven't even tried to switch to the next upstream, but,
for example, have proxy_cache_use_stale configured for a
particular error code.
At the same time, another limitation of proxy_next_upstream is
that it cannot handle errors and timeouts which happen after the
response header was sent. But we still can report peer as failed
due to this. This needs to be done with care though, as this
might be a major change for some setups (for example, if responses
are expected to be incomplete and/or to time out at some point).
Summing the above, I think that current behaviour might be
improved, but it needs to be carefully thought what exactly it
needs to do, and how.
As for the pull request F5 NGINX you've referenced, I don't think
that suggested change addresses the last attempt issue you've
mentioned, as after the error response (got for the last attempt)
will be sent to the client, the code will call
ngx_http_upstream_finalize_request() with NGX_OK. Note though
that I'm not F5 and cannot do anything with the pull request in
question.
[...]
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel
mailing list