[PATCH 1 of 3] Correctly calculate and set Age header
Maxim Dounin
mdounin at mdounin.ru
Thu Jun 27 00:22:35 UTC 2024
Hello!
On Tue, Jun 25, 2024 at 10:47:55PM +0900, Hiroaki Nakamura wrote:
> 2024年6月24日(月) 9:02 Maxim Dounin <mdounin at mdounin.ru>:
> >
> > Hello!
> >
> > On Thu, Jun 20, 2024 at 08:39:43PM +0900, Hiroaki Nakamura wrote:
>
> > Note that it might be a good idea to update the commit log to
> > follow style rules, such as:
> >
> > : Cache: added calculation of the Age header.
> > :
> > : Implement the calculation of the Age header as specified in
> > : RFC 9111 "HTTP Caching", https://www.rfc-editor.org/rfc/rfc9111.html
>
> I have updated the commit message.
The commit message certainly needs more work. Still, you may want
to postpone it, as well as code modifications, before various
conceptual questions are cleaned up.
> > > @@ -75,6 +77,8 @@
> > > time_t error_sec;
> > > time_t last_modified;
> > > time_t date;
> > > + time_t response_time;
> >
> > Note that the "response_time" field being added seems to be
> > exactly equivalent to the exiting "date" field.
>
> Yes, the "response_time" is equivalent to the "date" field when
> receiving a response from the upstream.
> However the 'date" is updated when sending a cached response.
> We need to keep the cache creation time.
> I have renamed "response_time" to "creation_time" to make it clearer.
The "date" field is set when the cache item is created, and
can be updated on cache item revalidation (with
"proxy_cache_revalidate on"). It is not updated when sending a
cached response.
[...]
> > > + ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> > > + "http file cache send, resp:%O, resident:%d, age:%d",
> > > + c->response_time, resident_time, current_age);
>
> > Please also note that using "%d" for time_t is wrong, as time_t
> > size might be different from the size of int. For time_t, "%T"
> > should be used instead.
>
> I have changed to use "%T".
Note that "too many debug logging" comment still applies.
[...]
> > > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.c
> > > --- a/src/http/ngx_http_upstream.c Thu Jun 20 20:26:24 2024 +0900
> > > +++ b/src/http/ngx_http_upstream.c Thu Jun 20 20:26:41 2024 +0900
> > > @@ -1549,6 +1559,7 @@
> > > ngx_memzero(u->state, sizeof(ngx_http_upstream_state_t));
> > >
> > > u->start_time = ngx_current_msec;
> > > + u->request_time = ngx_time();
> >
> > Adding a duplicate start time shouldn't be needed.
>
> It is not a duplicate since u->request_time is real clock time as opposed to
> u->start_time is monotonic clock.
>
> We need a real clock time of response_time since we calculate
> the difference from the upstream date value.
>
> Also we use the calculation response_time - request_time
> when the upstream date is not present or not well synchronized (see below),
> so we also need request_time to be a real clock time.
Actually, monotonic clock is much better suited for time
difference calculation, and that's exactly the reason why it is
used for u->start_time. And, since at any particular moment you
know u->start_time, current monotonic time and current wall clock
time, it is trivial to calculate wall clock start time if needed
(but I doubt it's actually needed, see below).
[...]
> > > @@ -2648,7 +2663,12 @@
> > > valid = ngx_http_file_cache_valid(u->conf->cache_valid,
> > > u->headers_in.status_n);
> > > if (valid) {
> > > - valid = now + valid;
> > > + ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> > > + "adjust cache valid_sec:%O, "
> > > + "valid:%O, init_age:%d for 304",
> > > + now + valid - r->cache->corrected_initial_age,
> > > + valid, r->cache->corrected_initial_age);
> > > + valid = now + valid - r->cache->corrected_initial_age;
> >
> > Not sure if Age calculations should apply to caching time as
> > calculated with proxy_cache_valid directives, but if it does, it
> > should apply consistently, in all cases where proxy_cache_valid
> > times are used. At least ngx_http_upstream_intercept_errors()
> > case seems to be ignored in the patch.
>
> I have updated ngx_http_upstream_intercept_errors as well.
>
> > Also, if Age applies, there should be a way to ignore it,
> > similarly to how Cache-Control can be ignored with the
> > proxy_ignore_headers directive.
>
> I do not know how to do this.
> Maybe add
> { ngx_string("Age"), NGX_HTTP_UPSTREAM_IGN_AGE },
> to ngx_http_upstream_ignore_headers_masks?
If at all, yes, this probably should be implemented like this.
Still, for now I'm no convinced that Age needs to be taken into
account for proxy_cache_valid.
For example, consider a setup with two caching servers: an edge
cache, a large central cache, and an origin server. Assuming a
response without any Cache-Control headers, something like
proxy_cache_valid 200 1h;
on the edge cache, and
proxy_cache_valid 200 1d;
on the large cache.
With the suggested patch there will be Age header generated for
responses returned by the large cache, effectively preventing
caching on the edge cache.
While such (obviously suboptimal and not desired) behaviour can be
prevented by using "proxy_ignore_headers Age;" (assuming it is
implemented), this would require extra otherwise unneeded
configuration. Further, this will prevent proper use of the Age
header for Cache-Control-based caching for other requests.
A better approach might be to limit the Age header to
Cache-Control-based caching only. Not sure though, and it
certainly needs more analysis.
> > > @@ -2672,6 +2692,59 @@
> > > }
> > >
> > >
> > > +static void
> > > +ngx_http_upstream_update_age(ngx_http_request_t *r, ngx_http_upstream_t *u,
> > > + time_t now)
> > > +{
> > > + time_t response_time, date, apparent_age, response_delay, age_value,
> > > + corrected_age_value, corrected_initial_age;
> > > +
> > > + /*
> > > + * Update age response header.
> > > + * https://www.rfc-editor.org/rfc/rfc9111.html#name-calculating-age
> > > + */
> > > + response_time = now;
> > > + if (u->headers_in.date != NULL) {
> > > + date = ngx_parse_http_time(u->headers_in.date->value.data,
> > > + u->headers_in.date->value.len);
> > > + if (date == NGX_ERROR) {
> > > + date = now;
> > > + }
> > > + } else {
> > > + date = now;
> > > + }
> > > + apparent_age = ngx_max(0, response_time - date);
> > > +
> > > + response_delay = response_time - u->request_time;
> > > + age_value = u->headers_in.age_n != -1 ? u->headers_in.age_n : 0;
> > > + corrected_age_value = age_value + response_delay;
> > > +
> > > + corrected_initial_age = ngx_max(apparent_age, corrected_age_value);
> > > + r->headers_out.age_n = corrected_initial_age;
> >
> > Note that this approach, as described in RFC 9111 as a possible
> > way to calculate the Age header, implies that time on both proxy
> > server and the origin server must be "reasonably well
> > synchronized", which is not really the case in many practical
> > situations.
> >
> > If the time on the origin server is mostly arbitrary, for example,
> > in the past, apparent_age might be very large, leading to a very
> > large corrected_initial_age as well, preventing caching.
> >
> > As such, I would rather consider a more robust algorithm instead.
>
> Yes, if the upstream date is too old, then apparant_age becomes too big.
>
> How about using the following calculation?
>
> if upstream date is between request_time and response_time, we assume
> the upstream's clock is reasonably well synchronized and use
>
> corrected_initial_age = age_value + (response_time - date);
>
> if upstream date is not present or not between request_time
> and response_time, we do not rely on the upstream date and use
>
> corrected_initial_age = age_value + (response_time - u->request_time);
>
> When sending a cached response, we calculate the Age
> as specified in RFC 9111:
>
> resident_time = now - response_time;
> current_age = corrected_initial_age + resident_time;
>
> where response_time is cache_creation_time.
>
> I have updated my patchset to use the above calculation.
This looks slightly more reasonable. At least, this won't result
in obviously bogus age values if clocks are not synchronized.
Still, a simpler and mostly equivalent solution might be to avoid
the whole "corrected_initial_age" concept, and use just Age as is,
combined with the time passed since the cache item was
created/revalidated. The time in question is when the response
headers were received, and usually a good approximation of the
time when the response was actually generated.
> > > +#if (NGX_HTTP_CACHE)
> > > + if (r->cache) {
> > > + r->cache->response_time = response_time;
> > > + r->cache->corrected_initial_age = corrected_initial_age;
> > > + if (u->headers_in.adjusting_valid_sec) {
> > > + r->cache->valid_sec -= corrected_initial_age;
> >
> > The "adjusting_valid_sec" logic looks unclear (probably at least
> > needs a better name) and fragile. If I'm reading the code
> > correctly, it will do the wrong thing at least if the upstream
> > server returns something like:
> >
> > Cache-Control: max-age=60
> > X-Accel-Expires: @<time>
> >
> > since u->headers_in.adjusting_valid_sec won't be cleared.
>
> I found that we must ignore Expires when Cache-Control with max-age or s-maxage
> is set according to https://www.rfc-editor.org/rfc/rfc9111#name-expires:
>
> "If a response includes a Cache-Control header field with the max-age directive
> (Section 5.2.2.1), a recipient MUST ignore the Expires header field. Likewise,
> if a response includes the s-maxage directive (Section 5.2.2.10), a shared cache
> recipient MUST ignore the Expires header field."
The Expires header and X-Accel-Expires header are different
headers. The X-Accel-Expires header is a non-standard header
which takes precedence over both Expires and Cache-Control.
[...]
> > > + ngx_log_debug2(NGX_LOG_DEBUG_HTTP, u->peer.connection->log, 0,
> > > + "http upstream adjusted cache "
> > > + "valid_sec:%O, init_age:%O",
> > > + r->cache->valid_sec, corrected_initial_age);
> > > + }
> > > + }
> > > +#endif
> >
> > If I'm reading the code correctly, this will add the Age
> > header to responses which are not cached. This somewhat
> > contradicts to what RFC 9111 says in the description of the Age
> > header (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1-6):
> >
> > : The presence of an Age header field implies that the response was
> > : not generated or validated by the origin server for this request.
> >
> > Please also note that in many cases [free]nginx is expected to be
> > seen as the origin server, even if it caches some upstream server
> > responses according to the configuration (in many cases, ignoring
> > Cache-Control or Expires headers). In this case using the
> > Age header even on cached responses might not be desired. It
> > would be great to dig further into possible use cases and make
> > sure all are properly covered.
>
> I have changed to not add Age if the calculated value is zero and also
> updated tests.
Note that this somewhat contradicts special handling of "Age: 0"
in HTTP/3 part of your patch.
Also, if I'm reading the patch correctly, as currently implemented
the code might result in non-zero Age even for directly returned
non-cached responses.
Also, "Age: 0" might legitimately appear for cached responses
during the first second after the response was cached. Not sure
if it needs to be returned, but given the about RFC quote it might
be usable for some.
Note well that the "would be great to dig further" comment still
applies.
[...]
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel
mailing list