[PATCH 1 of 3] Correctly calculate and set Age header

Maxim Dounin mdounin at mdounin.ru
Thu Jun 27 00:22:35 UTC 2024


Hello!

On Tue, Jun 25, 2024 at 10:47:55PM +0900, Hiroaki Nakamura wrote:

> 2024年6月24日(月) 9:02 Maxim Dounin <mdounin at mdounin.ru>:
> >
> > Hello!
> >
> > On Thu, Jun 20, 2024 at 08:39:43PM +0900, Hiroaki Nakamura wrote:
> 
> > Note that it might be a good idea to update the commit log to
> > follow style rules, such as:
> >
> > : Cache: added calculation of the Age header.
> > :
> > : Implement the calculation of the Age header as specified in
> > : RFC 9111 "HTTP Caching", https://www.rfc-editor.org/rfc/rfc9111.html
> 
> I have updated the commit message.

The commit message certainly needs more work.  Still, you may want 
to postpone it, as well as code modifications, before various 
conceptual questions are cleaned up.

> > > @@ -75,6 +77,8 @@
> > >      time_t                           error_sec;
> > >      time_t                           last_modified;
> > >      time_t                           date;
> > > +    time_t                           response_time;
> >
> > Note that the "response_time" field being added seems to be
> > exactly equivalent to the exiting "date" field.
> 
> Yes, the "response_time" is equivalent to the "date" field when
> receiving a response from the upstream.
> However the 'date" is updated when sending a cached response.
> We need to keep the cache creation time.
> I have renamed "response_time" to "creation_time" to make it clearer.

The "date" field is set when the cache item is created, and 
can be updated on cache item revalidation (with 
"proxy_cache_revalidate on").  It is not updated when sending a 
cached response.

[...]

> > > +    ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> > > +                   "http file cache send, resp:%O, resident:%d, age:%d",
> > > +                   c->response_time, resident_time, current_age);
> 
> > Please also note that using "%d" for time_t is wrong, as time_t
> > size might be different from the size of int.  For time_t, "%T"
> > should be used instead.
> 
> I have changed to use "%T".

Note that "too many debug logging" comment still applies.

[...]

> > > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.c
> > > --- a/src/http/ngx_http_upstream.c      Thu Jun 20 20:26:24 2024 +0900
> > > +++ b/src/http/ngx_http_upstream.c      Thu Jun 20 20:26:41 2024 +0900
> > > @@ -1549,6 +1559,7 @@
> > >      ngx_memzero(u->state, sizeof(ngx_http_upstream_state_t));
> > >
> > >      u->start_time = ngx_current_msec;
> > > +    u->request_time = ngx_time();
> >
> > Adding a duplicate start time shouldn't be needed.
> 
> It is not a duplicate since u->request_time is real clock time as opposed to
> u->start_time is monotonic clock.
> 
> We need a real clock time of response_time since we calculate
> the difference from the upstream date value.
> 
> Also we use the calculation response_time - request_time
> when the upstream date is not present or not well synchronized (see below),
> so we also need request_time to be a real clock time.

Actually, monotonic clock is much better suited for time 
difference calculation, and that's exactly the reason why it is 
used for u->start_time.  And, since at any particular moment you 
know u->start_time, current monotonic time and current wall clock 
time, it is trivial to calculate wall clock start time if needed 
(but I doubt it's actually needed, see below).

[...]

> > > @@ -2648,7 +2663,12 @@
> > >              valid = ngx_http_file_cache_valid(u->conf->cache_valid,
> > >                                                u->headers_in.status_n);
> > >              if (valid) {
> > > -                valid = now + valid;
> > > +                ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> > > +                               "adjust cache valid_sec:%O, "
> > > +                               "valid:%O, init_age:%d for 304",
> > > +                               now + valid - r->cache->corrected_initial_age,
> > > +                               valid, r->cache->corrected_initial_age);
> > > +                valid = now + valid - r->cache->corrected_initial_age;
> >
> > Not sure if Age calculations should apply to caching time as
> > calculated with proxy_cache_valid directives, but if it does, it
> > should apply consistently, in all cases where proxy_cache_valid
> > times are used.  At least ngx_http_upstream_intercept_errors()
> > case seems to be ignored in the patch.
> 
> I have updated ngx_http_upstream_intercept_errors as well.
> 
> > Also, if Age applies, there should be a way to ignore it,
> > similarly to how Cache-Control can be ignored with the
> > proxy_ignore_headers directive.
> 
> I do not know how to do this.
> Maybe add
>     { ngx_string("Age"), NGX_HTTP_UPSTREAM_IGN_AGE },
> to ngx_http_upstream_ignore_headers_masks?

If at all, yes, this probably should be implemented like this.

Still, for now I'm no convinced that Age needs to be taken into 
account for proxy_cache_valid.

For example, consider a setup with two caching servers: an edge 
cache, a large central cache, and an origin server.  Assuming a 
response without any Cache-Control headers, something like

    proxy_cache_valid 200 1h;

on the edge cache, and

    proxy_cache_valid 200 1d;

on the large cache.

With the suggested patch there will be Age header generated for 
responses returned by the large cache, effectively preventing 
caching on the edge cache.

While such (obviously suboptimal and not desired) behaviour can be 
prevented by using "proxy_ignore_headers Age;" (assuming it is 
implemented), this would require extra otherwise unneeded 
configuration.  Further, this will prevent proper use of the Age 
header for Cache-Control-based caching for other requests.

A better approach might be to limit the Age header to 
Cache-Control-based caching only.  Not sure though, and it 
certainly needs more analysis.

> > > @@ -2672,6 +2692,59 @@
> > >  }
> > >
> > >
> > > +static void
> > > +ngx_http_upstream_update_age(ngx_http_request_t *r, ngx_http_upstream_t *u,
> > > +    time_t now)
> > > +{
> > > +    time_t  response_time, date, apparent_age, response_delay, age_value,
> > > +            corrected_age_value, corrected_initial_age;
> > > +
> > > +    /*
> > > +     * Update age response header.
> > > +     * https://www.rfc-editor.org/rfc/rfc9111.html#name-calculating-age
> > > +     */
> > > +    response_time = now;
> > > +    if (u->headers_in.date != NULL) {
> > > +        date = ngx_parse_http_time(u->headers_in.date->value.data,
> > > +                                   u->headers_in.date->value.len);
> > > +        if (date == NGX_ERROR) {
> > > +            date = now;
> > > +        }
> > > +    } else {
> > > +        date = now;
> > > +    }
> > > +    apparent_age = ngx_max(0, response_time - date);
> > > +
> > > +    response_delay = response_time - u->request_time;
> > > +    age_value = u->headers_in.age_n != -1 ? u->headers_in.age_n : 0;
> > > +    corrected_age_value = age_value + response_delay;
> > > +
> > > +    corrected_initial_age = ngx_max(apparent_age, corrected_age_value);
> > > +    r->headers_out.age_n = corrected_initial_age;
> >
> > Note that this approach, as described in RFC 9111 as a possible
> > way to calculate the Age header, implies that time on both proxy
> > server and the origin server must be "reasonably well
> > synchronized", which is not really the case in many practical
> > situations.
> >
> > If the time on the origin server is mostly arbitrary, for example,
> > in the past, apparent_age might be very large, leading to a very
> > large corrected_initial_age as well, preventing caching.
> >
> > As such, I would rather consider a more robust algorithm instead.
> 
> Yes, if the upstream date is too old, then apparant_age becomes too big.
> 
> How about using the following calculation?
> 
> if upstream date is between request_time and response_time, we assume
> the upstream's clock is reasonably well synchronized and use
> 
> corrected_initial_age = age_value + (response_time - date);
> 
> if upstream date is not present or not between request_time
> and response_time, we do not rely on the upstream date and use
> 
> corrected_initial_age = age_value + (response_time - u->request_time);
> 
> When sending a cached response, we calculate the Age
> as specified in RFC 9111:
> 
> resident_time = now - response_time;
> current_age = corrected_initial_age + resident_time;
> 
> where response_time is cache_creation_time.
> 
> I have updated my patchset to use the above calculation.

This looks slightly more reasonable.  At least, this won't result 
in obviously bogus age values if clocks are not synchronized.

Still, a simpler and mostly equivalent solution might be to avoid 
the whole "corrected_initial_age" concept, and use just Age as is, 
combined with the time passed since the cache item was 
created/revalidated.  The time in question is when the response 
headers were received, and usually a good approximation of the 
time when the response was actually generated.

> > > +#if (NGX_HTTP_CACHE)
> > > +    if (r->cache) {
> > > +        r->cache->response_time = response_time;
> > > +        r->cache->corrected_initial_age = corrected_initial_age;
> > > +        if (u->headers_in.adjusting_valid_sec) {
> > > +            r->cache->valid_sec -= corrected_initial_age;
> >
> > The "adjusting_valid_sec" logic looks unclear (probably at least
> > needs a better name) and fragile.  If I'm reading the code
> > correctly, it will do the wrong thing at least if the upstream
> > server returns something like:
> >
> > Cache-Control: max-age=60
> > X-Accel-Expires: @<time>
> >
> > since u->headers_in.adjusting_valid_sec won't be cleared.
> 
> I found that we must ignore Expires when Cache-Control with max-age or s-maxage
> is set according to https://www.rfc-editor.org/rfc/rfc9111#name-expires:
> 
> "If a response includes a Cache-Control header field with the max-age directive
> (Section 5.2.2.1), a recipient MUST ignore the Expires header field. Likewise,
> if a response includes the s-maxage directive (Section 5.2.2.10), a shared cache
> recipient MUST ignore the Expires header field."

The Expires header and X-Accel-Expires header are different 
headers.  The X-Accel-Expires header is a non-standard header 
which takes precedence over both Expires and Cache-Control.

[...]

> > > +            ngx_log_debug2(NGX_LOG_DEBUG_HTTP, u->peer.connection->log, 0,
> > > +                           "http upstream adjusted cache "
> > > +                           "valid_sec:%O, init_age:%O",
> > > +                           r->cache->valid_sec, corrected_initial_age);
> > > +        }
> > > +    }
> > > +#endif
> >
> > If I'm reading the code correctly, this will add the Age
> > header to responses which are not cached.  This somewhat
> > contradicts to what RFC 9111 says in the description of the Age
> > header (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1-6):
> >
> > : The presence of an Age header field implies that the response was
> > : not generated or validated by the origin server for this request.
> >
> > Please also note that in many cases [free]nginx is expected to be
> > seen as the origin server, even if it caches some upstream server
> > responses according to the configuration (in many cases, ignoring
> > Cache-Control or Expires headers).  In this case using the
> > Age header even on cached responses might not be desired.  It
> > would be great to dig further into possible use cases and make
> > sure all are properly covered.
> 
> I have changed to not add Age if the calculated value is zero and also
> updated tests.

Note that this somewhat contradicts special handling of "Age: 0" 
in HTTP/3 part of your patch.

Also, if I'm reading the patch correctly, as currently implemented 
the code might result in non-zero Age even for directly returned 
non-cached responses.

Also, "Age: 0" might legitimately appear for cached responses 
during the first second after the response was cached.  Not sure 
if it needs to be returned, but given the about RFC quote it might 
be usable for some.

Note well that the "would be great to dig further" comment still 
applies.

[...]

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/



More information about the nginx-devel mailing list