[PATCH 1 of 3] Correctly calculate and set Age header

Hiroaki Nakamura hnakamur at gmail.com
Tue Jun 25 13:47:55 UTC 2024


Hello,

Thank you for your detailed review!

2024年6月24日(月) 9:02 Maxim Dounin <mdounin at mdounin.ru>:
>
> Hello!
>
> On Thu, Jun 20, 2024 at 08:39:43PM +0900, Hiroaki Nakamura wrote:

> Note that it might be a good idea to update the commit log to
> follow style rules, such as:
>
> : Cache: added calculation of the Age header.
> :
> : Implement the calculation of the Age header as specified in
> : RFC 9111 "HTTP Caching", https://www.rfc-editor.org/rfc/rfc9111.html

I have updated the commit message.

> > @@ -75,6 +77,8 @@
> >      time_t                           error_sec;
> >      time_t                           last_modified;
> >      time_t                           date;
> > +    time_t                           response_time;
>
> Note that the "response_time" field being added seems to be
> exactly equivalent to the exiting "date" field.

Yes, the "response_time" is equivalent to the "date" field when
receiving a response from the upstream.
However the 'date" is updated when sending a cached response.
We need to keep the cache creation time.
I have renamed "response_time" to "creation_time" to make it clearer.

> [diff]
> showfunc=1

I have added this config to my ~/.hgrc.

> Note that this uses data from the cache node, which is not
> available if the cache node was loaded from disk, for example, on
> server restart, and this will certainly lead to incorrect results.
> If I'm reading the code correctly, it will result in seconds since
> the Epoch in the Age header on all responses after a restart.
>
> This suggests that either things should be implemented quite
> differently, or this patch needs to be merged with the second one,
> which implements loading relevant information from the cache
> header.

I have merged this patch with the second one.

> > +    ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> > +                   "http file cache send, resp:%O, resident:%d, age:%d",
> > +                   c->response_time, resident_time, current_age);

> Please also note that using "%d" for time_t is wrong, as time_t
> size might be different from the size of int.  For time_t, "%T"
> should be used instead.

I have changed to use "%T".

> > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_request.h
> > --- a/src/http/ngx_http_request.h       Thu Jun 20 20:26:24 2024 +0900
> > +++ b/src/http/ngx_http_request.h       Thu Jun 20 20:26:41 2024 +0900
> > @@ -290,6 +290,7 @@
> >      off_t                             content_offset;
> >      time_t                            date_time;
> >      time_t                            last_modified_time;
> > +    off_t                             age_n;
>
> Using off_t here looks wrong, as off_t is a type for file offsets,
> and not for dates.

I have changed the type of age_n to time_t.

> > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.c
> > --- a/src/http/ngx_http_upstream.c      Thu Jun 20 20:26:24 2024 +0900
> > +++ b/src/http/ngx_http_upstream.c      Thu Jun 20 20:26:41 2024 +0900
> > @@ -1549,6 +1559,7 @@
> >      ngx_memzero(u->state, sizeof(ngx_http_upstream_state_t));
> >
> >      u->start_time = ngx_current_msec;
> > +    u->request_time = ngx_time();
>
> Adding a duplicate start time shouldn't be needed.

It is not a duplicate since u->request_time is real clock time as opposed to
u->start_time is monotonic clock.

We need a real clock time of response_time since we calculate
the difference from the upstream date value.

Also we use the calculation response_time - request_time
when the upstream date is not present or not well synchronized (see below),
so we also need request_time to be a real clock time.

> > @@ -2529,6 +2541,8 @@
> >          return;
> >      }
> >
> > +    ngx_http_upstream_update_age(r, u, ngx_time());
>
> Note that the "ngx_time()" as an argument looks unneeded, it
> should be better to obtain the time in the function itself.

I have changed not to pass ngx_time() as an argument.

> Also, from semantic point of view this probably should be in
> ngx_http_upstream_process_headers(), and not a dedicated function
> call.

I have moved the code for
         * resident_time = now - response_time;
         * current_age = corrected_initial_age + resident_time;
from ngx_http_cache_send to ngx_http_upstream_process_headers.

As for ngx_http_upstream_update_age, I tried to move it into
ngx_http_upstream_process_headers(), but I could not make tests pass.
So I keep ngx_http_upstream_update_age to be called from
ngx_http_upstream_process_header and ngx_http_upstream_test_next.

> > @@ -2648,7 +2663,12 @@
> >              valid = ngx_http_file_cache_valid(u->conf->cache_valid,
> >                                                u->headers_in.status_n);
> >              if (valid) {
> > -                valid = now + valid;
> > +                ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> > +                               "adjust cache valid_sec:%O, "
> > +                               "valid:%O, init_age:%d for 304",
> > +                               now + valid - r->cache->corrected_initial_age,
> > +                               valid, r->cache->corrected_initial_age);
> > +                valid = now + valid - r->cache->corrected_initial_age;
>
> Not sure if Age calculations should apply to caching time as
> calculated with proxy_cache_valid directives, but if it does, it
> should apply consistently, in all cases where proxy_cache_valid
> times are used.  At least ngx_http_upstream_intercept_errors()
> case seems to be ignored in the patch.

I have updated ngx_http_upstream_intercept_errors as well.

> Also, if Age applies, there should be a way to ignore it,
> similarly to how Cache-Control can be ignored with the
> proxy_ignore_headers directive.

I do not know how to do this.
Maybe add
    { ngx_string("Age"), NGX_HTTP_UPSTREAM_IGN_AGE },
to ngx_http_upstream_ignore_headers_masks?

> > @@ -2672,6 +2692,59 @@
> >  }
> >
> >
> > +static void
> > +ngx_http_upstream_update_age(ngx_http_request_t *r, ngx_http_upstream_t *u,
> > +    time_t now)
> > +{
> > +    time_t  response_time, date, apparent_age, response_delay, age_value,
> > +            corrected_age_value, corrected_initial_age;
> > +
> > +    /*
> > +     * Update age response header.
> > +     * https://www.rfc-editor.org/rfc/rfc9111.html#name-calculating-age
> > +     */
> > +    response_time = now;
> > +    if (u->headers_in.date != NULL) {
> > +        date = ngx_parse_http_time(u->headers_in.date->value.data,
> > +                                   u->headers_in.date->value.len);
> > +        if (date == NGX_ERROR) {
> > +            date = now;
> > +        }
> > +    } else {
> > +        date = now;
> > +    }
> > +    apparent_age = ngx_max(0, response_time - date);
> > +
> > +    response_delay = response_time - u->request_time;
> > +    age_value = u->headers_in.age_n != -1 ? u->headers_in.age_n : 0;
> > +    corrected_age_value = age_value + response_delay;
> > +
> > +    corrected_initial_age = ngx_max(apparent_age, corrected_age_value);
> > +    r->headers_out.age_n = corrected_initial_age;
>
> Note that this approach, as described in RFC 9111 as a possible
> way to calculate the Age header, implies that time on both proxy
> server and the origin server must be "reasonably well
> synchronized", which is not really the case in many practical
> situations.
>
> If the time on the origin server is mostly arbitrary, for example,
> in the past, apparent_age might be very large, leading to a very
> large corrected_initial_age as well, preventing caching.
>
> As such, I would rather consider a more robust algorithm instead.

Yes, if the upstream date is too old, then apparant_age becomes too big.

How about using the following calculation?

if upstream date is between request_time and response_time, we assume
the upstream's clock is reasonably well synchronized and use

corrected_initial_age = age_value + (response_time - date);

if upstream date is not present or not between request_time
and response_time, we do not rely on the upstream date and use

corrected_initial_age = age_value + (response_time - u->request_time);

When sending a cached response, we calculate the Age
as specified in RFC 9111:

resident_time = now - response_time;
current_age = corrected_initial_age + resident_time;

where response_time is cache_creation_time.

I have updated my patchset to use the above calculation.

> > +#if (NGX_HTTP_CACHE)
> > +    if (r->cache) {
> > +        r->cache->response_time = response_time;
> > +        r->cache->corrected_initial_age = corrected_initial_age;
> > +        if (u->headers_in.adjusting_valid_sec) {
> > +            r->cache->valid_sec -= corrected_initial_age;
>
> The "adjusting_valid_sec" logic looks unclear (probably at least
> needs a better name) and fragile.  If I'm reading the code
> correctly, it will do the wrong thing at least if the upstream
> server returns something like:
>
> Cache-Control: max-age=60
> X-Accel-Expires: @<time>
>
> since u->headers_in.adjusting_valid_sec won't be cleared.

I found that we must ignore Expires when Cache-Control with max-age or s-maxage
is set according to https://www.rfc-editor.org/rfc/rfc9111#name-expires:

"If a response includes a Cache-Control header field with the max-age directive
(Section 5.2.2.1), a recipient MUST ignore the Expires header field. Likewise,
if a response includes the s-maxage directive (Section 5.2.2.10), a shared cache
recipient MUST ignore the Expires header field."

I have renamed adjusting_valid_sec to relative_freshness and added guard for
Expires and X-Accel-Expires: @<time> like below:

    /*
     * https://www.rfc-editor.org/rfc/rfc9111#name-expires
     *
     * If a response includes a Cache-Control header field with the max-age
     * directive (Section 5.2.2.1), a recipient MUST ignore the Expires
     * header field. Likewise, if a response includes the s-maxage directive
     * (Section 5.2.2.10), a shared cache recipient MUST ignore the Expires
     * header field.
     */
    if (!u->headers_in.relative_freshness) {
        r->cache->valid_sec = expires;
    }

> > +            ngx_log_debug2(NGX_LOG_DEBUG_HTTP, u->peer.connection->log, 0,
> > +                           "http upstream adjusted cache "
> > +                           "valid_sec:%O, init_age:%O",
> > +                           r->cache->valid_sec, corrected_initial_age);
> > +        }
> > +    }
> > +#endif
>
> If I'm reading the code correctly, this will add the Age
> header to responses which are not cached.  This somewhat
> contradicts to what RFC 9111 says in the description of the Age
> header (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1-6):
>
> : The presence of an Age header field implies that the response was
> : not generated or validated by the origin server for this request.
>
> Please also note that in many cases [free]nginx is expected to be
> seen as the origin server, even if it caches some upstream server
> responses according to the configuration (in many cases, ignoring
> Cache-Control or Expires headers).  In this case using the
> Age header even on cached responses might not be desired.  It
> would be great to dig further into possible use cases and make
> sure all are properly covered.

I have changed to not add Age if the calculated value is zero and also
updated tests.

> > @@ -5319,6 +5397,39 @@
> >
> >
> >  static ngx_int_t
> > +ngx_http_upstream_process_age(ngx_http_request_t *r,
> > +    ngx_table_elt_t *h, ngx_uint_t offset)
> > +{
> > +    ngx_http_upstream_t  *u;
> > +
> > +    u = r->upstream;
> > +
> > +    if (u->headers_in.age) {
> > +        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
> > +                      "upstream sent duplicate header line: "%V: %V", "
> > +                      "previous value: "%V: %V"",
> > +                      &h->key, &h->value,
> > +                      &u->headers_in.age->key,
> > +                      &u->headers_in.age->value);
> > +        return NGX_HTTP_UPSTREAM_INVALID_HEADER;
> > +    }
> > +
> > +    h->next = NULL;
> > +    u->headers_in.age = h;
> > +    u->headers_in.age_n = ngx_atoof(h->value.data, h->value.len);
> > +
> > +    if (u->headers_in.age_n == NGX_ERROR) {
> > +        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
> > +                      "upstream sent invalid "Age" header: "
> > +                      ""%V: %V"", &h->key, &h->value);
> > +        return NGX_HTTP_UPSTREAM_INVALID_HEADER;
> > +    }
>
> Note that returning errors here violates RFC 9111 SHOULD
> recommendations, notably
> (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1):
>
> : Although it is defined as a singleton header field, a cache
> : encountering a message with a list-based Age field value SHOULD
> : use the first member of the field value, discarding subsequent
> : ones.
>
> : If the field value (after discarding additional members, as per
> : above) is invalid (e.g., it contains something other than a
> : non-negative integer), a cache SHOULD ignore the field.
>
> Unless there are good reasons, a better solution might be to
> report warnings and follow the recommendations instead.

I have changed the invalid Age value and duplicated Age headers
to report warnings.

> > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.h
> > --- a/src/http/ngx_http_upstream.h      Thu Jun 20 20:26:24 2024 +0900
> > +++ b/src/http/ngx_http_upstream.h      Thu Jun 20 20:26:41 2024 +0900
> > @@ -287,14 +287,17 @@
> >
> >      ngx_table_elt_t                 *cache_control;
> >      ngx_table_elt_t                 *set_cookie;
> > +    ngx_table_elt_t                 *age;
> >
> >      off_t                            content_length_n;
> >      time_t                           last_modified_time;
> > +    off_t                            age_n;
>
> See above, "off_t" looks wrong.

I have changed it to time_t.

I am sending updated two patches:

* [PATCH 1 of 2] Cache: added calculation of the Age header
* [PATCH 2 of 2] Tests: Update and add tests for Age header

Could you take another look?

Thank you,
Hiroaki Nakamura



More information about the nginx-devel mailing list