[PATCH 1 of 3] Correctly calculate and set Age header

Maxim Dounin mdounin at mdounin.ru
Sun Jun 23 23:42:20 UTC 2024


Hello!

On Thu, Jun 20, 2024 at 08:39:43PM +0900, Hiroaki Nakamura wrote:

> # HG changeset patch
> # User Hiroaki Nakamura <hnakamur at gmail.com>
> # Date 1718882801 -32400
> #      Thu Jun 20 20:26:41 2024 +0900
> # Branch correct_age
> # Node ID c81df54e3d0333c26d4296792dc0df767b386f91
> # Parent  73929a4f3447d558747623884b5ba281c13332d8
> Correctly calculate and set Age header.
> Implement the calculation of the Age header as specified in
> "RFC 9111: HTTP Caching"
> https://www.rfc-editor.org/rfc/rfc9111.html

Thanks for the patches.

Note that it might be a good idea to update the commit log to 
follow style rules, such as:

: Cache: added calculation of the Age header.
: 
: Implement the calculation of the Age header as specified in
: RFC 9111 "HTTP Caching", https://www.rfc-editor.org/rfc/rfc9111.html

It also might be also a good idea to add additional implementation 
details which might worth explaining, especially given that RFC 
911 does not really specify how the Age header is to be 
calculated, but rather gives some guidelines on possible 
approaches.

See below for some more comments.

> 
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_cache.h
> --- a/src/http/ngx_http_cache.h Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_cache.h Thu Jun 20 20:26:41 2024 +0900
> @@ -59,6 +59,8 @@
>      size_t                           body_start;
>      off_t                            fs_size;
>      ngx_msec_t                       lock_time;
> +    time_t                           response_time;
> +    time_t                           corrected_initial_age;
>  } ngx_http_file_cache_node_t;
> 
> 
> @@ -75,6 +77,8 @@
>      time_t                           error_sec;
>      time_t                           last_modified;
>      time_t                           date;
> +    time_t                           response_time;

Note that the "response_time" field being added seems to be 
exactly equivalent to the exiting "date" field.

> +    time_t                           corrected_initial_age;
> 
>      ngx_str_t                        etag;
>      ngx_str_t                        vary;
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_file_cache.c
> --- a/src/http/ngx_http_file_cache.c    Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_file_cache.c    Thu Jun 20 20:26:41 2024 +0900
> @@ -971,6 +971,8 @@
>      fcn->uniq = 0;
>      fcn->body_start = 0;
>      fcn->fs_size = 0;
> +    fcn->response_time = 0;
> +    fcn->corrected_initial_age = 0;
> 
>  done:
> 
> @@ -980,6 +982,8 @@

Nitpicking: please consider adding

[diff]
showfunc=1

to your Mercurial configuration to ensure that function names are 
shown in diffs, this makes reviews easier.

> 
>      c->uniq = fcn->uniq;
>      c->error = fcn->error;
> +    c->response_time = fcn->response_time;
> +    c->corrected_initial_age = fcn->corrected_initial_age;
>      c->node = fcn;
> 
>  failed:
> @@ -1624,6 +1628,7 @@
>  ngx_int_t
>  ngx_http_cache_send(ngx_http_request_t *r)
>  {
> +    time_t             resident_time, current_age;
>      ngx_int_t          rc;
>      ngx_buf_t         *b;
>      ngx_chain_t        out;
> @@ -1646,6 +1651,17 @@
>          return NGX_HTTP_INTERNAL_SERVER_ERROR;
>      }
> 
> +    /*
> +     * Update age response header.
> +     * https://www.rfc-editor.org/rfc/rfc9111.html#name-calculating-age
> +     */
> +    resident_time = ngx_time() - c->response_time;
> +    current_age = c->corrected_initial_age + resident_time;
> +    r->headers_out.age_n = current_age;

Note that this uses data from the cache node, which is not 
available if the cache node was loaded from disk, for example, on 
server restart, and this will certainly lead to incorrect results.  
If I'm reading the code correctly, it will result in seconds since 
the Epoch in the Age header on all responses after a restart.

This suggests that either things should be implemented quite 
differently, or this patch needs to be merged with the second one, 
which implements loading relevant information from the cache 
header.

> +    ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> +                   "http file cache send, resp:%O, resident:%d, age:%d",
> +                   c->response_time, resident_time, current_age);

Note that there seems to be somewhat too many debug logging in the 
patch.

Please also note that using "%d" for time_t is wrong, as time_t 
size might be different from the size of int.  For time_t, "%T" 
should be used instead.

See ngx_sprintf() comment in src/core/ngx_string.c for the full 
list of supported formats and corresponding types.

> +
>      rc = ngx_http_send_header(r);
> 
>      if (rc == NGX_ERROR || rc > NGX_OK || r->header_only) {
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_header_filter_module.c
> --- a/src/http/ngx_http_header_filter_module.c  Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_header_filter_module.c  Thu Jun 20 20:26:41 2024 +0900
> @@ -322,6 +322,10 @@
>          len += sizeof("Last-Modified: Mon, 28 Sep 1970 06:00:00 GMT" CRLF) - 1;
>      }
> 
> +    if (r->headers_out.age_n != -1) {
> +        len += sizeof("Age: ") - 1 + NGX_OFF_T_LEN + 2;
> +    }
> +
>      c = r->connection;
> 
>      if (r->headers_out.location
> @@ -518,6 +522,10 @@
>          *b->last++ = CR; *b->last++ = LF;
>      }
> 
> +    if (r->headers_out.age_n != -1) {
> +        b->last = ngx_sprintf(b->last, "Age: %O" CRLF, r->headers_out.age_n);
> +    }
> +
>      if (host.data) {
> 
>          p = b->last + sizeof("Location: ") - 1;
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_request.c
> --- a/src/http/ngx_http_request.c       Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_request.c       Thu Jun 20 20:26:41 2024 +0900
> @@ -646,6 +646,7 @@
>      r->headers_in.keep_alive_n = -1;
>      r->headers_out.content_length_n = -1;
>      r->headers_out.last_modified_time = -1;
> +    r->headers_out.age_n = -1;
> 
>      r->uri_changes = NGX_HTTP_MAX_URI_CHANGES + 1;
>      r->subrequests = NGX_HTTP_MAX_SUBREQUESTS + 1;
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_request.h
> --- a/src/http/ngx_http_request.h       Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_request.h       Thu Jun 20 20:26:41 2024 +0900
> @@ -290,6 +290,7 @@
>      off_t                             content_offset;
>      time_t                            date_time;
>      time_t                            last_modified_time;
> +    off_t                             age_n;

Using off_t here looks wrong, as off_t is a type for file offsets, 
and not for dates.

(Not sure it needs to be here at all though, a better solution 
might be to rewrite/add the header within the upstream module.)

>  } ngx_http_headers_out_t;
> 
> 
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_special_response.c
> --- a/src/http/ngx_http_special_response.c      Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_special_response.c      Thu Jun 20 20:26:41 2024 +0900
> @@ -581,6 +581,7 @@
> 
>      r->headers_out.content_length_n = -1;
>      r->headers_out.last_modified_time = -1;
> +    r->headers_out.age_n = -1;
>  }
> 
> 
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.c
> --- a/src/http/ngx_http_upstream.c      Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_upstream.c      Thu Jun 20 20:26:41 2024 +0900
> @@ -50,6 +50,8 @@
>      ngx_http_upstream_t *u);
>  static ngx_int_t ngx_http_upstream_test_next(ngx_http_request_t *r,
>      ngx_http_upstream_t *u);
> +static void ngx_http_upstream_update_age(ngx_http_request_t *r,
> +    ngx_http_upstream_t *u, time_t now);
>  static ngx_int_t ngx_http_upstream_intercept_errors(ngx_http_request_t *r,
>      ngx_http_upstream_t *u);
>  static ngx_int_t ngx_http_upstream_test_connect(ngx_connection_t *c);
> @@ -132,6 +134,8 @@
>      ngx_table_elt_t *h, ngx_uint_t offset);
>  static ngx_int_t ngx_http_upstream_process_vary(ngx_http_request_t *r,
>      ngx_table_elt_t *h, ngx_uint_t offset);
> +static ngx_int_t ngx_http_upstream_process_age(ngx_http_request_t *r,
> +    ngx_table_elt_t *h, ngx_uint_t offset);
>  static ngx_int_t ngx_http_upstream_copy_header_line(ngx_http_request_t *r,
>      ngx_table_elt_t *h, ngx_uint_t offset);
>  static ngx_int_t
> @@ -319,6 +323,10 @@
>                   ngx_http_upstream_copy_header_line,
>                   offsetof(ngx_http_headers_out_t, content_encoding), 0 },
> 
> +    { ngx_string("Age"),
> +                 ngx_http_upstream_process_age, 0,
> +                 ngx_http_upstream_ignore_header_line, 0, 0 },
> +
>      { ngx_null_string, NULL, 0, NULL, 0, 0 }
>  };
> 
> @@ -499,6 +507,7 @@
> 
>      u->headers_in.content_length_n = -1;
>      u->headers_in.last_modified_time = -1;
> +    u->headers_in.age_n = -1;
> 
>      return NGX_OK;
>  }
> @@ -1068,6 +1077,7 @@
>      ngx_memzero(&u->headers_in, sizeof(ngx_http_upstream_headers_in_t));
>      u->headers_in.content_length_n = -1;
>      u->headers_in.last_modified_time = -1;
> +    u->headers_in.age_n = -1;
> 
>      if (ngx_list_init(&u->headers_in.headers, r->pool, 8,
>                        sizeof(ngx_table_elt_t))
> @@ -1549,6 +1559,7 @@
>      ngx_memzero(u->state, sizeof(ngx_http_upstream_state_t));
> 
>      u->start_time = ngx_current_msec;
> +    u->request_time = ngx_time();

Adding a duplicate start time shouldn't be needed.

> 
>      u->state->response_time = (ngx_msec_t) -1;
>      u->state->connect_time = (ngx_msec_t) -1;
> @@ -2008,6 +2019,7 @@
>      ngx_memzero(&u->headers_in, sizeof(ngx_http_upstream_headers_in_t));
>      u->headers_in.content_length_n = -1;
>      u->headers_in.last_modified_time = -1;
> +    u->headers_in.age_n = -1;
> 
>      if (ngx_list_init(&u->headers_in.headers, r->pool, 8,
>                        sizeof(ngx_table_elt_t))
> @@ -2529,6 +2541,8 @@
>          return;
>      }
> 
> +    ngx_http_upstream_update_age(r, u, ngx_time());

Note that the "ngx_time()" as an argument looks unneeded, it 
should be better to obtain the time in the function itself.

Also, from semantic point of view this probably should be in 
ngx_http_upstream_process_headers(), and not a dedicated function 
call.

> +
>      ngx_http_upstream_send_response(r, u);
>  }
> 
> @@ -2615,6 +2629,7 @@
>                         "http upstream not modified");
> 
>          now = ngx_time();
> +        ngx_http_upstream_update_age(r, u, now);
> 
>          valid = r->cache->valid_sec;
>          updating = r->cache->updating_sec;
> @@ -2648,7 +2663,12 @@
>              valid = ngx_http_file_cache_valid(u->conf->cache_valid,
>                                                u->headers_in.status_n);
>              if (valid) {
> -                valid = now + valid;
> +                ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> +                               "adjust cache valid_sec:%O, "
> +                               "valid:%O, init_age:%d for 304",
> +                               now + valid - r->cache->corrected_initial_age,
> +                               valid, r->cache->corrected_initial_age);
> +                valid = now + valid - r->cache->corrected_initial_age;

Not sure if Age calculations should apply to caching time as 
calculated with proxy_cache_valid directives, but if it does, it 
should apply consistently, in all cases where proxy_cache_valid 
times are used.  At least ngx_http_upstream_intercept_errors() 
case seems to be ignored in the patch.

Also, if Age applies, there should be a way to ignore it, 
similarly to how Cache-Control can be ignored with the 
proxy_ignore_headers directive.

>              }
>          }
> 
> @@ -2672,6 +2692,59 @@
>  }
> 
> 
> +static void
> +ngx_http_upstream_update_age(ngx_http_request_t *r, ngx_http_upstream_t *u,
> +    time_t now)
> +{
> +    time_t  response_time, date, apparent_age, response_delay, age_value,
> +            corrected_age_value, corrected_initial_age;
> +
> +    /*
> +     * Update age response header.
> +     * https://www.rfc-editor.org/rfc/rfc9111.html#name-calculating-age
> +     */
> +    response_time = now;
> +    if (u->headers_in.date != NULL) {
> +        date = ngx_parse_http_time(u->headers_in.date->value.data,
> +                                   u->headers_in.date->value.len);
> +        if (date == NGX_ERROR) {
> +            date = now;
> +        }
> +    } else {
> +        date = now;
> +    }
> +    apparent_age = ngx_max(0, response_time - date);
> +
> +    response_delay = response_time - u->request_time;
> +    age_value = u->headers_in.age_n != -1 ? u->headers_in.age_n : 0;
> +    corrected_age_value = age_value + response_delay;
> +
> +    corrected_initial_age = ngx_max(apparent_age, corrected_age_value);
> +    r->headers_out.age_n = corrected_initial_age;

Note that this approach, as described in RFC 9111 as a possible 
way to calculate the Age header, implies that time on both proxy 
server and the origin server must be "reasonably well 
synchronized", which is not really the case in many practical 
situations.

If the time on the origin server is mostly arbitrary, for example, 
in the past, apparent_age might be very large, leading to a very 
large corrected_initial_age as well, preventing caching.

As such, I would rather consider a more robust algorithm instead.

> +
> +    ngx_log_debug8(NGX_LOG_DEBUG_HTTP, u->peer.connection->log, 0,
> +                   "http upstream set age:%O, req:%O, resp:%O, date:%O, "
> +                   "a_age:%O, resp_delay:%O, u_age:%O, c_age:%O",
> +                   corrected_initial_age, u->request_time, response_time, date,
> +                   apparent_age, response_delay, u->headers_in.age_n,
> +                   corrected_age_value);
> +
> +#if (NGX_HTTP_CACHE)
> +    if (r->cache) {
> +        r->cache->response_time = response_time;
> +        r->cache->corrected_initial_age = corrected_initial_age;
> +        if (u->headers_in.adjusting_valid_sec) {
> +            r->cache->valid_sec -= corrected_initial_age;

The "adjusting_valid_sec" logic looks unclear (probably at least 
needs a better name) and fragile.  If I'm reading the code 
correctly, it will do the wrong thing at least if the upstream 
server returns something like:

Cache-Control: max-age=60
X-Accel-Expires: @<time>

since u->headers_in.adjusting_valid_sec won't be cleared.

> +            ngx_log_debug2(NGX_LOG_DEBUG_HTTP, u->peer.connection->log, 0,
> +                           "http upstream adjusted cache "
> +                           "valid_sec:%O, init_age:%O",
> +                           r->cache->valid_sec, corrected_initial_age);
> +        }
> +    }
> +#endif

If I'm reading the code correctly, this will add the Age 
header to responses which are not cached.  This somewhat 
contradicts to what RFC 9111 says in the description of the Age 
header (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1-6):

: The presence of an Age header field implies that the response was 
: not generated or validated by the origin server for this request.

Please also note that in many cases [free]nginx is expected to be 
seen as the origin server, even if it caches some upstream server 
responses according to the configuration (in many cases, ignoring 
Cache-Control or Expires headers).  In this case using the 
Age header even on cached responses might not be desired.  It 
would be great to dig further into possible use cases and make 
sure all are properly covered.

> +}
> +
> +
>  static ngx_int_t
>  ngx_http_upstream_intercept_errors(ngx_http_request_t *r,
>      ngx_http_upstream_t *u)
> @@ -2747,6 +2820,7 @@
>                                                            status);
>                          if (valid) {
>                              r->cache->valid_sec = ngx_time() + valid;
> +                            u->headers_in.adjusting_valid_sec = 1;
>                          }
>                      }
> 
> @@ -2952,6 +3026,7 @@
>      r->headers_out.status_line = u->headers_in.status_line;
> 
>      r->headers_out.content_length_n = u->headers_in.content_length_n;
> +    r->headers_out.age_n = u->headers_in.age_n;
> 
>      r->disable_not_modified = !u->cacheable;
> 
> @@ -4616,6 +4691,7 @@
> 
>                  if (valid) {
>                      r->cache->valid_sec = ngx_time() + valid;
> +                    u->headers_in.adjusting_valid_sec = 1;
>                      r->cache->error = rc;
>                  }
>              }
> @@ -4891,6 +4967,7 @@
>          }
> 
>          r->cache->valid_sec = ngx_time() + n;
> +        u->headers_in.adjusting_valid_sec = 1;
>          u->headers_in.expired = 0;
>      }
> 
> @@ -5053,6 +5130,7 @@
> 
>          default:
>              r->cache->valid_sec = ngx_time() + n;
> +            u->headers_in.adjusting_valid_sec = 1;
>              u->headers_in.no_cache = 0;
>              u->headers_in.expired = 0;
>              return NGX_OK;
> @@ -5319,6 +5397,39 @@
> 
> 
>  static ngx_int_t
> +ngx_http_upstream_process_age(ngx_http_request_t *r,
> +    ngx_table_elt_t *h, ngx_uint_t offset)
> +{
> +    ngx_http_upstream_t  *u;
> +
> +    u = r->upstream;
> +
> +    if (u->headers_in.age) {
> +        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
> +                      "upstream sent duplicate header line: "%V: %V", "
> +                      "previous value: "%V: %V"",
> +                      &h->key, &h->value,
> +                      &u->headers_in.age->key,
> +                      &u->headers_in.age->value);
> +        return NGX_HTTP_UPSTREAM_INVALID_HEADER;
> +    }
> +
> +    h->next = NULL;
> +    u->headers_in.age = h;
> +    u->headers_in.age_n = ngx_atoof(h->value.data, h->value.len);
> +
> +    if (u->headers_in.age_n == NGX_ERROR) {
> +        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
> +                      "upstream sent invalid "Age" header: "
> +                      ""%V: %V"", &h->key, &h->value);
> +        return NGX_HTTP_UPSTREAM_INVALID_HEADER;
> +    }

Note that returning errors here violates RFC 9111 SHOULD 
recommendations, notably 
(https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1):

: Although it is defined as a singleton header field, a cache 
: encountering a message with a list-based Age field value SHOULD 
: use the first member of the field value, discarding subsequent 
: ones.

: If the field value (after discarding additional members, as per 
: above) is invalid (e.g., it contains something other than a 
: non-negative integer), a cache SHOULD ignore the field.

Unless there are good reasons, a better solution might be to 
report warnings and follow the recommendations instead.

> +
> +    return NGX_OK;
> +}
> +
> +
> +static ngx_int_t
>  ngx_http_upstream_copy_header_line(ngx_http_request_t *r, ngx_table_elt_t *h,
>      ngx_uint_t offset)
>  {
> diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.h
> --- a/src/http/ngx_http_upstream.h      Thu Jun 20 20:26:24 2024 +0900
> +++ b/src/http/ngx_http_upstream.h      Thu Jun 20 20:26:41 2024 +0900
> @@ -287,14 +287,17 @@
> 
>      ngx_table_elt_t                 *cache_control;
>      ngx_table_elt_t                 *set_cookie;
> +    ngx_table_elt_t                 *age;
> 
>      off_t                            content_length_n;
>      time_t                           last_modified_time;
> +    off_t                            age_n;

See above, "off_t" looks wrong.

[...]

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/



More information about the nginx-devel mailing list