Too many open files at 1000 req/sec

Zsolt Ero zsolt.ero at gmail.com
Sat Aug 9 16:57:40 UTC 2025


Hi,

I'm seeking advice on the most robust way to configure Nginx for a specific
scenario that led to a caching issue.

I run a free vector tile map service (https://openfreemap.org/). The
server's primary job is to serve a massive number of small (~70 kB),
pre-gzipped PBF files.

To optimize for ocean areas, tiles that don't exist on disk should be
served as a 200 OK with an empty body. These are then rendered as empty
space on the map.

Recently, the server experienced an extremely high load: 100k req/sec on
Cloudflare, and 1k req/sec on my two Hetzner servers. During this peak,
Nginx started serving some *existing* tiles as empty bodies. Because these
responses included cache-friendly headers (expires 10y), the CDN cached the
incorrect empty responses, effectively making parts of the map disappear
until a manual cache purge was performed.

My goal is to prevent this from happening again. A temporary server
overload should result in a server error (e.g., 5xx), not incorrect content
that gets permanently cached.

The Nginx error logs clearly showed the root cause:

2025/08/08 23:08:16 [crit] 1084275#1084275: *161914910 open()
"/mnt/ofm/planet-20250730_001001_pt/tiles/8/138/83.pbf" failed (24:
Too many open files), client: 172.69.122.170, server: ...

It appears my try_files directive interpreted this "Too many open files"
error as a "file not found" condition and fell back to serving the empty
tile.
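
My working assumption (not verified against the nginx source) is that
try_files only tests whether each file exists, so an open() failing with
EMFILE looks exactly like a missing file and the request falls through to
@empty_tile. A quick way to gauge how widespread this was (assuming the
default error log path; adjust as needed):

   grep -c 'Too many open files' /var/log/nginx/error.log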

System and Nginx Diagnostic Information

Here is the relevant information about the system and the Nginx process
state (captured at normal load, after the high-traffic incident was over,
but with one worker still showing high FD usage).

   - *OS:* Ubuntu 22.04 LTS, 64 GB RAM, local NVMe SSD, physical server
     (not a VPS)

   - *nginx version:* nginx/1.27.4

   - *ulimit for nofile (set via /etc/security/limits.d):*

     # cat /etc/security/limits.d/limits1m.conf
     * soft nofile 1048576
     * hard nofile 1048576

   - *Nginx worker process limits (worker_rlimit_nofile is set to 300000):*

     # for pid in $(pgrep -f "nginx: worker"); do sudo cat /proc/$pid/limits | grep "Max open files"; done
     Max open files            300000               300000               files
     Max open files            300000               300000               files
     ... (all 8 workers show the same limit)

   - *Open file descriptor count per worker:*

     # for pid in $(pgrep -f "nginx: worker"); do count=$(sudo lsof -p $pid 2>/dev/null | wc -l); echo "nginx PID $pid: $count open files"; done
     nginx PID 1090: 57 open files
     nginx PID 1091: 117 open files
     nginx PID 1092: 931 open files
     nginx PID 1093: 65027 open files
     nginx PID 1094: 7449 open files
     ...

     (Note the one worker with a very high count, ~98% of which are regular
     files.)

   - *sysctl fs.file-max:*

     fs.file-max = 9223372036854775807

   - *systemctl show nginx | grep LimitNOFILE:*

     LimitNOFILE=524288
     LimitNOFILESoft=1024

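Two side notes on the numbers above. First, lsof -p also lists memory-mapped
libraries and other non-descriptor entries, so counting actual FDs via /proc
is more precise:

   # count real file descriptors per worker
   for pid in $(pgrep -f "nginx: worker"); do echo "nginx PID $pid: $(sudo ls /proc/$pid/fd | wc -l) FDs"; done

Second, as far as I know /etc/security/limits.d is applied by PAM at login
and does not affect systemd services at all, which would explain the
LimitNOFILESoft=1024 above. If I wanted to raise it, I believe the systemd
way is a drop-in (a sketch, untested):

   # systemctl edit nginx
   [Service]
   LimitNOFILE=1048576

(Though since worker_rlimit_nofile is set, nginx raises the workers' limit
itself, so this mainly affects the master process.)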

Relevant Nginx Configuration

Here are the key parts of my configuration that led to the issue.

worker_processes auto;
worker_rlimit_nofile 300000;

events {
    worker_connections 40000;
    multi_accept on;
}

http {
    open_file_cache max=1000000 inactive=60m;
    open_file_cache_valid 60m;
    open_file_cache_min_uses 1;
    open_file_cache_errors on;
    # ...

*server block tile serving logic:*

location ^~ /monaco/20250806_231001_pt/ {
    alias /mnt/ofm/monaco-20250806_231001_pt/tiles/;
    try_files $uri @empty_tile;
    add_header Content-Encoding gzip;

    expires 10y;

    types {
        application/vnd.mapbox-vector-tile pbf;
    }

    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header Cache-Control public;
    add_header X-Robots-Tag "noindex, nofollow" always;

    add_header x-ofm-debug 'specific PBF monaco 20250806_231001_pt';
}

location @empty_tile {
    return 200 '';

    expires 10y;

    types {
        application/vnd.mapbox-vector-tile pbf;
    }

    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header Cache-Control public;
    add_header X-Robots-Tag "noindex, nofollow" always;

    add_header x-ofm-debug 'empty tile';
}
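
(An aside I only noticed while writing this up: expires already emits a
Cache-Control: max-age=... header, so my extra "add_header Cache-Control
public;" presumably produces two Cache-Control headers in the response. A
single combined header might be cleaner, though it's tangential to the main
problem:)

   add_header Cache-Control "public, max-age=315360000" always;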

The full generated config is uploaded here:
https://github.com/hyperknot/openfreemap/blob/main/docs/assets/nginx.conf

Questions

1. I think the combination of multi_accept and an open_file_cache whose max
is larger than worker_rlimit_nofile is the root of the trouble: connections
are not distributed evenly across workers, so one busy worker ends up
hitting its FD limit. Can you confirm whether this is the correct take?

2. How should I handle the "missing file should be an empty response,
server error should be a 5xx" scenario? I've asked 5 LLMs and each gave a
different answer, which I'm including below. I'd like your expert opinion
rather than trusting LLMs on this.

*o3*

error_page 404 = @empty_tile;

*Gemini*

if (!-f $request_filename) {
    return 200 '';
}

*Opus*

location ^~ /{area}/{version}/ {
  # Check if file exists without opening it
  if (!-f $request_filename) {
      return 404;
  }

  # File exists, try to serve it
  try_files $uri =503;  # Return 503 if can't open (system error)

  add_header Content-Encoding gzip;
  expires 10y;

  types {
      application/vnd.mapbox-vector-tile pbf;
  }

  add_header 'Access-Control-Allow-Origin' '*' always;
  add_header Cache-Control public;
  add_header X-Robots-Tag "noindex, nofollow" always;
  add_header x-ofm-debug 'specific PBF {area} {version}';
}

# Handle 404s (file doesn't exist) - serve empty tile
error_page 404 = @empty_tile_safe;

# Handle 503s (system errors) - don't cache!
error_page 503 = @system_error;

location @empty_tile_safe {
    return 200 '';

    expires 10y;

    types {
        application/vnd.mapbox-vector-tile pbf;
    }

    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header Cache-Control public;
    add_header X-Robots-Tag "noindex, nofollow" always;
    add_header x-ofm-debug 'empty tile (intentional)';
}

location @system_error {
    return 503 'Service temporarily unavailable';

    # SHORT cache for errors - don't poison the CDN cache!
    expires 5s;

    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header Cache-Control "no-cache, must-revalidate";
    add_header Retry-After "5" always;
    add_header x-ofm-debug 'system error - temporary';
}

3. *open_file_cache Tuning:* My current open_file_cache settings are
clearly too aggressive and caused the problem. For a workload of millions
of tiny, static files, what would be considered a good configuration for max,
inactive, and min_uses?
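
As a concrete starting point, would something along these lines be sane?
The numbers are guesses; the idea is keeping max (plus worker_connections)
comfortably below worker_rlimit_nofile:

   open_file_cache max=200000 inactive=20m;
   open_file_cache_valid 10m;
   open_file_cache_min_uses 2;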


4. *open_file_cache_errors:* Should this be on or off? My intent for having
it on was to cache the "not found" status for ocean tiles to reduce disk
checks. I want to cache file-not-found scenarios, but not server errors.
What is the correct usage in this context?


5. *Limits:* What values would you recommend for worker_rlimit_nofile and
worker_connections? Should I raise LimitNOFILESoft?


Finally, since this is the freenginx list: does freenginx offer anything
over stock nginx that would help in this use case? Even just a monitoring
page with FD values would help.

Best regards, Zsolt