Too many open files at 1000 req/sec
Zsolt Ero
zsolt.ero at gmail.com
Sun Aug 10 10:28:52 UTC 2025
Hi,
At the peak: CPU 9%, IOWait 4.7%, User 2%, System 2.65%.
RAM is 6% used, 90% cached.
Zsolt
On 10. Aug 2025 at 08:32:10, Xiufeng Guo <showfom at gmail.com> wrote:
> Hi,
>
> What’s the average server load?
>
> Best Regards,
> Xiufeng Guo
>
>
> On Sun, Aug 10, 2025 at 02:04 Zsolt Ero <zsolt.ero at gmail.com> wrote:
>
>> Hi,
>>
>> I'm seeking advice on the most robust way to configure Nginx for a
>> specific scenario that led to a caching issue.
>>
>> I run a free vector tile map service (https://openfreemap.org/). The
>> server's primary job is to serve a massive number of small (~70 kB),
>> pre-gzipped PBF files.
>>
>> To optimize for ocean areas, tiles that don't exist on disk should be
>> served as a 200 OK with an empty body. These are then rendered as empty
>> space on the map.
>>
>> Recently, the server experienced an extremely high load: 100k req/sec on
>> Cloudflare, and 1k req/sec on my two Hetzner servers. During this peak,
>> Nginx started serving some *existing* tiles as empty bodies. Because
>> these responses included cache-friendly headers (expires 10y), the CDN
>> cached the incorrect empty responses, effectively making parts of the map
>> disappear until a manual cache purge was performed.
>>
>> My goal is to prevent this from happening again. A temporary server
>> overload should result in a server error (e.g., 5xx), not incorrect
>> content that gets permanently cached.
>>
>> The Nginx error logs clearly showed the root cause of the system error:
>>
>> 2025/08/08 23:08:16 [crit] 1084275#1084275: *161914910 open() "/mnt/ofm/planet-20250730_001001_pt/tiles/8/138/83.pbf" failed (24: Too many open files), client: 172.69.122.170, server: ...
>>
>> It appears my try_files directive interpreted this "Too many open files"
>> error as a "file not found" condition and fell back to serving the empty
>> tile.
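
One way to make only a genuine "not found" fall back to the empty tile is to drop try_files and let the static file handler report the error itself. To my understanding, that handler answers 404 for a missing file but 500 for other open() failures such as "Too many open files", so error_page 404 can route just the first case to the fallback. A minimal sketch under that assumption follows; the body of @empty_tile is hypothetical, since its original definition is not shown in this message.

```nginx
# Sketch only: the location and alias are copied from the quoted config;
# the contents of @empty_tile are an assumption.
location ^~ /monaco/20250806_231001_pt/ {
    alias /mnt/ofm/monaco-20250806_231001_pt/tiles/;

    # No try_files: the static handler opens the file itself.
    # Only a 404 is redirected to the empty tile; any other
    # failure keeps its own error status.
    error_page 404 = @empty_tile;

    add_header Content-Encoding gzip;
    add_header 'Access-Control-Allow-Origin' '*' always;
    expires 10y;

    types {
        application/vnd.mapbox-vector-tile pbf;
    }
}

# Hypothetical fallback: 200 OK with an empty body.
location @empty_tile {
    default_type application/vnd.mapbox-vector-tile;
    add_header 'Access-Control-Allow-Origin' '*' always;
    expires 10y;
    return 200 '';
}
```

Headers set with add_header and expires in the tile location do not carry over to the named location, which is why the sketch repeats them there. Per the nginx documentation, expires and add_header (without always) apply only to a fixed list of success and redirect status codes, so a 500 response should not carry the 10-year lifetime.
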
>>
>> System and Nginx Diagnostic Information
>>
>> Here is the relevant information about the system and the Nginx process
>> state. It was captured at normal load, after I resolved the high-traffic
>> incident; one worker still shows high FD usage.
>>
>> - *OS:* Ubuntu 22.04 LTS, 64 GB RAM, local NVMe SSD, physical server
>>   (not VPS)
>>
>> - *nginx version:* nginx/1.27.4
>>
>> - *Systemd ulimit for nofile:*
>>
>>   # cat /etc/security/limits.d/limits1m.conf
>>   * soft nofile 1048576
>>   * hard nofile 1048576
>>
>> - *Nginx Worker Process Limits (worker_rlimit_nofile is set to 300000):*
>>
>>   # for pid in $(pgrep -f "nginx: worker"); do sudo cat /proc/$pid/limits | grep "Max open files"; done
>>   Max open files 300000 300000 files
>>   Max open files 300000 300000 files
>>   ... (all 8 workers show the same limit)
>>
>> - *Open File Descriptor Count per Worker:*
>>
>>   # for pid in $(pgrep -f "nginx: worker"); do count=$(sudo lsof -p $pid 2>/dev/null | wc -l); echo "nginx PID $pid: $count open files"; done
>>   nginx PID 1090: 57 open files
>>   nginx PID 1091: 117 open files
>>   nginx PID 1092: 931 open files
>>   nginx PID 1093: 65027 open files
>>   nginx PID 1094: 7449 open files
>>   ...
>>
>>   (Note the one worker with a very high count, ~98% of which are
>>   regular files.)
>>
>> - sysctl fs.file-max:
>>
>>   fs.file-max = 9223372036854775807
>>
>> - systemctl show nginx | grep LimitNOFILE:
>>
>>   LimitNOFILE=524288
>>   LimitNOFILESoft=1024
>>
>>
>> Relevant Nginx Configuration
>>
>> Here are the key parts of my configuration that led to the issue.
>>
>> worker_processes auto;
>> worker_rlimit_nofile 300000;
>>
>> events {
>>     worker_connections 40000;
>>     multi_accept on;
>> }
>>
>> http {
>>     open_file_cache max=1000000 inactive=60m;
>>     open_file_cache_valid 60m;
>>     open_file_cache_min_uses 1;
>>     open_file_cache_errors on;
>>     # ...
>>
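
The configuration above lets open_file_cache hold up to 1000000 entries while each worker is limited to 300000 descriptors. Because the cache keeps file descriptors open and each worker has its own cache, one possible reading (an assumption, not something the logs above confirm) is that a busy worker can run into its descriptor limit before the cache starts evicting entries. A sketch that keeps the cache below the limit might look like this; the value 200000 is illustrative, not a tuned recommendation.

```nginx
worker_processes auto;
worker_rlimit_nofile 300000;

events {
    worker_connections 40000;
    multi_accept on;
}

http {
    # Illustrative figure: cache entries plus worker_connections
    # (200000 + 40000) stay under worker_rlimit_nofile, leaving
    # room for log files and other descriptors.
    open_file_cache max=200000 inactive=60m;
    open_file_cache_valid 60m;
    open_file_cache_min_uses 1;
    open_file_cache_errors on;
}
```

The other directives are unchanged from the quoted configuration; only max= differs.
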
>> *server block tile serving logic:*
>>
>> location ^~ /monaco/20250806_231001_pt/ {
>>     alias /mnt/ofm/monaco-20250806_231001_pt/tiles/;
>>     try_files $uri @empty_tile;
>>     add_header Content-Encoding gzip;
>>
>>     expires 10y;
>>
>>     types {
>>         application/vnd.mapbox-vector-tile pbf;
>>     }
>>
>>     add_header 'Access-Control-Allow-Origin' '*' always;
>>
>>