Hello Maxim, and thank you for your detailed answer.

First, about `multi_accept`: I can confirm that it indeed distributes requests very unevenly.
Luckily I have two servers, each handling 50% of the requests, so I could experiment by turning it off on one and restarting the nginx service on both.

multi_accept on:

for pid in $(pgrep -f "nginx: worker"); do echo "PID $pid: $(lsof -p $pid | wc -l) open files"; done
PID 1761825: 66989 open files
PID 1761827: 8766 open files
PID 1761828: 962 open files
PID 1761830: 184 open files
PID 1761832: 46 open files
PID 1761833: 81 open files
PID 1761834: 47 open files
PID 1761835: 40 open files
PID 1761836: 45 open files
PID 1761837: 44 open files
PID 1761838: 40 open files
PID 1761839: 40 open files

multi_accept off:

PID 1600658: 11137 open files
PID 1600659: 10988 open files
PID 1600660: 10974 open files
PID 1600661: 11116 open files
PID 1600662: 10937 open files
PID 1600663: 10891 open files
PID 1600664: 10934 open files
PID 1600665: 10944 open files

This is from an everyday, low-load situation, not a CDN "Purge Cache" or similar flood.

Based on this, "multi_accept on" clearly makes no sense; I wonder why it is recommended in so many guides. Back then I went through blog posts, optimisation guides and StackOverflow before settling on the config I used, and "multi_accept on" appeared in many of them.

2. Thanks, so I'll rewrite it as:

error_page 404 = @empty_tile;
log_not_found off;
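For reference, here is roughly how I plan to wire this into the tile-serving location (a simplified sketch rather than my actual config; the location path and headers are placeholders based on the setup described above):

location /tiles/ {
    # static pre-gzipped PBF tiles; the real location pattern is more specific
    error_page 404 = @empty_tile;
    log_not_found off;
    expires 10y;
}

location @empty_tile {
    # a missing tile is a valid "empty ocean" tile: 200 with an empty body
    expires 10y;
    return 200 "";
}

If I understand "error_page 404 =" correctly, the client then receives the 200 produced by the named location instead of the original 404.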
3. What I didn't say is that the files are served from a read-only mounted disk image (btrfs loop,ro 0 0). I guess modern Linux kernel-level caching should already be quite well optimised for this scenario, shouldn't it? I believe open_file_cache in my situation introduces a large complexity surface with possibly no upside, so I'll definitely be turning it off altogether.

4. Now for the limits/connections, I feel this is a bit deeper water.
Currently (at normal load) I have this on the multi_accept off server:

ss -Htnp state established '( sport = :80 or sport = :443 )' \
 | awk 'match($0,/pid=([0-9]+)/,m){c[m[1]]++} END{for (p in c) printf "nginx worker pid %s: %d ESTAB\n", p, c[p]}' \
 | sort -k5,5nr

nginx worker pid 1600662: 220 ESTAB
nginx worker pid 1600663: 214 ESTAB
nginx worker pid 1600659: 213 ESTAB
nginx worker pid 1600664: 213 ESTAB
nginx worker pid 1600665: 212 ESTAB
nginx worker pid 1600660: 211 ESTAB
nginx worker pid 1600658: 203 ESTAB
nginx worker pid 1600661: 201 ESTAB

and this on the multi_accept on one:

nginx worker pid 1761825: 1388 ESTAB
nginx worker pid 1761827: 114 ESTAB
nginx worker pid 1761828: 6 ESTAB

Isn't the default of 128 concurrent HTTP/2 streams a bit too high a value? What do you think about:

worker_connections 8192;
<p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal">http2_max_concurrent_streams 32;</p>
<p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal">=> 8192 * (32+1) = 270,336 < 300k</p>
<p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><br></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal">Also what do you think about adding http2_idle_timeout 30s;</p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><br></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal" dir="ltr"><font face="Helvetica Neue">Best regards,</font></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal" dir="ltr"><font face="Helvetica Neue">Zsolt </font></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal" dir="ltr"><br></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal" dir="ltr"><br></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><br></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><br></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><br></p><p style="margin:0px;font-style:normal;font-variant-caps:normal;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><br></p></div></div><div dir="ltr"><br></div><div dir="ltr"><br></div></div><div dir="ltr"><br></div><div 
dir="ltr"><br></div><div dir="ltr"><br></div></div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On 10. Aug 2025 at 12:34:55, Maxim Dounin <<a href="mailto:mdounin@mdounin.ru">mdounin@mdounin.ru</a>> wrote:<br></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" type="cite">
<div>
<div>
> Hello!
>
> On Sat, Aug 09, 2025 at 09:57:40AM -0700, Zsolt Ero wrote:
>
> > I'm seeking advice on the most robust way to configure Nginx for a specific
> > scenario that led to a caching issue.
> >
> > I run a free vector tile map service (https://openfreemap.org/). The
> > server's primary job is to serve a massive number of small (~70 kB),
> > pre-gzipped PBF files.
> >
> > To optimize for ocean areas, tiles that don't exist on disk should be
> > served as a 200 OK with an empty body. These are then rendered as empty
> > space on the map.
> >
> > Recently, the server experienced an extremely high load: 100k req/sec on
> > Cloudflare, and 1k req/sec on my two Hetzner servers. During this peak,
> > Nginx started serving some *existing* tiles as empty bodies. Because these
> > responses included cache-friendly headers (expires 10y), the CDN cached the
> > incorrect empty responses, effectively making parts of the map disappear
> > until a manual cache purge was performed.
> >
> > My goal is to prevent this from happening again. A temporary server
> > overload should result in a server error (e.g., 5xx), not incorrect content
> > that gets permanently cached.
>
> [...]
>
> > Full generated config is uploaded here:
> > https://github.com/hyperknot/openfreemap/blob/main/docs/assets/nginx.conf
> > Questions
> >
> > 1. I think multi_accept + open_file_cache > worker_rlimit_nofile is causing
> > the whole trouble by not distributing the requests across workers, and then
> > reaching the limit. Can you confirm if this is the correct take?
>
> The root cause is definitely open_file_cache configured
> with maximum number of cached files higher than allowed by the
> number of open files resource limit.
>
> Using multi_accept makes this easier to hit by making request
> distribution between worker processes worse than it could be.
>
> Overall, I would recommend to:
>
> - Remove multi_accept, it's not needed unless you have very high
> connection rates (and with small connection rates it'll waste
> resources). Even assuming 1k r/s translates to 1k connections per
> second, using multi_accept is unlikely to be beneficial.
>
> - Remove open_file_cache. It is only beneficial if opening files
> requires significant resources, and this is unlikely for local
> files on Unix systems. On the other hand, it is very likely to
> introduce various issues, either by itself due to bugs (e.g., I've
> recently fixed several open_file_cache bugs related to caching
> files with directio enabled), or by exposing and magnifying other
> issues, such as non-atomic file updates or misconfigurations like
> this one.
>
> > 2. How should I handle the "missing file should be empty response, server
> > error should be 5xx" scenario? I've asked 5 LLMs and each gave different
> > answers, which I'm including below. I'd like to ask your expert opinion,
> > and not trust LLMs in this regard.
> >
> > *o3*
> >
> > error_page 404 = @empty_tile;
>
> That's what I would recommend as well.
>
> You may also want to use "log_not_found off;" to avoid excessive
> logging.
>
> Also, it may make sense to actually rethink how empty tiles are
> stored. With "no file means empty title" approach you are still
> risking the same issue even with "error_page 404" if files will be
> lost somehow - such as due to disk issues, during incomplete
> synchronization, or whatever.
>
> [...]
>
> > 3. *open_file_cache Tuning:* My current open_file_cache settings are
> > clearly too aggressive and caused the problem. For a workload of millions
> > of tiny, static files, what would be considered a good configuration for max,
> > inactive, and min_uses?
>
> I don't think that open_file_cache would be beneficial for your
> use case. Rather, it may make sense to tune OS namei(9) cache
> (dentry cache on Linux; not sure there are any settings other than
> vm.vfs_cache_pressure) to the number of files. On the other hand,
> given the 1k r/s request rate, most systems should be good enough
> without any tuning.
>
> > 4. *open_file_cache_errors:* Should this be on or off? My intent for having
> > it on was to cache the "not found" status for ocean tiles to reduce disk
> > checks. I want to cache file-not-found scenarios, but not server errors.
> > What is the correct usage in this context?
>
> The "open_file_cache_errors" directive currently caches all file
> system errors, and doesn't make any distinction between what
> exactly gone wrong - either the file or directory cannot be found,
> or there is a permissions error, or something else. If you want
> to make sure that no unexpected errors will be cached, consider
> keeping it off.
>
> On the other hand, it may make sense to explicitly exclude EMFILE,
> ENFILE, and may be ENOMEM from caching. I'll take a look.
>
> Note though, that as suggested above, my recommendation would be
> to avoid using "open_file_cache" at all.
>
> > 5. *Limits:* What values would you recommend for values like
> > worker_rlimit_nofile and worker_connections? Should I raise LimitNOFILESoft?
>
> In general, "worker_connections" should be set depending on the
> expected load (and "worker_processes"). Total number of
> connections nginx will be able to handle is worker_processes *
> worker_connections.
>
> Given you use "worker_connections 40000;" and at least 5 worker
> processes, your server is already able to handle more than 200k
> connections, and it is likely more than enough. Looking into
> stub_status numbers (and/or system connections stats) might give
> you an idea if you needed more connections. Note that using
> many worker connection might require OS tuning (but it looks like
> you've already set fs.file-max to an arbitrary high value).
>
> And the RLIMIT_NOFILE limit should be set to a value needed for
> your worker processes. It doesn't matter how do you set it,
> either in system (such as with LimitNOFILESoft in systemd) or with
> worker_rlimit_nofile in nginx itself (assuming it's under the hard
> limit set in the system).
>
> The basic idea is that worker processes shouldn't hit the
> RLIMIT_NOFILE limit, but should hit worker_connections limit
> instead. This way workers will be able to reuse least recently
> used connections to free some resources, and will be able to
> actively avoid accepting new connections to let other worker
> processes do this.
>
> Given each connection uses at least one file for the socket, and
> can use many (for the file it returns, for upstream connections,
> for various temporary files, subrequests, streams in HTTP/2, and
> so on), it is usually a good idea to keep RLIMIT_NOFILE several
> times higher than worker_connections.
>
> Since you have HTTP/2 enabled with the default
> max_concurrent_streams (128), and no proxying or subrequests, a
> reasonable limit would be worker_connections * (128 + 1) or so,
> that's about 5 mln open files (or you could consider reducing
> max_concurrent_streams, or worker_connections, or both). And of
> course you'll have to add some for various files not related to
> connections, such as logs and open_file_cache if you'll decide to
> keep it.
>
> > Finally, since this is the freenginx list: does freenginx offer anything
> > over stock nginx which would help me in this use case? Even just a
> > monitoring page with FD values would help.
>
> I don't think there is a significant difference in this particular
> use case. While freenginx provides various fixes and
> improvements, including fixes in open_file_cache, they won't make
> a difference here - the root cause of the issue you've hit is
> fragile configuration combined with too low resource limits.
>
> --
> Maxim Dounin
> http://mdounin.ru/