From mdounin at mdounin.ru  Tue Sep 23 16:18:03 2025
From: mdounin at mdounin.ru (Maxim Dounin)
Date: Tue, 23 Sep 2025 19:18:03 +0300
Subject: freenginx-1.29.2
Message-ID: 

Changes with freenginx 1.29.2                                23 Sep 2025

    *) Feature: support for the Encrypted Client Hello (ECH) extension
       of the TLS 1.3 protocol.

-- 
Maxim Dounin
http://freenginx.org/

From paul at stormy.ca  Fri Sep 26 21:31:58 2025
From: paul at stormy.ca (Paul)
Date: Fri, 26 Sep 2025 17:31:58 -0400
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: 

On 8/28/25 21:36, Brett Cooper wrote:
/.../
> If that server block is only serving Perl /.../ it might be best to simply use
> the following for the Perl server configuration block:
>
> location ~ \.php$ { return 444; }
>
> /.../ you could also configure this within the overall
> server {} block:
>
> if ($http_user_agent = "") { return 444; }

Many thanks.

I am currently (a bit "hit and miss") using:

proxy_buffering on;  # maybe helps proxied apache2 ?
connection_pool_size 512;
client_header_buffer_size 512;
large_client_header_buffers 4 512;
location ~ \.php$ { return 444; }
if ($http_user_agent = "") { return 444; }

But the $http_user_agent often 'appears' to be, e.g.:

66.249.69.8 - - [26/Sep/2025:21:13:20 +0000] "GET /cgi-bin/whatever" 200 3672 "-" "Mozilla...)"

Note the "-" which doesn't get a 444.  Tried ($http_user_agent = (""|"-")) but #nginx -t is not happy.

Tnx and br, Paul

>
> Regards,
> Brett
>
>
> ------ Original Message ------
> From "Paul"
> To nginx at freenginx.org
> Date 08/28/2025 07:13:26 P
> Subject Using 444
>
>> I'm looking for advice, please.  Using Nginx v1.18.0 (Ubuntu) which is
>> "old" but security updated by Canonical, rock solid and very fast, for
>> several static html sites and as proxy to a couple of other sites
>> using python or perl.  Total ~250k requests/day.
>> Recently logs have started showing ~10k php requests in rapid bursts.
>> On a proxy to a perl box, this is a serious slow down.
>> I've added the following, appears to work well:
>> location ~ \.php$ {
>> if ($request_method = GET) {
>> return 444;   # Drop
>> }
>> }
>> I'm considering editing to ^(GET|HEAD|POST)$) {
>> Any thoughts, downsides, recommendations?
>> Tnx and warmest regards to all,
>> Paul

   \\\||//
   (@ @)
ooO_(_)_Ooo__________________________________
|______|_____|_____|_____|_____|_____|_____|_____|
|___|____|_____|_____|_____|_____|_____|_____|____|
|_____|_____| mailto:paul at stormy.ca _|____|____|

From mdounin at mdounin.ru  Sat Sep 27 07:08:29 2025
From: mdounin at mdounin.ru (Maxim Dounin)
Date: Sat, 27 Sep 2025 10:08:29 +0300
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: 

Hello!

On Fri, Sep 26, 2025 at 05:31:58PM -0400, Paul wrote:

> On 8/28/25 21:36, Brett Cooper wrote:
> /.../
> > If that server block is only serving Perl /.../ it might be best to simply use
> > the following for the Perl server configuration block:
> >
> > location ~ \.php$ { return 444; }
> >
> > /.../ you could also configure this within the overall server {} block:
> >
> > if ($http_user_agent = "") { return 444; }
>
> Many thanks.
>
> I am currently (a bit "hit and miss") using:
>
> proxy_buffering on;  # maybe helps proxied apache2 ?

Proxy buffering is on by default (see
http://freenginx.org/r/proxy_buffering), so there is no need to
switch it on unless you've switched it off at previous
configuration levels.
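For illustration, a proxied location that simply relies on these
buffering defaults could be as small as the following sketch (the
backend address is a placeholder, not taken from your setup):

    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder backend (apache2)
        # proxy_buffering on;               # already the default, shown only for clarity
    }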
> connection_pool_size 512;
> client_header_buffer_size 512;
> large_client_header_buffers 4 512;

Similarly, I would rather use the default values unless you
understand why you want to change these.

> location ~ \.php$ { return 444; }
> if ($http_user_agent = "") { return 444; }
>
> But the $http_user_agent often 'appears' to be, e.g.:
>
> 66.249.69.8 - - [26/Sep/2025:21:13:20 +0000] "GET /cgi-bin/whatever" 200
> 3672 "-" "Mozilla...)"
>
> Note the "-" which doesn't get a 444.

Assuming the default log_format (https://freenginx.org/r/log_format),
the "-" here is from $http_referer, so it is not expected to get 444.

> Tried ($http_user_agent = (""|"-")) but #nginx -t is not happy.

There is no need to, as "-" is usually constructed by logging when
the particular header is not present in the request.  Quoting the
documentation as linked above:

: If the variable value is not found, a hyphen ("-") will be
: logged.

But if you really want to, there are two basic options:

1. Use two if's, that is:

if ($http_user_agent = "") { return 444; }
if ($http_user_agent = "-") { return 444; }

2. Use a regular expression.  Something like this should work:

if ($http_user_agent ~ "^(|-)$") { return 444; }

Also, depending on the traffic pattern you are seeing, it might be
a good idea to configure limit_req / limit_conn with appropriate
limits.

-- 
Maxim Dounin
http://mdounin.ru/

From paul at stormy.ca  Sat Sep 27 18:28:11 2025
From: paul at stormy.ca (Paul)
Date: Sat, 27 Sep 2025 14:28:11 -0400
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: 

On 9/27/25 03:08, Maxim Dounin wrote:
> Hello!

Maxim, many thanks.  Currently battling a DDoS including out of
control "AI".  Front end nginx/1.18.0 (Ubuntu) easily handles volume
(CPU usage rarely above 1%) but proxied apache2 often runs up to 98%
across 12 cores (complex cgi needs 20-40 ms per response.)

I'm attempting to mitigate.  Your advice appreciated.  I've "snipped"
below for readability:

[snip]

>> I am currently (a bit "hit and miss") using:
>>
>> proxy_buffering on;  # maybe helps proxied apache2 ?
>
> Proxy buffering is on by default (see
> http://freenginx.org/r/proxy_buffering), so there is no need to
> switch it on unless you've switched it off at previous
> configuration levels.

Understood, thanks -- I had two lines (rem'd in or out for testing
purposes) trying to respect genuine requests from regular users.
Given that nginx has a lot of spare capacity, could this be better
tuned to alleviate the load on the back end?  I've read your doc, but
in a production environment, I'm unsure of the implications of
"proxy_buffers number size;" and "proxy_busy_buffers_size size;"

>> connection_pool_size 512;
>> client_header_buffer_size 512;
>> large_client_header_buffers 4 512;
>
> Similarly, I would rather use the default values unless you
> understand why you want to change these.

Maybe mistakenly, I was trying to eliminate stupidly artificial cgi
requests -- "GET /cgi-bin/....." that ran several kilobytes long.  The
backend apache could "swallow" them (normally a 404) but I was trying
to eliminate the overhead.

>> location ~ \.php$ { return 444; }

You did not mention this, but it does not appear to work well.
access.log today gives hundreds of:

104.46.211.169 - - [27/Sep/2025:12:32:12 +0000] "GET /zhidagen.php HTTP/1.1" 404 5013 "-" "-"

and the 5013 bytes are our "404-solr-try-again" page, not the 444
expected.
>> if ($http_user_agent = "") { return 444; }
/.../
>> Note the "-" which doesn't get a 444,
/.../
> But if you really want to, there are two basic options:/.../

Thanks.  This was a previous suggestion on this list -- many/most
malicious actors don't give a $http_user_agent -- I'll play with it....

>
> Also, depending on the traffic pattern you are seeing, it might be
> a good idea to configure limit_req / limit_conn with appropriate
> limits.

Again thanks, I had tried various 'location' lines such as

    limit_req_zone $binary_remote_addr zone=mylimit:5m rate=1r/s;
    limit_req zone=mylimit burst=5 nodelay;

without success... obviously haven't fully understood.

Truly appreciate your assistance, tnx and br

Paul

>

   \\\||//
   (@ @)
ooO_(_)_Ooo__________________________________
|______|_____|_____|_____|_____|_____|_____|_____|
|___|____|_____|_____|_____|_____|_____|_____|____|
|_____|_____| mailto:paul at stormy.ca _|____|____|

From bernard+freenginx at rosset.net  Sat Sep 27 18:44:39 2025
From: bernard+freenginx at rosset.net (Bernard Rosset)
Date: Sat, 27 Sep 2025 20:44:39 +0200
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: <198ceebb-1112-4ed7-b46a-22c8060ad10a@rosset.net>

> Again thanks, I had tried various 'location' lines such as
>     limit_req_zone $binary_remote_addr zone=mylimit:5m rate=1r/s;
>     limit_req zone=mylimit burst=5 nodelay;
>
> without success... obviously haven't fully understood

I would suggest reading
https://freenginx.org/en/docs/http/ngx_http_limit_req_module.html
again; sometimes details only "click" on an n-th read.

You mentioned 250k requests/day, but you did not characterise the
population spread.

My concern there would be whether your 5 mebibytes of storage is
enough to handle all the IP addresses you're trying to rate-limit:
per the documentation (calculation details in there), one mebibyte
stores either 16k IPv4 or 8k IPv6 addresses.
Overflow is dealt with via LRU.

-- 
Bernard Rosset
https://rosset.net/

From noloader at gmail.com  Sat Sep 27 21:45:25 2025
From: noloader at gmail.com (Jeffrey Walton)
Date: Sat, 27 Sep 2025 17:45:25 -0400
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: 

On Sat, Sep 27, 2025 at 2:28 PM Paul wrote:
>
> [...]
> Maxim, many thanks.  Currently battling a DDoS including out of control
> "AI".  Front end nginx/1.18.0 (Ubuntu) easily handles volume (CPU usage
> rarely above 1%) but proxied apache2 often runs up to 98% across 12
> cores (complex cgi needs 20-40 ms per response.)
>
> I'm attempting to mitigate.  Your advice appreciated.  I've "snipped"
> below for readability:

My apologies if this wanders too far off-topic.

A lot of folks are having trouble due to AI Agents scraping their
sites for training data.  It hit the folks at GNU particularly hard.
If AI is so smart, then why does it not clone a project instead of
scraping source code presented as web pages???

You might consider putting a box on the front-end to handle the abuse
from AI agents.  Anubis, go-away and several others are popular.
go-away provides a list of similar projects at .

In fact, go-away names Nginx's ngx_http_js_challenge_module as a
mitigation for the problem.
Jeff

From paul at stormy.ca  Sat Sep 27 22:28:15 2025
From: paul at stormy.ca (Paul)
Date: Sat, 27 Sep 2025 18:28:15 -0400
Subject: Using 444
In-Reply-To: <198ceebb-1112-4ed7-b46a-22c8060ad10a@rosset.net>
References: <198ceebb-1112-4ed7-b46a-22c8060ad10a@rosset.net>
Message-ID: 

On 9/27/25 14:44, Bernard Rosset via nginx wrote:
>> Again thanks, I had tried various 'location' lines such as
>>     limit_req_zone $binary_remote_addr zone=mylimit:5m rate=1r/s;
>>     limit_req zone=mylimit burst=5 nodelay;
>>
>> without success... obviously haven't fully understood
>
> I would suggest reading
> https://freenginx.org/en/docs/http/ngx_http_limit_req_module.html
> again; sometimes details only "click" on an n-th read.

Merci bien -- from a former math guy at the U. of Clermont-Ferrand
(now Blaise-Pascal.)  That document together with are the ones that I
am hoping to put into effect.  "In production" swapping between a
fairly fast backend, and a slightly slower "backup", I'm being
cautious.

>
> You mentioned 250k requests/day, but you did not characterise the
> population spread.

Not sure if I actually said that, but somewhat close.  Real user
requests (the site has been running for fifteen years or so) are
probably around 150k, add longstanding "bots" (Duck, Google, Bing...)
and the number sometimes doubles.  These requests are mostly
well-formed.

Recently, analysis of nginx front-end logs shows up to 1,250k requests
per hour.  Regrouping down to the first 2 or 3 elements of each IP
dotted quad has allowed me to deny a significant number of /10 and /11
networks (as a charity, we're not happy with the discrimination, but
tech survival is relevant.)

This is also "whack-a-mole" -- you asked for "population spread" (my
comfort level in politics is low), but the spread is somewhat close to
world population - led by China, Pakistan, Vietnam, Brazil and
Microsoft.  Conspicuously absent are Russia and Google.  This is pure
math in my microcosm.

> My concern there would be whether your 5 mebibytes of storage is
> enough to handle all the IP addresses you're trying to rate-limit:
> per the documentation (calculation details in there), one mebibyte
> stores either 16k IPv4 or 8k IPv6 addresses.
> Overflow is dealt with via LRU.

We're not seeing much IPv6 activity (and maybe I should just deny it?)
and LRU shouldn't be a concern (it might balance out with the denies?)

Can you suggest explicit code that I can try in a production
environment?  That would be truly appreciated.

Tnx, merci, spaceeba,

Paul

   \\\||//
   (@ @)
ooO_(_)_Ooo__________________________________
|______|_____|_____|_____|_____|_____|_____|_____|
|___|____|_____|_____|_____|_____|_____|_____|____|
|_____|_____| mailto:paul at stormy.ca _|____|____|

From bernard+freenginx at rosset.net  Sat Sep 27 23:46:45 2025
From: bernard+freenginx at rosset.net (Bernard Rosset)
Date: Sun, 28 Sep 2025 01:46:45 +0200
Subject: Using 444
In-Reply-To: 
References: <198ceebb-1112-4ed7-b46a-22c8060ad10a@rosset.net>
Message-ID: 

>> You mentioned 250k requests/day, but you did not characterise the
>> population spread.
>
> This is also "whack-a-mole" -- you asked for "population spread" (my
> comfort level in politics is low), but the spread is somewhat close to
> world population - led by China, Pakistan, Vietnam, Brazil and
> Microsoft.  Conspicuously absent are Russia and Google.  This is pure
> math in my microcosm.

Maybe I used the wrong word, but by "spread" I meant the diversity of
IP addresses: are they always different, or are the same ones coming
back over and over?
In the former case, you are most probably hitting the LRU eviction on
your zone, hence virtually "resetting" their rate-limit and allowing
more requests to pass than you would wish.
You could try and play with the zone memory size to see if that has an
effect or not.

If that is not enough, are there recognisable patterns, such as
relatively narrow CIDR ranges they would belong to (/10 or /11 are way
too big), which could be linked to recurring organisations?
It's all guess-work after all.

At the moment, you are working directly on $binary_remote_addr.
If you can regroup IP addresses into CIDR ranges, you could apply
different rate limits.  To do that, you could stack the geo directive
feeding a map one, in turn feeding limit_req_zone in the end.

Finally, on top of limiting requests, you could also be limiting
connections of the worst offenders with limit_conn.  The effectiveness
of that added layer will essentially depend on whether those requests
are reusing connections or not.

-- 
Bernard Rosset
https://rosset.net/

From bctrainers at gmail.com  Sun Sep 28 02:51:33 2025
From: bctrainers at gmail.com (Brett Cooper)
Date: Sun, 28 Sep 2025 02:51:33 +0000
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: 

Honestly, I wouldn't consider the 'AI vs resources' issue off-topic...
granted I have a modest wall of text incoming. :-)

While I have not used the `go-away` package from
https://git.gammaspectra.live/git/go-away, I've also seen the effects
of AI-harvesters on web servers.  The resource consumption can
sometimes be absolutely immense, be it software or physical capacity
ceilings being met.  That is largely due to AI bots completely
ignoring or evading rate limiting by using the massive ranges of CIDRs
available to them.

With that said, the majority of "good" AI-harvesters/agents utilize a
user agent, which makes blocking or rate limiting them at the nginx
level fairly straightforward.  Otherwise, the more 'sneaky' AI
harvesters will generally be mimicking real or near-real-looking user
agents.  At that point, those AI-harvesters/agents are generally and
predominately based on 'cloud' CIDRs, making it fairly easy to
block/filter.

Which in turn gives us some options to use against AI
agents/harvesters:

1) Using something like the 'nginx ultimate bad bot blocker' project
located at:
https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker
and configuring it to be exceptionally strict (return 444 or rate
limiting) against user agents deemed unwanted.

2) Using iptables/nftables on Linux or an appliance in front of the
nginx server to block/drop swaths of CIDRs relating to
problematic/toxic cloud networks/data centers.

3) Stout rate limiting by making use of the ngx_http_limit_req module
(a minimal sketch follows below).

The best use-case is utilizing all three options.
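For option 3, a minimal limit_req sketch could look like the
following; the zone name, zone size, rate and burst are placeholder
values to be tuned against real traffic, not recommendations:

    # http{} context: one state entry per client address
    limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

    # server{} or location{} context handling the proxied requests
    limit_req zone=perip burst=10 nodelay;
    limit_req_status 429;   # answer 429 instead of the default 503 when the limit trips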
In the case of using that ultimate bad bot blocker, I've added the
following to my blacklist-user-agents.conf file (this includes probing
and AI clients):

"~*(?:\b)libwww-perl(?:\b)" 3;
"~*(?:\b)wget(?:\b)" 3;
"~*(?:\b)Go\-http\-client(?:\b)" 3;
"~*(?:\b)LieBaoFast(?:\b)" 3;
"~*(?:\b)Mb2345Browser(?:\b)" 3;
"~*(?:\b)MicroMessenger(?:\b)" 3;
"~*(?:\b)zh_CN(?:\b)" 3;
"~*(?:\b)Kinza(?:\b)" 3;
"~*(?:\b)Bytespider(?:\b)" 3; #TikTok Scraper
"~*(?:\b)Baiduspider(?:\b)" 3;
"~*(?:\b)Sogou(?:\b)" 3;
"~*(?:\b)Datanyze(?:\b)" 3;
"~*(?:\b)AspiegelBot(?:\b)" 3;
"~*(?:\b)adscanner(?:\b)" 3;
"~*(?:\b)serpstatbot(?:\b)" 3;
"~*(?:\b)spaziodat(?:\b)" 3;
"~*(?:\b)undefined(?:\b)" 3;
"~*(?:\b)claudebot(?:\b)" 3;
"~*(?:\b)anthropic\-ai(?:\b)" 3;
"~*(?:\b)ccbot(?:\b)" 3;
"~*(?:\b)FacebookBot(?:\b)" 3;
"~*(?:\b)OmigiliBot(?:\b)" 3;
"~*(?:\b)cohere\-ai(?:\b)" 3;
"~*(?:\b)Diffbot(?:\b)" 3;
"~*(?:\b)omgili(?:\b)" 3;
"~*(?:\b)GoogleOther(?:\b)" 3;
"~*(?:\b)Google\-Extended(?:\b)" 3;
"~*(?:\b)ChatGPT-User(?:\b)" 3;
"~*(?:\b)GPTBot(?:\b)" 3;
"~*(?:\b)Amazonbot(?:\b)" 3;
"~*(?:\b)Applebot(?:\b)" 3;
"~*(?:\b)PerplexityBot(?:\b)" 3;
"~*(?:\b)YouBot(?:\b)" 3;

I've probably left off a few on this list, but eh... This seems to
have stopped the majority of AI scrapers/harvesters (and
probers/exploiters) using such user agents, thus leaving some outliers
to be blocked at the firewall level.

As for me, the ones being blocked at the firewall level are primarily
Chinese-based cloud providers.  The worst one that I've seen to date
occurred earlier this year, where four different cloud data centers
were being used (abused?).  Ultimately, someone or some company used
these cloud provider services for mass scraping into AI
harvesting/training.  At one point, one of my servers was pushing
thousands of requests a second from hundreds of different IP
addresses, none of the IP addresses being unique and all using varying
user agents and random page accesses that were previously scraped (a
URL list seems to have been collected initially).  It wasn't until I
blocked the majority of Alibaba Cloud (AS45102), Huawei Cloud
(AS136907), TenCent Cloud (AS132203), and a small amount from OVH
(AS16276) did things /mostly/ return to normalcy.

I've never been a fan of the scorched-earth approach of blanket
banning/dropping providers in swaths such as this.  It is legitimately
absurd that I've had to resort to doing such just to get some sanity
back and resource usage under control.

--Brett

------ Original Message ------
From "Jeffrey Walton"
To nginx at freenginx.org
Date 09/27/2025 04:45:25 P
Subject Re: Using 444

>On Sat, Sep 27, 2025 at 2:28 PM Paul wrote:
>>
>> [...]
>> Maxim, many thanks.  Currently battling a DDoS including out of control
>> "AI".  Front end nginx/1.18.0 (Ubuntu) easily handles volume (CPU usage
>> rarely above 1%) but proxied apache2 often runs up to 98% across 12
>> cores (complex cgi needs 20-40 ms per response.)
>>
>> I'm attempting to mitigate.  Your advice appreciated.  I've "snipped"
>> below for readability:
>
>My apologies if this wanders too far off-topic.
>
>A lot of folks are having trouble due to AI Agents scraping their
>sites for training data.  It hit the folks at GNU particularly hard.
>If AI is so smart, then why does it not clone a project instead of
>scraping source code presented as web pages???
>
>You might consider putting a box on the front-end to handle the abuse
>from AI agents.  Anubis, go-away and several others are popular.
>go-away provides a list of similar projects at
>.
>In fact, go-away names Nginx's ngx_http_js_challenge_module as a
>mitigation for the problem.
>
>Jeff

From mdounin at mdounin.ru  Mon Sep 29 08:17:48 2025
From: mdounin at mdounin.ru (Maxim Dounin)
Date: Mon, 29 Sep 2025 11:17:48 +0300
Subject: Using 444
In-Reply-To: 
References: 
Message-ID: 

Hello!

On Sat, Sep 27, 2025 at 02:28:11PM -0400, Paul wrote:

> On 9/27/25 03:08, Maxim Dounin wrote:
> > Hello!
>
> Maxim, many thanks.  Currently battling a DDoS including out of control
> "AI".  Front end nginx/1.18.0 (Ubuntu) easily handles volume (CPU usage
> rarely above 1%) but proxied apache2 often runs up to 98% across 12 cores
> (complex cgi needs 20-40 ms per response.)
>
> I'm attempting to mitigate.  Your advice appreciated.  I've "snipped" below
> for readability:
>
> [snip]
> > > I am currently (a bit "hit and miss") using:
> > >
> > > proxy_buffering on;  # maybe helps proxied apache2 ?
> >
> > Proxy buffering is on by default (see
> > http://freenginx.org/r/proxy_buffering), so there is no need to
> > switch it on unless you've switched it off at previous
> > configuration levels.
>
> Understood, thanks -- I had two lines (rem'd in or out for testing purposes)
> trying to respect genuine requests from regular users.  Given that nginx has
> a lot of spare capacity, could this be better tuned to alleviate the load on
> the back end?  I've read your doc, but in a production environment, I'm
> unsure of the implications of "proxy_buffers number size;" and
> "proxy_busy_buffers_size size;"

In general, "proxy_buffering on" (the default) is to minimize usage
of backend resources: it is designed to read the response from the
backend as fast as possible into nginx buffers, so the backend
connection can be released and/or closed even if the client is slow
and sending the response to the client takes significant time.  It is
not that important nowadays, since clients are usually fast now, yet
still can help in some cases.  Unlikely in the case of AI scrapers
though.

Other related settings, such as proxy_buffers, control what nginx
does with buffers, and are mostly needed to optimize processing on
the nginx side.  In particular, larger proxy_buffers might be needed
if you want to keep more data in memory (vs. disk buffering).  As
long as responses are small enough to fit into existing memory
buffers (4k proxy_buffer_size + 8 * 4k proxy_buffers == 36k by
default), you probably don't need to tune anything.

The proxy_busy_buffers_size directive controls how many memory
buffers can be used to send the response to the client (vs. writing
the response to the file-based buffer).  It often needs to be
explicitly configured to ensure it matches non-default proxy_buffers
settings, but otherwise there isn't much need to tune it.

> > > connection_pool_size 512;
> > > client_header_buffer_size 512;
> > > large_client_header_buffers 4 512;
> >
> > Similarly, I would rather use the default values unless you
> > understand why you want to change these.
>
> Maybe mistakenly, I was trying to eliminate stupidly artificial cgi requests
> -- "GET /cgi-bin/....." that ran several kilobytes long.  The backend apache
> could "swallow" them (normally a 404) but I was trying to eliminate the
> overhead.

If the goal is to stop requests with very long URIs, using an
explicit regular expression to limit such URIs might be a better
option.  For example:

    if ($request_uri ~ ".{256}") {
        return 444;
    }

The regular expression matches any request URI with 256 or more
characters, and such requests are rejected.
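A minimal sketch of where such a guard could sit, assuming a single
proxied server{} block (the names and addresses are placeholders):

    server {
        listen 80;
        server_name example.org;                # placeholder

        # reject unusually long request URIs before they reach the backend
        if ($request_uri ~ ".{256}") {
            return 444;
        }

        location / {
            proxy_pass http://127.0.0.1:8080;   # placeholder backend (apache2)
        }
    }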
> > > location ~ \.php$ { return 444; }
>
> You did not mention this, but it does not appear to work well.  access.log
> today gives hundreds of:
>
> 104.46.211.169 - - [27/Sep/2025:12:32:12 +0000] "GET /zhidagen.php HTTP/1.1"
> 404 5013 "-" "-"
>
> and the 5013 bytes are our "404-solr-try-again" page, not the 444 expected.

This indicates there is something wrong with the configuration.
Possible issues include:

- Location being configured in the wrong/other server{} block.

- Other locations with regular expressions interfering and taking
  precedence.

From the details provided I suspect it's a 404 from nginx, so it might
simply be a request handled by nginx in an unrelated server{} block?

> > Also, depending on the traffic pattern you are seeing, it might be
> > a good idea to configure limit_req / limit_conn with appropriate
> > limits.
>
> Again thanks, I had tried various 'location' lines such as
>     limit_req_zone $binary_remote_addr zone=mylimit:5m rate=1r/s;
>     limit_req zone=mylimit burst=5 nodelay;
>
> without success... obviously haven't fully understood

Depending on the traffic pattern, limiting per $binary_remote_addr
might not be effective.  In particular, AI scrapers I've observed tend
to use lots of IP addresses, and limiting them based on the sole IP
address doesn't work well.

For freenginx.org source code repositories I currently use something
like this to limit abusive behaviour (yet still allow automated
requests when needed, such as for non-abusive search engine indexing
and repository cloning):

    map $binary_remote_addr $net24 {
        ~^(\C\C\C)  $1;
    }

    map $binary_remote_addr $net16 {
        ~^(\C\C)  $1;
    }

    map $binary_remote_addr $net8 {
        ~^(\C)  $1;
    }

    limit_conn_zone $binary_remote_addr zone=conns:1m;
    limit_conn_zone $net24 zone=conns24:1m;
    limit_conn_zone $net16 zone=conns16:1m;
    limit_conn_zone $net8 zone=conns8:1m;

Additionally, I use the following to limit most abusive AI scrapers
with multiple netblocks, mostly filled with netblocks manually:

    geo $remote_addr $netname {

        # AS45102, Alibaba Cloud LLC
        47.74.0.0/15    AS45102;
        47.80.0.0/13    AS45102;
        47.76.0.0/14    AS45102;

        # AS32934, Facebook, netblocks observed in logs
        57.141.0.0/16   AS32934;
        57.142.0.0/15   AS32934;
        57.144.0.0/14   AS32934;
        57.148.0.0/15   AS32934;

        # Huawei netblocks, from geofeed in whois records
        1.178.32.0/23   HW;

        ...
    }

    limit_conn_zone $netname zone=connsname:1m;

With the following limits in proxied locations:

    limit_conn conns 5;
    limit_conn conns24 10;
    limit_conn conns16 20;
    limit_conn conns8 30;
    limit_conn connsname 10;

The backend is configured to serve 30 parallel requests and has listen
queue 128 (Apache httpd with "MaxRequestWorkers 30").  With the above
limits it currently works without issues, ensuring no errors and
reasonable response time for all users.

If the goal is to stop all automated scraping, using some JS-based
challenge as already recommended in this thread might be a better
option.

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/
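For completeness, the $netname map above could also feed limit_req in
addition to limit_conn; a sketch along those lines, with placeholder
zone size, rate and burst rather than values taken from this thread:

    limit_req_zone $netname zone=reqsname:1m rate=5r/s;

    # inside the proxied locations, alongside the limit_conn directives above
    limit_req zone=reqsname burst=20;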