Using 444
Maxim Dounin
mdounin at mdounin.ru
Mon Sep 29 08:17:48 UTC 2025
Hello!
On Sat, Sep 27, 2025 at 02:28:11PM -0400, Paul wrote:
> On 9/27/25 03:08, Maxim Dounin wrote:
> > Hello!
>
> Maxim, many thanks. Currently battling a DDoS including out of control
> "AI". Front end nginx/1.18.0 (Ubuntu) easily handles volume (CPU usage
> rarely above 1%) but proxied apache2 often runs up to 98% across 12 cores
> (complex cgi needs 20-40 ms per response.)
>
> I'm attempting to mitigate. Your advice appreciated. I've "snipped" below
> for readability:
>
> [snip]
> > > I am currently (a bit "hit and miss") using :
> > >
> > > proxy_buffering on; # maybe helps proxied apache2 ?
> >
> > Proxy buffering is on by default (see
> > http://freenginx.org/r/proxy_buffering), so there is no need to
> > switch it on unless you've switched it off at previous
> > configuration levels.
>
> Understood, thanks -- I had two lines (rem'd in or out for testing purposes)
> trying to respect genuine requests from regular users. Given that nginx has
> a lot of spare capacity, could this be better tuned to alleviate the load on
> the back end? I've read your doc, but in a production environment, I'm
> unsure of the implications of "proxy_buffers number size;" and
> "proxy_busy_buffers_size size;"
In general, "proxy_buffering on" (the default) is to minimize
usage of backend resources: it is designed to read the response
from the backend as fast as possible into nginx buffers, so the
backend connection can be released and/or closed even if the
client is slow and sending the response to the client takes
significant time. This matters less nowadays, since clients are
usually fast, yet it can still help in some cases. Unlikely in the
case of AI scrapers, though.
Other related settings, such as proxy_buffers, control what
nginx does with its buffers, and are mostly needed to optimize
processing
on the nginx side. In particular, larger proxy_buffers might be
needed if you want to keep more data in memory (vs. disk
buffering). As long as responses are small enough to fit into
existing memory buffers (4k proxy_buffer_size + 8 * 4k
proxy_buffers == 36k by default), you probably don't need to tune
anything.
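For example, if the backend routinely returns responses around 200k
and you want them kept in memory rather than spilled to disk, something
like the following might fit (the sizes here are illustrative, not a
recommendation):

```nginx
# Illustrative values only: sized so a ~200k response fits in memory.
proxy_buffer_size 8k;   # buffer for the response headers
proxy_buffers 32 8k;    # 32 * 8k == 256k of memory buffers per connection
```

Larger buffers trade memory per connection for fewer disk writes, so
the right values depend on typical response sizes and concurrency.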
The proxy_busy_buffers_size directive limits the amount of buffer
memory that can be busy sending the response to the client (vs.
writing the response to the file-based buffer). It often needs to
be explicitly configured to ensure it matches non-default
proxy_buffers settings, but otherwise there isn't much need to
tune it.
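For example, when enlarging proxy_buffers as sketched above,
proxy_busy_buffers_size usually needs a matching adjustment: it must be
at least the size of one buffer and less than the total buffer space
minus one buffer (again, illustrative values):

```nginx
proxy_buffers 32 8k;
proxy_busy_buffers_size 16k;  # >= one 8k buffer, well under 31 * 8k
```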
> > > connection_pool_size 512;
> > > client_header_buffer_size 512;
> > > large_client_header_buffers 4 512;
> >
> > Similarly, I would rather use the default values unless you
> > understand why you want to change these.
>
> Maybe mistakenly, I was trying to eliminate stupidly artificial cgi requests
> -- "GET /cgi-bin/....." that ran several kilobytes long. The backend apache
> could "swallow" them (normally a 404) but I was trying to eliminate the
> overhead.
If the goal is to stop requests with very long URIs, using an
explicit regular expression to limit such URIs might be a better
option. For example:
if ($request_uri ~ ".{256}") { return 444; }
The regular expression matches any request URI with more than 256
characters, and such requests are rejected.
> > > location ~ \.php$ { return 444; }
>
> You did not mention this, but it does not appear to work well. access.log
> today gives hundreds of:
>
> 104.46.211.169 - - [27/Sep/2025:12:32:12 +0000] "GET /zhidagen.php HTTP/1.1"
> 404 5013 "-" "-"
>
> and the 5013 bytes is our "404-solr-try-again" page, not the 444 expected.
This indicates there is something wrong with the configuration.
Possible issues include:
- Location being configured in the wrong/other server{} block.
- Other locations with regular expressions interfere and take
precedence.
From the details provided I suspect it's a 404 from nginx, so it
might simply be a request from an unrelated server{} block handled
by nginx?
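One way to rule out the unrelated-server{} case is an explicit
catch-all server that drops requests not matching any configured
server_name (a sketch; adjust the listen directives to your setup):

```nginx
server {
    listen 80 default_server;
    server_name _;   # matches no real name; chosen only as the default
    return 444;      # close the connection without sending a response
}
```

With this in place, only requests with a Host matching one of your
real server_name values reach the intended server blocks.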
> > Also, depending on the traffic pattern you are seeing, it might be
> > a good idea to configure limit_req / limit_conn with appropriate
> > limits.
>
> Again thanks, I had tried various 'location' lines such as
> limit_req_zone $binary_remote_addr zone=mylimit:5m rate=1r/s;
> limit_req zone=mylimit burst=5 nodelay;
>
> without success... obviously haven't fully understood
Depending on the traffic pattern, limiting per $binary_remote_addr
might not be effective. In particular, AI scrapers I've observed
tend to use lots of IP addresses, and limiting them based on the
IP address alone doesn't work well.
For freenginx.org source code repositories I currently use
something like this to limit abusive behaviour (yet still allow
automated requests when needed, such as for non-abusive search
engine indexing and repository cloning):
map $binary_remote_addr $net24 { ~^(\C\C\C) $1; }
map $binary_remote_addr $net16 { ~^(\C\C) $1; }
map $binary_remote_addr $net8 { ~^(\C) $1; }
limit_conn_zone $binary_remote_addr zone=conns:1m;
limit_conn_zone $net24 zone=conns24:1m;
limit_conn_zone $net16 zone=conns16:1m;
limit_conn_zone $net8 zone=conns8:1m;
Additionally, I use the following to limit the most abusive AI
scrapers with multiple netblocks, mostly filled in with netblocks
manually:
geo $remote_addr $netname {
# AS45102, Alibaba Cloud LLC
47.74.0.0/15 AS45102;
47.80.0.0/13 AS45102;
47.76.0.0/14 AS45102;
# AS32934, Facebook, netblocks observed in logs
57.141.0.0/16 AS32934;
57.142.0.0/15 AS32934;
57.144.0.0/14 AS32934;
57.148.0.0/15 AS32934;
# Huawei netblocks, from geofeed in whois records
1.178.32.0/23 HW;
...
}
limit_conn_zone $netname zone=connsname:1m;
With the following limits in proxied locations:
limit_conn conns 5;
limit_conn conns24 10;
limit_conn conns16 20;
limit_conn conns8 30;
limit_conn connsname 10;
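Put together in a proxied location, this might look like the
following sketch (the location path and upstream name are
placeholders, not part of the configuration above):

```nginx
location /cgi-bin/ {
    # per-address and per-netblock limits, narrowest to widest
    limit_conn conns 5;
    limit_conn conns24 10;
    limit_conn conns16 20;
    limit_conn conns8 30;
    # per-netname limit for manually listed abusive networks
    limit_conn connsname 10;
    proxy_pass http://backend;
}
```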
The backend is configured to serve 30 parallel requests and has
listen queue 128 (Apache httpd with "MaxRequestWorkers 30"). With
the above limits it currently works without issues, ensuring no
errors and reasonable response time for all users.
If the goal is to stop all automated scraping, using some
JS-based challenge as already recommended in this thread might be
a better option.
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/