Possible issue with LRU and shared memory zones?

Eirik Øverby ltning-nginx at anduin.net
Sat Sep 21 17:33:10 UTC 2024


Hi!

TL;DR: I did almost what you suggested. Thank you!
A bit more detail below.

On 21.09.2024 19:13, Maxim Dounin wrote:
> Hello!
> 
> On Sat, Sep 21, 2024 at 03:14:08PM +0200, Eirik Øverby via nginx wrote:
> 
>> Hi!
>>
>> We've used nginx on our FreeBSD systems for what feels like forever, and
>> love it. Over the last few years we've been hit by pretty massive DDoS
>> attacks, and have been employing various tricks in nginx to fend them off.
>> One of them is, of course, rate limiting.
>>
>> Given a config like..
>>    limit_req_zone $request zone=unique_request_5:100m rate=5r/s;
>>
>> and then
>>      limit_req zone=unique_request_5 burst=50 nodelay;
>>
>> we're getting messages like this:
>>    could not allocate node in limit_req zone "unique_request_5"
>>
>> We see this on an idle node that only gets very sporadic requests. However,
>> this is preceded by a DDoS attack several hours earlier, which consisted of
>> requests hitting this exact location block with short requests like
>>    POST /foo/bar?token=DEADBEEF
>>
>> When, after a few million requests like this in a short timespan, a "normal"
>> request comes in - *much* longer than the DDoS requests - e.g.
>>    POST /foo/bar?token=DEADBEEF&moredata=foo&evenmoredata=bar
>>
>> this is immediately REJECTED by the rate limiter, and we get the
>> aforementioned error in the log.
>>
>> The current theory, supported by consultation with FreeBSD developers far more
>> educated and experienced than myself, is that something is going wrong with
>> the LRU allocator: Since nearly all of the shared memory zone was filled
>> with short requests, freeing up one (or even two) of them will not be
>> sufficient for these new requests. Only an nginx restart clears this up.
>>
>> Is there anything we can do to avoid this? I know the API for clearing and
>> monitoring the shared memory zones until now has only been available in
>> nginx plus - but we are strictly on a FOSS-only diet so using anything like
>> that is obviously out of the question.
> 
> I think your interpretation is (mostly) correct, and the issue
> here is that all shared memory zone pages are occupied by small
> slab allocations.  As such, the slab allocator cannot fulfill the
> request for a larger allocation.  And trying to free
> some limit_req nodes doesn't fix this, at least not immediately,
> since each page contains multiple nodes.
> 
> This is especially likely to be seen if $request is indeed very
> large (larger than 2k, assuming a 4k page size), so that the slab
> allocator cannot fulfill it from the existing slabs and falls back
> to allocating full pages.
> 
> Eventually this should fix itself - each request frees up to 5
> limit_req nodes (usually just 2 expired nodes, but it might clear
> more if the first allocation attempt fails).  This might take a
> while though, since clearing even one page might require a lot of
> limit_req nodes to be freed: one page holds 64 64-byte nodes, but
> since nodes are cleared in LRU order, freeing 64 nodes might not
> be enough.
> 
> In the worst case this will require something like 63 * (number of
> pages) nodes freed.  For a 100m shared zone (25600 4k pages) this
> gives 1612800 nodes, and hence about 800k requests.  This probably
> explains why this is seen as "only a restart clears things up".
> 
> This probably can be somewhat improved by adjusting the number of
> nodes limit_req normally clears - but this shouldn't be too many
> either, as it can open an additional DoS vector, and hence it
> cannot guarantee an allocation anyway.  Something like "up to 16
> normally, up to 128 in case of an allocation failure" might be a
> way to go though.
> 
> Another solution might be to improve the configuration to ensure
> that all limit_req nodes require an equal or similar amount of
> memory - this is usually true with $binary_remote_addr being used
> for limit_req, but certainly not for $request.  A trivial fix that
> comes to mind is to use some hash, such as MD5, and limit on the
> hash instead.  This will ensure a fixed size for each limit_req
> allocation, and will completely eliminate the problem.
> 
> With standard modules, this can be done with embedded Perl, such
> as:
> 
>      perl_set $request_md5 'sub {
>          use Digest::MD5 qw(md5);
>          my $r = shift;
>          return md5($r->variable("request"));
>      }';
> 
> (Note though that Perl might not be the best solution for DoS
> protection, as it implies noticeable overhead.)
> 
> With 3rd party modules, set_misc probably would be most
> appropriate, such as with "set_md5 $request_md5 $request;".

Just before getting your email, I added this:
   set_by_lua_block $request_md5 { return ngx.md5_bin(ngx.var.request) }
since we're already using Lua.
If you think set_md5 is faster, I'll switch to that.
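
For completeness, the zone itself is then keyed on the hash rather than
on the raw request - a minimal sketch, assuming the same zone name, size
and rate as in the original config:

   limit_req_zone $request_md5 zone=unique_request_5:100m rate=5r/s;

and then, unchanged in the location block:

   limit_req zone=unique_request_5 burst=50 nodelay;

Since ngx.md5_bin() returns a fixed 16-byte digest, every node in the
zone is now the same size, so the slab allocator never has to fall back
to whole pages for oversized keys. (If I'm not mistaken, set_md5 yields
the 32-character hex form instead - still a fixed size, so either should
work.)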


> Hope this helps.

It really did. Thank you very much!

/Eirik

