Possible issue with LRU and shared memory zones?
Maxim Dounin
mdounin at mdounin.ru
Sat Sep 21 17:13:46 UTC 2024
Hello!
On Sat, Sep 21, 2024 at 03:14:08PM +0200, Eirik Øverby via nginx wrote:
> Hi!
>
> We've used nginx on our FreeBSD systems for what feels like forever, and
> love it. Over the last few years we've been hit by pretty massive DDoS
> attacks, and have been employing various tricks in nginx to fend them off.
> One of them is, of course, rate limiting.
>
> Given a config like:
> limit_req_zone $request zone=unique_request_5:100m rate=5r/s;
>
> and then
> limit_req zone=unique_request_5 burst=50 nodelay;
>
> we're getting messages like this:
> could not allocate node in limit_req zone "unique_request_5"
>
> We see this on an idle node that only gets very sporadic requests. However,
> this is preceded by a DDoS attack several hours earlier, which consisted of
> requests hitting this exact location block with short requests like
> POST /foo/bar?token=DEADBEEF
>
> When, after a few million requests like this in a short timespan, a "normal"
> request comes in - *much* longer than the DDoS request - e.g.
> POST /foo/bar?token=DEADBEEF&moredata=foo&evenmoredata=bar
>
> this is immediately REJECTED by the rate limiter, and we get the
> aforementioned error in the log.
>
> The current theory, supported by consulting with FreeBSD developers far more
> educated and experienced than myself, is that something is going wrong with
> the LRU allocator: Since nearly all of the shared memory zone was filled
> with short requests, freeing up one (or even two) of them will not be
> sufficient for these new requests. Only an nginx restart clears this up.
>
> Is there anything we can do to avoid this? I know the API for clearing and
> monitoring the shared memory zones until now has only been available in
> nginx plus - but we are strictly on a FOSS-only diet so using anything like
> that is obviously out of the question.
I think your interpretation is (mostly) correct, and the issue
here is that all shared memory zone pages are occupied by small
slab allocations. As such, the slab allocator cannot fulfill a
request for a larger allocation. Freeing a few limit_req nodes
doesn't fix this, at least not immediately, since each page holds
multiple nodes and is only returned to the allocator once all of
them are freed.
This is especially likely to be seen if $request is indeed very
large (larger than 2k, assuming a 4k page size), so the slab
allocator cannot serve it from the existing slabs at all and has
to fall back to allocating full pages.
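To illustrate with the requests quoted above (a rough sketch,
assuming 4k pages and borrowing the 64-byte node size used as an
example below): the short attack requests all produce nodes of
the same small slot size, so each page holds dozens of them, and
a few million such requests claim every page in the zone for that
slot size. The later, longer request needs a larger slot size
(or, for keys over 2k, whole pages), and since no completely free
page is left to start a new slab from, its allocation fails even
though the zone is full of nodes that could in principle be
expired.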
Eventually this should fix itself - each request frees up to 5
limit_req nodes (usually just 2 expired nodes, but more may be
cleared if the first allocation attempt fails). This might take a
while though, since clearing even one page may require a lot of
limit_req nodes to be freed: one page holds 64 64-byte nodes, but
since nodes are cleared in LRU order rather than page by page,
freeing 64 nodes might not be enough to empty any single page.
In the worst case this will require something like 63 * (number of
pages) nodes to be freed. For a 100m shared zone this gives
1612800 nodes, and hence about 800k requests. This probably
explains why it looks like only a restart clears things up.
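(The arithmetic, assuming the default 4k page size: 100m / 4k =
25600 pages; 63 * 25600 = 1612800 nodes; at roughly 2 expired
nodes freed per request, that's about 800k requests.)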
This could probably be somewhat improved by adjusting the number
of nodes limit_req normally clears - but the number shouldn't be
too large either, since clearing many nodes per request can open
an additional DoS vector, and hence this cannot guarantee an
allocation anyway. Something like "up to 16 normally, up to 128
in case of an allocation failure" might be a way to go though.
Another solution might be to improve the configuration so that
all limit_req nodes require an equal (or close) amount of memory -
this is usually true when $binary_remote_addr is used as the
limit_req key, but certainly not for $request. A trivial fix that
comes to mind is to use some hash, such as MD5, and limit on the
hash instead. This ensures a fixed allocation size for every
limit_req node and eliminates the problem completely.
With standard modules, this can be done with embedded Perl, such
as:
perl_set $request_md5 'sub {
    use Digest::MD5 qw(md5);
    my $r = shift;
    # md5() returns the 16-byte binary digest, which keeps the key short
    return md5($r->variable("request"));
}';
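The zone would then be keyed on the hashed variable instead
(reusing the zone name and size from your config above), e.g.:

limit_req_zone $request_md5 zone=unique_request_5:100m rate=5r/s;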
(Note though that Perl might not be the best solution for DoS
protection, as it implies noticeable overhead.)
With 3rd party modules, set_misc would probably be the most
appropriate, e.g. "set_md5 $request_md5 $request;".
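Roughly, with the same limit_req_zone keyed on $request_md5 as
above (an untested sketch; it assumes nginx is built with the
ngx_devel_kit and set_misc modules, and "/foo/" stands in for
whatever location the limit_req applies to):

location /foo/ {
    # set_md5 runs at the rewrite phase, so $request_md5 is set
    # before limit_req checks the zone at the preaccess phase
    set_md5 $request_md5 $request;
    limit_req zone=unique_request_5 burst=50 nodelay;
}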
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/