Rate Limiting Algorithms
When building popular web services, you need to protect them from being overwhelmed by too many requests. Rate limiting helps by controlling how many requests a client can make within a specific time period. Let's look at some common algorithms used for rate limiting.
The simplest approach is Fixed Window rate limiting. Imagine dividing time into fixed windows (one minute, say) and allowing only N requests per window. When a new minute starts, the counter resets. While simple to implement, this can lead to burst traffic at window boundaries: if your limit is 100 requests per minute, a client could send 100 requests at 11:59:59 and another 100 at 12:00:00, pushing 200 requests through in about two seconds while never exceeding either window's limit.
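Here's a minimal fixed-window sketch in Python to make this concrete. Everything here is illustrative rather than any particular library's API: the FixedWindowLimiter name, the in-memory counter dictionary, and the 100-requests-per-60-seconds defaults are all assumptions.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per client per fixed time window (illustrative sketch)."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        # Maps (client_id, window index) -> request count.
        # A production version would prune old windows (or use a store with TTLs).
        self.counters = defaultdict(int)

    def allow(self, client_id):
        window_index = int(time.time() // self.window)
        key = (client_id, window_index)
        if self.counters[key] >= self.limit:
            return False  # limit reached for this window
        self.counters[key] += 1
        return True
```

Notice that a client's counter simply stops mattering once the window index advances; that implicit reset is exactly what produces the boundary burst described above.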
Sliding Window rate limiting solves this boundary problem. Instead of fixed windows, it uses a rolling time window: when a request arrives, we count all requests in the last minute (or whatever period we choose) and reject the new one if the count has already reached the limit. This prevents bursts at window boundaries, but it costs more memory, since we need to track the timestamp of every request.
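This timestamp-tracking variant is often called a sliding window log. A minimal sketch, again with illustrative names and defaults:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Tracks per-client request timestamps over a rolling window (sketch)."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = defaultdict(deque)  # client_id -> recent timestamps

    def allow(self, client_id):
        now = time.time()
        log = self.timestamps[client_id]
        # Evict timestamps that have slid out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible here: each client can hold up to `limit` timestamps at once, which is why approximations that blend two adjacent fixed-window counters are sometimes used instead.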
Token Bucket is another popular algorithm. Imagine a bucket that holds tokens. New tokens are added at a fixed rate up to the bucket's capacity. Each request needs one token. If there are no tokens, the request is rejected. This allows for bursts of traffic (up to the bucket size) while maintaining a long-term rate limit.
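A token bucket can be implemented with just a counter and a timestamp, refilling lazily whenever a request arrives. The rate and capacity values below are illustrative assumptions:

```python
import time

class TokenBucket:
    """Refills tokens at a fixed rate; each request spends one (sketch)."""

    def __init__(self, rate=10.0, capacity=100.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a client can burst up to `capacity` requests immediately, then settles to the long-term `rate` of one token per 1/rate seconds.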
Leaky Bucket works similarly but focuses on smoothing out traffic. Requests enter a queue (the bucket) that processes them at a constant rate. If the bucket is full, new requests are rejected. This ensures steady outflow but can lead to delays since requests wait in the queue.
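One way to sketch the queue-based leaky bucket is to drain it lazily, at a constant rate, each time a request arrives. The names and the whole-request drain approximation below are my assumptions, not a canonical implementation:

```python
import time
from collections import deque

class LeakyBucket:
    """Queues requests and drains them at a constant rate (illustrative sketch)."""

    def __init__(self, capacity=100, leak_rate=10.0):
        self.capacity = capacity    # maximum queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = time.time()

    def _leak(self):
        # Drain whole requests for the time elapsed since the last drain.
        drained = int((time.time() - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()  # in a real system: hand off for processing
            # Advance only by the whole-request drain time, keeping the fraction.
            self.last_leak += drained / self.leak_rate

    def submit(self, request):
        self._leak()
        if len(self.queue) >= self.capacity:
            return False  # bucket is full: reject
        self.queue.append(request)
        return True
```

The trade-off from the prose shows up directly: a request admitted behind a long queue waits roughly `len(queue) / leak_rate` seconds before it drains, so outflow is smooth at the cost of latency.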