Design a Rate Limiter

A rate limiter controls how often a user or service can perform an action within a given time period. It's a critical component for protecting APIs from abuse, ensuring fair usage, and maintaining service stability.

Requirements

Functional:

  • Limit the number of requests per user/IP per time window
  • Support different rate limits for different endpoints
  • Return appropriate HTTP 429 (Too Many Requests) responses with retry-after headers

Non-functional:

  • Low latency: the rate limiter itself should add < 1ms to request processing
  • Highly available: a rate limiter outage shouldn't take down the API
  • Accurate: as close to the specified limit as possible
  • Distributed: works across multiple API servers

Rate Limiting Algorithms

1. Fixed Window Counter

Divide time into fixed windows (e.g., 1-minute buckets). Count requests in the current window.

Window 1 (0:00-1:00): ████████████ 12/100 requests
Window 2 (1:00-2:00): ██████ 6/100 requests

Implementation:

async function isAllowed(userId, windowSizeMs, maxRequests) {
  const windowStart = Math.floor(Date.now() / windowSizeMs) * windowSizeMs;
  const key = `rate:${userId}:${windowStart}`;

  const count = await redis.incr(key);
  if (count === 1) {
    await redis.pexpire(key, windowSizeMs * 2); // Auto-cleanup
  }

  return count <= maxRequests;
}

Problem: Burst at window boundaries.

Limit: 100 req/min
User sends 100 requests at 0:59 (end of window 1) ✓
User sends 100 requests at 1:01 (start of window 2) ✓
In the span of 2 seconds, the user sent 200 requests — 2× the limit!
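The boundary effect is easy to see by computing which bucket a timestamp falls into (a standalone sketch; `windowId` is an illustrative helper, not part of the Redis code above):

```javascript
// Map a timestamp to the start of its fixed window bucket.
function windowId(timestampMs, windowSizeMs) {
  return Math.floor(timestampMs / windowSizeMs) * windowSizeMs;
}

const MINUTE = 60_000;
// 0:59 and 1:01 land in different windows, so each gets a fresh quota:
windowId(59_000, MINUTE); // → 0     (window 1)
windowId(61_000, MINUTE); // → 60000 (window 2)
```

Two requests only two seconds apart are counted against entirely separate quotas, which is exactly the loophole the example exploits.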

2. Sliding Window Log

Track the exact timestamp of each request. Count requests in the last N milliseconds.

async function isAllowed(userId, windowMs, maxRequests) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `rate:log:${userId}`;

  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart); // Remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add current request
  pipeline.zcard(key); // Count requests in window
  pipeline.pexpire(key, windowMs); // Millisecond TTL avoids fractional seconds

  const results = await pipeline.exec();
  const requestCount = results[2][1];

  return requestCount <= maxRequests;
}

Pros: Highly accurate, no boundary burst problem.
Cons: Memory intensive — stores every request timestamp. For 1M users × 100 req/min, that's 100M entries in Redis.


3. Sliding Window Counter

A hybrid approach: combine the memory efficiency of fixed windows with the smoothness of a sliding window.

Previous window (1:00-2:00): 70 requests
Current window (2:00-3:00): 35 requests (so far, at 2:30)

Elapsed in current window: 30s out of 60s = 50%

Estimated count = previous_window ร— (1 - elapsed_fraction) + current_window
= 70 ร— (1 - 0.5) + 35
            = 35 + 35 = 70 ← this is the "rolling" count
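The estimate can be checked with a pure function mirroring the formula above (no Redis involved; `rollingCount` is an illustrative helper):

```javascript
// Weighted count: the previous window's contribution decays
// linearly as the current window fills up.
function rollingCount(prevCount, currCount, elapsedFraction) {
  return prevCount * (1 - elapsedFraction) + currCount;
}

rollingCount(70, 35, 0.5); // → 70, matching the worked example
```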

async function isAllowed(userId, windowMs, maxRequests) {
  const now = Date.now();
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const previousWindowStart = windowStart - windowMs;
  const elapsedFraction = (now - windowStart) / windowMs;

  const [prevCount, currCount] = await Promise.all([
    redis.get(`rate:${userId}:${previousWindowStart}`),
    redis.incr(`rate:${userId}:${windowStart}`)
  ]);

  // Set TTL on first request in window
  if (currCount === 1) {
    await redis.pexpire(`rate:${userId}:${windowStart}`, windowMs * 2);
  }

  // redis.get returns a string (or null); coerce before the arithmetic
  const rollingCount = (Number(prevCount) || 0) * (1 - elapsedFraction) + currCount;
  return rollingCount <= maxRequests;
}

Best balance of accuracy and memory efficiency.


4. Token Bucket

A bucket holds N tokens. Each request consumes 1 token. Tokens refill at a fixed rate.

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

At 0s: 100 tokens (full)
User sends 50 requests: 50 tokens remaining
10 seconds pass: bucket refills back to 100 tokens (capped at capacity)
User can now send another 100 requests

Great for allowing bursts up to the bucket capacity, while enforcing a steady long-term rate.
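The refill arithmetic can be sketched in memory first (an illustrative `TokenBucket` class with an injected clock for testability; the Redis-backed version below adds shared state):

```javascript
// Minimal in-memory token bucket.
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // max tokens the bucket holds
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;       // start full
    this.lastRefill = 0;          // seconds
  }

  tryConsume(now) {
    // Refill based on elapsed time, capped at capacity.
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Walking through the example above:
const bucket = new TokenBucket(100, 10);
for (let i = 0; i < 50; i++) bucket.tryConsume(0); // 50 requests at t=0
// bucket.tokens is now 50; 10 seconds later the bucket is full again
bucket.tryConsume(10); // refills to 100 (capped), then consumes 1
```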

async function isAllowed(userId, capacity, refillRate) {
  const key = `token_bucket:${userId}`;
  const now = Date.now() / 1000; // seconds

  // Note: this read-modify-write is not atomic; under concurrent requests,
  // production code would wrap it in a Lua script or MULTI/EXEC.
  const data = await redis.hgetall(key);
  let tokens = parseFloat(data?.tokens ?? capacity);
  const lastRefill = parseFloat(data?.lastRefill ?? now);

  // Calculate tokens to add based on time elapsed
  const tokensToAdd = (now - lastRefill) * refillRate;
  tokens = Math.min(capacity, tokens + tokensToAdd);

  if (tokens < 1) {
    // Not enough tokens — tell the caller how long until one is available
    const retryAfter = (1 - tokens) / refillRate;
    return { allowed: false, retryAfter };
  }

  tokens -= 1;
  await redis.hset(key, { tokens, lastRefill: now });
  await redis.expire(key, Math.ceil(capacity / refillRate) + 60);

  return { allowed: true };
}

Distributed Rate Limiting

A single Redis instance is a single point of failure. For production systems:

Option 1: Redis Cluster

Shard keys across multiple Redis nodes using consistent hashing.

const redis = new Redis.Cluster([
  { host: 'redis-1', port: 6379 },
  { host: 'redis-2', port: 6379 },
  { host: 'redis-3', port: 6379 }
]);
// Keys are automatically sharded across nodes

Option 2: Dedicated Rate Limiter Service

Don't embed rate limiting in your API servers — use a dedicated middleware layer:

[Client]
    │
    ▼
[API Gateway / NGINX]
    │ → check rate limit
    ▼
[Rate Limiter Service]
    │ → Redis Cluster
    ▼
[API Servers]

Benefits:

  • Rate limiter is independent, can be updated without touching API servers
  • Consistent limits across all API server instances
  • Can be bypassed internally (service-to-service calls)

Complete Rate Limiter Service

class RateLimiterService {
  constructor(redis) {
    this.redis = redis;
    this.rules = {
      '/api/login': { windowMs: 60000, maxRequests: 5 },
      '/api/messages': { windowMs: 1000, maxRequests: 10 },
      '/api/*': { windowMs: 60000, maxRequests: 100 }
    };
  }

  getRuleForEndpoint(endpoint) {
    // Match specific rule first, then wildcard
    const specificRule = this.rules[endpoint];
    if (specificRule) return specificRule;
    return this.rules['/api/*'];
  }

  async checkLimit(userId, endpoint) {
    const rule = this.getRuleForEndpoint(endpoint);
    const { windowMs, maxRequests } = rule;

    const result = await this.slidingWindowCounter(userId, endpoint, windowMs, maxRequests);

    if (!result.allowed) {
      const retryAfter = Math.ceil((windowMs - (Date.now() % windowMs)) / 1000);
      return {
        allowed: false,
        headers: {
          'X-RateLimit-Limit': maxRequests,
          'X-RateLimit-Remaining': 0,
          'X-RateLimit-Reset': Math.ceil((Date.now() + retryAfter * 1000) / 1000),
          'Retry-After': retryAfter
        }
      };
    }

    return {
      allowed: true,
      headers: {
        'X-RateLimit-Limit': maxRequests,
        'X-RateLimit-Remaining': result.remaining,
        'X-RateLimit-Reset': Math.ceil((Date.now() + windowMs) / 1000)
      }
    };
  }

  async slidingWindowCounter(userId, endpoint, windowMs, maxRequests) {
    // ... (implementation from above)
  }
}
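The wildcard lookup above handles only the single hardcoded `/api/*` catch-all. If more wildcard patterns are needed, one option is a longest-prefix match (an illustrative sketch, not part of the service above; the extra `/api/admin/*` rule is hypothetical):

```javascript
// Pick the rule whose wildcard prefix matches the endpoint most specifically.
function matchRule(rules, endpoint) {
  if (rules[endpoint]) return rules[endpoint]; // exact match wins
  let best = null;
  let bestLen = -1;
  for (const [pattern, rule] of Object.entries(rules)) {
    if (!pattern.endsWith('/*')) continue;
    const prefix = pattern.slice(0, -1); // drop the '*', keep the '/'
    if (endpoint.startsWith(prefix) && prefix.length > bestLen) {
      best = rule;
      bestLen = prefix.length;
    }
  }
  return best;
}

const rules = {
  '/api/login': { windowMs: 60000, maxRequests: 5 },
  '/api/admin/*': { windowMs: 60000, maxRequests: 20 },
  '/api/*': { windowMs: 60000, maxRequests: 100 }
};
matchRule(rules, '/api/admin/users'); // → the /api/admin/* rule, not /api/*
```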

// Express middleware
async function rateLimitMiddleware(req, res, next) {
  const userId = req.user?.id ?? req.ip;
  const result = await rateLimiter.checkLimit(userId, req.path);

  // Always set rate limit headers
  Object.entries(result.headers).forEach(([key, value]) => {
    res.setHeader(key, value);
  });

  if (!result.allowed) {
    return res.status(429).json({
      error: 'Too Many Requests',
      message: `Rate limit exceeded. Retry after ${result.headers['Retry-After']} seconds.`
    });
  }

  next();
}

Interview Follow-up Questions

Q: How do you handle rate limiting across multiple datacenters?

  • Option 1: Each datacenter gets its own rate limit quota (1/N of the total). Simple but uneven.
  • Option 2: Global rate limiting via a cross-DC Redis (adds latency).
  • Option 3: Token bucket with periodic cross-DC sync (approximate but low latency).

Q: How do you handle the case where Redis goes down?

Fail open (allow all requests) so a rate limiter outage can't take down the API. Log the outage. Add a circuit breaker — if the Redis error rate exceeds a threshold, temporarily bypass rate limiting.
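One way to sketch the fail-open behavior (a hypothetical wrapper; `maxErrors` and `resetMs` are illustrative parameters, not from any particular library):

```javascript
// Wrap a rate-limit check so Redis failures fail open instead of blocking traffic.
function failOpenLimiter(checkLimit, { maxErrors = 5, resetMs = 30000 } = {}) {
  let errors = 0;
  let trippedAt = 0;

  return async function check(userId) {
    const now = Date.now();
    // Circuit open: bypass rate limiting until the reset interval passes.
    if (trippedAt && now - trippedAt < resetMs) {
      return { allowed: true, bypassed: true };
    }
    trippedAt = 0;
    try {
      const result = await checkLimit(userId);
      errors = 0; // healthy call resets the error count
      return result;
    } catch (err) {
      errors += 1;
      if (errors >= maxErrors) trippedAt = now; // trip the breaker
      return { allowed: true, bypassed: true }; // fail open on this request too
    }
  };
}
```

Requests served while the breaker is open are simply not rate limited, which matches the "availability over accuracy" trade-off described above.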

Q: How do you rate limit by different dimensions (user, IP, API key)?

Create a composite key: rate:{dimension}:{value}:{endpoint}. Apply the most restrictive matching rule. Users with API keys get higher limits than anonymous IPs.
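A composite key builder and "most restrictive wins" selection might look like this (an illustrative sketch following the key pattern above; the helper names are assumptions):

```javascript
// One rate-limit key per dimension, following rate:{dimension}:{value}:{endpoint}.
function rateLimitKey(dimension, value, endpoint) {
  return `rate:${dimension}:${value}:${endpoint}`;
}

// Of all matching rules, apply the one with the lowest requests-per-ms rate.
function mostRestrictive(rules) {
  return rules.reduce((a, b) =>
    a.maxRequests / a.windowMs <= b.maxRequests / b.windowMs ? a : b);
}

rateLimitKey('ip', '203.0.113.7', '/api/login'); // → 'rate:ip:203.0.113.7:/api/login'
mostRestrictive([
  { maxRequests: 100, windowMs: 60000 }, // authenticated user quota
  { maxRequests: 5, windowMs: 60000 }    // per-IP login quota
]); // → the 5-per-minute rule
```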

Q: How do you prevent users from gaming fixed windows?

Use the sliding window counter or token bucket algorithm. Both prevent the boundary burst problem inherent to fixed windows.