Design a Rate Limiter

A rate limiter controls how often a user or service can perform an action within a given time period. It's a critical component for protecting APIs from abuse, ensuring fair usage, and maintaining service stability.

Requirements

Functional:

  • Limit the number of requests per user/IP per time window
  • Support different rate limits for different endpoints
  • Return appropriate HTTP 429 (Too Many Requests) responses with retry-after headers

Non-functional:

  • Low latency: the rate limiter itself should add < 1ms to request processing
  • Highly available: a rate limiter outage shouldn't take down the API
  • Accurate: as close to the specified limit as possible
  • Distributed: works across multiple API servers

Rate Limiting Algorithms

1. Fixed Window Counter

Divide time into fixed windows (e.g., 1-minute buckets). Count requests in the current window.

Window 1 (0:00-1:00): ████████████ 12/100 requests
Window 2 (1:00-2:00): ██████ 6/100 requests

Implementation:

async function isAllowed(userId, windowSizeMs, maxRequests) {
  const windowStart = Math.floor(Date.now() / windowSizeMs) * windowSizeMs;
  const key = `rate:${userId}:${windowStart}`;

  const count = await redis.incr(key);
  if (count === 1) {
    await redis.pexpire(key, windowSizeMs * 2); // Auto-cleanup
  }

  return count <= maxRequests;
}

Problem: Burst at window boundaries.

Limit: 100 req/min
User sends 100 requests at 0:59 (end of window 1) ✓
User sends 100 requests at 1:01 (start of window 2) ✓
In the span of 2 seconds, the user sent 200 requests — 2× the limit!
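The boundary effect is easy to see by computing which bucket a timestamp falls into (a standalone sketch; `windowId` is an illustrative helper, not part of the Redis code above):

```javascript
// Map a timestamp to the start of its fixed window bucket.
function windowId(timestampMs, windowSizeMs) {
  return Math.floor(timestampMs / windowSizeMs) * windowSizeMs;
}

const MINUTE = 60_000;
// 0:59 and 1:01 land in different windows, so each gets a fresh quota:
windowId(59_000, MINUTE); // → 0     (window 1)
windowId(61_000, MINUTE); // → 60000 (window 2)
```

Two requests only two seconds apart are counted against entirely separate quotas, which is exactly the loophole the example exploits.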

2. Sliding Window Log

Track the exact timestamp of each request. Count requests in the last N milliseconds.

async function isAllowed(userId, windowMs, maxRequests) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `rate:log:${userId}`;

  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart); // Remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add current request
  pipeline.zcard(key); // Count requests in window
  pipeline.pexpire(key, windowMs); // Millisecond TTL avoids fractional seconds

  const results = await pipeline.exec();
  const requestCount = results[2][1];

  return requestCount <= maxRequests;
}

Pros: Highly accurate, no boundary burst problem.
Cons: Memory intensive — stores every request timestamp. For 1M users × 100 req/min, that's 100M entries in Redis.


3. Sliding Window Counter

A hybrid approach: combine the memory efficiency of fixed windows with the smoothness of a sliding window.

Previous window (1:00-2:00): 70 requests
Current window (2:00-3:00): 35 requests (so far, at 2:30)

Elapsed in current window: 30s out of 60s = 50%

Estimated count = previous_window ร— (1 - elapsed_fraction) + current_window
= 70 ร— (1 - 0.5) + 35
            = 35 + 35 = 70 ← this is the "rolling" count
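The estimate can be checked with a pure function mirroring the formula above (no Redis involved; `rollingCount` is an illustrative helper):

```javascript
// Weighted count: the previous window's contribution decays
// linearly as the current window fills up.
function rollingCount(prevCount, currCount, elapsedFraction) {
  return prevCount * (1 - elapsedFraction) + currCount;
}

rollingCount(70, 35, 0.5); // → 70, matching the worked example
```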

async function isAllowed(userId, windowMs, maxRequests) {
  const now = Date.now();
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const previousWindowStart = windowStart - windowMs;
  const elapsedFraction = (now - windowStart) / windowMs;

  const [prevCount, currCount] = await Promise.all([
    redis.get(`rate:${userId}:${previousWindowStart}`),
    redis.incr(`rate:${userId}:${windowStart}`)
  ]);

  // Set TTL on first request in window
  if (currCount === 1) {
    await redis.pexpire(`rate:${userId}:${windowStart}`, windowMs * 2);
  }

  // redis.get returns a string (or null); coerce before the arithmetic
  const rollingCount = (Number(prevCount) || 0) * (1 - elapsedFraction) + currCount;
  return rollingCount <= maxRequests;
}

Best balance of accuracy and memory efficiency.


4. Token Bucket

A bucket holds N tokens. Each request consumes 1 token. Tokens refill at a fixed rate.

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

At 0s: 100 tokens (full)
User sends 50 requests: 50 tokens remaining
10 seconds pass: bucket refills back to 100 tokens (capped at capacity)
User can now send another 100 requests

Great for allowing bursts up to the bucket capacity, while enforcing a steady long-term rate.
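The refill arithmetic can be sketched in memory first (an illustrative `TokenBucket` class with an injected clock for testability; the Redis-backed version below adds shared state):

```javascript
// Minimal in-memory token bucket.
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // max tokens the bucket holds
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;       // start full
    this.lastRefill = 0;          // seconds
  }

  tryConsume(now) {
    // Refill based on elapsed time, capped at capacity.
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Walking through the example above:
const bucket = new TokenBucket(100, 10);
for (let i = 0; i < 50; i++) bucket.tryConsume(0); // 50 requests at t=0
// bucket.tokens is now 50; 10 seconds later the bucket is full again
bucket.tryConsume(10); // refills to 100 (capped), then consumes 1
```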

async function isAllowed(userId, capacity, refillRate) {
  const key = `token_bucket:${userId}`;
  const now = Date.now() / 1000; // seconds

  // Note: this read-modify-write is not atomic; under concurrent requests,
  // production code would wrap it in a Lua script or MULTI/EXEC.
  const data = await redis.hgetall(key);
  let tokens = parseFloat(data?.tokens ?? capacity);
  const lastRefill = parseFloat(data?.lastRefill ?? now);

  // Calculate tokens to add based on time elapsed
  const tokensToAdd = (now - lastRefill) * refillRate;
  tokens = Math.min(capacity, tokens + tokensToAdd);

  if (tokens < 1) {
    // Not enough tokens — tell the caller how long until one is available
    const retryAfter = (1 - tokens) / refillRate;
    return { allowed: false, retryAfter };
  }

  tokens -= 1;
  await redis.hset(key, { tokens, lastRefill: now });
  await redis.expire(key, Math.ceil(capacity / refillRate) + 60);

  return { allowed: true };
}

Distributed Rate Limiting

A single Redis instance is a single point of failure. For production systems:

Option 1: Redis Cluster

Shard keys across multiple Redis nodes using consistent hashing.

const redis = new Redis.Cluster([
  { host: 'redis-1', port: 6379 },
  { host: 'redis-2', port: 6379 },
  { host: 'redis-3', port: 6379 }
]);
// Keys are automatically sharded across nodes

Option 2: Dedicated Rate Limiter Service

Don't embed rate limiting in your API servers — use a dedicated middleware layer:

[Client]
    │
    ▼
[API Gateway / NGINX]
    │ → check rate limit
    ▼
[Rate Limiter Service]
    │ → Redis Cluster
    ▼
[API Servers]

Benefits:

  • Rate limiter is independent, can be updated without touching API servers
  • Consistent limits across all API server instances
  • Can be bypassed internally (service-to-service calls)

Complete Rate Limiter Service

class RateLimiterService {
  constructor(redis) {
    this.redis = redis;
    this.rules = {
      '/api/login': { windowMs: 60000, maxRequests: 5 },
      '/api/messages': { windowMs: 1000, maxRequests: 10 },
      '/api/*': { windowMs: 60000, maxRequests: 100 }
    };
  }

  getRuleForEndpoint(endpoint) {
    // Match specific rule first, then wildcard
    const specificRule = this.rules[endpoint];
    if (specificRule) return specificRule;
    return this.rules['/api/*'];
  }

  async checkLimit(userId, endpoint) {
    const rule = this.getRuleForEndpoint(endpoint);
    const { windowMs, maxRequests } = rule;

    const result = await this.slidingWindowCounter(userId, endpoint, windowMs, maxRequests);

    if (!result.allowed) {
      const retryAfter = Math.ceil((windowMs - (Date.now() % windowMs)) / 1000);
      return {
        allowed: false,
        headers: {
          'X-RateLimit-Limit': maxRequests,
          'X-RateLimit-Remaining': 0,
          'X-RateLimit-Reset': Math.ceil((Date.now() + retryAfter * 1000) / 1000),
          'Retry-After': retryAfter
        }
      };
    }

    return {
      allowed: true,
      headers: {
        'X-RateLimit-Limit': maxRequests,
        'X-RateLimit-Remaining': result.remaining,
        'X-RateLimit-Reset': Math.ceil((Date.now() + windowMs) / 1000)
      }
    };
  }

  async slidingWindowCounter(userId, endpoint, windowMs, maxRequests) {
    // ... (implementation from above)
  }
}
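The wildcard lookup above handles only the single hardcoded `/api/*` catch-all. If more wildcard patterns are needed, one option is a longest-prefix match (an illustrative sketch, not part of the service above; the extra `/api/admin/*` rule is hypothetical):

```javascript
// Pick the rule whose wildcard prefix matches the endpoint most specifically.
function matchRule(rules, endpoint) {
  if (rules[endpoint]) return rules[endpoint]; // exact match wins
  let best = null;
  let bestLen = -1;
  for (const [pattern, rule] of Object.entries(rules)) {
    if (!pattern.endsWith('/*')) continue;
    const prefix = pattern.slice(0, -1); // drop the '*', keep the '/'
    if (endpoint.startsWith(prefix) && prefix.length > bestLen) {
      best = rule;
      bestLen = prefix.length;
    }
  }
  return best;
}

const rules = {
  '/api/login': { windowMs: 60000, maxRequests: 5 },
  '/api/admin/*': { windowMs: 60000, maxRequests: 20 },
  '/api/*': { windowMs: 60000, maxRequests: 100 }
};
matchRule(rules, '/api/admin/users'); // → the /api/admin/* rule, not /api/*
```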

// Express middleware
async function rateLimitMiddleware(req, res, next) {
  const userId = req.user?.id ?? req.ip;
  const result = await rateLimiter.checkLimit(userId, req.path);

  // Always set rate limit headers
  Object.entries(result.headers).forEach(([key, value]) => {
    res.setHeader(key, value);
  });

  if (!result.allowed) {
    return res.status(429).json({
      error: 'Too Many Requests',
      message: `Rate limit exceeded. Retry after ${result.headers['Retry-After']} seconds.`
    });
  }

  next();
}

Interview Follow-up Questions

Q: How do you handle rate limiting across multiple datacenters?

  • Option 1: Each datacenter gets its own rate limit quota (1/N of the total). Simple but uneven.
  • Option 2: Global rate limiting via a cross-DC Redis (adds latency).
  • Option 3: Token bucket with periodic cross-DC sync (approximate but low latency).

Q: How do you handle the case where Redis goes down?

Fail open (allow all requests) so a rate limiter outage can't take down the API. Log the outage. Add a circuit breaker — if the Redis error rate exceeds a threshold, temporarily bypass rate limiting.
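One way to sketch the fail-open behavior (a hypothetical wrapper; `maxErrors` and `resetMs` are illustrative parameters, not from any particular library):

```javascript
// Wrap a rate-limit check so Redis failures fail open instead of blocking traffic.
function failOpenLimiter(checkLimit, { maxErrors = 5, resetMs = 30000 } = {}) {
  let errors = 0;
  let trippedAt = 0;

  return async function check(userId) {
    const now = Date.now();
    // Circuit open: bypass rate limiting until the reset interval passes.
    if (trippedAt && now - trippedAt < resetMs) {
      return { allowed: true, bypassed: true };
    }
    trippedAt = 0;
    try {
      const result = await checkLimit(userId);
      errors = 0; // healthy call resets the error count
      return result;
    } catch (err) {
      errors += 1;
      if (errors >= maxErrors) trippedAt = now; // trip the breaker
      return { allowed: true, bypassed: true }; // fail open on this request too
    }
  };
}
```

Requests served while the breaker is open are simply not rate limited, which matches the "availability over accuracy" trade-off described above.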

Q: How do you rate limit by different dimensions (user, IP, API key)?

Create a composite key: rate:{dimension}:{value}:{endpoint}. Apply the most restrictive matching rule. Users with API keys get higher limits than anonymous IPs.
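A composite key builder and "most restrictive wins" selection might look like this (an illustrative sketch following the key pattern above; the helper names are assumptions):

```javascript
// One rate-limit key per dimension, following rate:{dimension}:{value}:{endpoint}.
function rateLimitKey(dimension, value, endpoint) {
  return `rate:${dimension}:${value}:${endpoint}`;
}

// Of all matching rules, apply the one with the lowest requests-per-ms rate.
function mostRestrictive(rules) {
  return rules.reduce((a, b) =>
    a.maxRequests / a.windowMs <= b.maxRequests / b.windowMs ? a : b);
}

rateLimitKey('ip', '203.0.113.7', '/api/login'); // → 'rate:ip:203.0.113.7:/api/login'
mostRestrictive([
  { maxRequests: 100, windowMs: 60000 }, // authenticated user quota
  { maxRequests: 5, windowMs: 60000 }    // per-IP login quota
]); // → the 5-per-minute rule
```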

Q: How do you prevent users from gaming fixed windows?

Use the sliding window counter or token bucket algorithm. Both prevent the boundary burst problem inherent to fixed windows.