Design a URL Shortener (TinyURL)
This is one of the most common system design interview questions. It seems simple but covers a surprisingly wide set of topics: hashing, databases, caching, and scalability.
Step 1: Clarify Requirements
Functional requirements:
- Given a long URL, generate a short URL (e.g., tiny.url/abc123)
- Redirect users from the short URL to the original long URL
- Optional: custom aliases, link expiration, analytics
Non-functional requirements:
- 100M short URLs created per day
- 10B redirects per day (100:1 read/write ratio → extremely read-heavy)
- Short URLs should be unique and hard to guess
- Redirect should be fast (< 10ms P99)
- System should be highly available (99.99% uptime)
Step 2: Estimate Scale
Writes: 100M URLs/day = ~1,200 writes/sec
Reads: 10B redirects/day = ~115,000 reads/sec
URL size:
- Long URL: ~2 KB average
- Short URL key: 7 characters = 7 bytes
- Metadata (user, timestamp, expiry): ~500 bytes
- Total per URL: ~2.5 KB
Storage for 10 years:
- 100M URLs/day × 365 × 10 × 2.5 KB ≈ 900 TB
(With replication: ~3 PB → need distributed storage)
Bandwidth:
- Write: 1,200 × 2.5 KB = 3 MB/sec
- Read: 115,000 × 500 bytes (just the redirect) ≈ 57 MB/sec
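These figures are easy to sanity-check in a few lines of Node (nothing new here, just the arithmetic above):

```javascript
// Back-of-envelope check for the traffic and storage estimates.
const SECONDS_PER_DAY = 86400;

const writesPerSec = 100e6 / SECONDS_PER_DAY; // ~1,157 → quoted as ~1,200
const readsPerSec = 10e9 / SECONDS_PER_DAY;   // ~115,741 → quoted as ~115,000

// 10 years of URLs at ~2.5 KB per record
const urlsIn10Years = 100e6 * 365 * 10;            // 365 billion records
const storageTB = (urlsIn10Years * 2.5e3) / 1e12;  // ~912 TB, i.e. ~900 TB

console.log(Math.round(writesPerSec), Math.round(readsPerSec), Math.round(storageTB));
```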
Step 3: High-Level Design
         ┌───────────────────────────────────┐
         │           Load Balancer           │
         └─────────────────┬─────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
   ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
   │    URL    │     │    URL    │     │    URL    │
   │  Service  │     │  Service  │     │  Service  │
   └─────┬─────┘     └─────┬─────┘     └─────┬─────┘
         │                 │                 │
         └─────────────────┼─────────────────┘
                           │
                  ┌────────┴────────┐
                  │   Redis Cache   │
                  │   (hot URLs)    │
                  └────────┬────────┘
                           │ cache miss
                  ┌────────┴────────┐
                  │    Database     │
                  │   (URL store)   │
                  └─────────────────┘
API Design
POST /api/v1/shorten
Body: { longUrl: "https://...", customAlias?: "mylink", expiresAt?: "2025-12-31" }
Response: { shortUrl: "https://tiny.url/abc1234" }
GET /:shortCode
Response: 301/302 redirect to longUrl
DELETE /api/v1/:shortCode
Response: 200 OK
GET /api/v1/:shortCode/stats
Response: { clicks: 1234, createdAt: "...", ... }
301 vs 302 redirect:
- 301 Permanent: the browser caches the redirect; reduces server load, but you can't update the destination
- 302 Temporary: the browser always asks the server; allows analytics tracking and URL updates
- For a URL shortener, 302 is usually preferred (lets you track clicks)
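In code, the tradeoff comes down to a single status flag. A minimal sketch (the `buildRedirect` helper is a hypothetical name, not part of any framework):

```javascript
// Build the HTTP response for a resolved short code. 302 keeps every
// click hitting the server (countable); 301 lets browsers cache the
// hop and skip the server entirely on repeat visits.
function buildRedirect(longUrl, { permanent = false } = {}) {
  return {
    status: permanent ? 301 : 302,
    headers: { Location: longUrl }
  };
}

const resp = buildRedirect('https://example.com/some/long/path');
console.log(resp.status); // 302
```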
Step 4: Key Design Decisions
Generating Short Codes
You need a 7-character code from [a-zA-Z0-9] = 62 characters.
62^7 = ~3.5 trillion possible URLs → more than enough.
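The keyspace arithmetic is easy to double-check in Node (BigInt avoids any floating-point rounding):

```javascript
// 62 possible characters in each of 7 positions → 62^7 distinct codes
const keyspace = 62n ** 7n;
console.log(keyspace.toString()); // "3521614606208" → ~3.5 trillion
```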
Option A: Hash + Encode (Recommended)
const crypto = require('crypto');

function generateShortCode(longUrl) {
  // SHA-256 hash of the URL (as a hex string)
  const hash = crypto.createHash('sha256').update(longUrl).digest('hex');
  // Take the first 7 characters of the base62-encoded hash
  return base62Encode(hash).substring(0, 7);
}

function base62Encode(hex) {
  const chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
  let num = BigInt('0x' + hex);
  let result = '';
  while (num > 0n) {
    result = chars[Number(num % 62n)] + result;
    num = num / 62n; // BigInt division truncates
  }
  return result;
}
Problem: Hash collisions. Two different long URLs could produce the same 7-character code.
Solution: Check if the code exists in the DB. If it does, append a counter and rehash.
Option B: Counter + Encode
Maintain a global counter. Each URL gets the next counter value, encoded in base62.
Counter: 1000000
Base62: 4c92
Short code: 4c92
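The mapping above can be verified with an integer variant of the base62 encoder (a sketch; same alphabet as Option A, but taking a counter instead of a hex hash):

```javascript
const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encode a non-negative BigInt counter into base62.
function base62EncodeInt(n) {
  if (n === 0n) return '0';
  let result = '';
  while (n > 0n) {
    result = CHARS[Number(n % 62n)] + result;
    n /= 62n;
  }
  return result;
}

console.log(base62EncodeInt(1000000n)); // "4c92"
```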
Problem: Single point of failure for the counter. Predictable codes (security risk).
Solution: Use distributed unique ID generation (Snowflake IDs).
Option C: Pre-generated Key Pool
Pre-generate millions of random 7-character codes offline, store them in a "Key DB." Service picks one when needed.
┌───────────────┐  fetch   ┌───────────────┐
│      Key      │ ───────► │    Key DB     │
│   Generator   │          │   (unused)    │
│   (offline)   │          └───────┬───────┘
└───────────────┘                  │ assign
                           ┌───────┴───────┐
                           │    URL DB     │
                           │    (used)     │
                           └───────────────┘
This is the cleanest approach for high-scale systems.
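The fetch/assign cycle can be sketched with an in-memory pool standing in for the Key DB (in production this might be a Redis set drained with SPOP, or a table read with row-level locking; the `KeyPool` class here is purely illustrative):

```javascript
// In-memory stand-in for the Key DB of pre-generated, unused codes.
class KeyPool {
  constructor(keys) {
    this.unused = [...keys];
  }

  // Claim one key. A real Key DB must make this step atomic so two
  // service instances never receive the same key (e.g. Redis SPOP,
  // or SELECT ... FOR UPDATE SKIP LOCKED in SQL).
  claim() {
    if (this.unused.length === 0) throw new Error('key pool exhausted');
    return this.unused.pop();
  }
}

const pool = new KeyPool(['abc1234', 'xYz9000']);
const code = pool.claim();
console.log(code, pool.unused.length);
```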
Database Design
-- URL table
CREATE TABLE urls (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  short_code VARCHAR(10) UNIQUE NOT NULL,
  long_url TEXT NOT NULL,
  user_id BIGINT,
  expires_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW(),
  click_count BIGINT DEFAULT 0
);

-- Note: the UNIQUE constraint above already creates an index on
-- short_code in most engines, so this explicit index is optional.
CREATE INDEX idx_short_code ON urls(short_code);
Database choice:
- Use a NoSQL key-value store (like Cassandra or DynamoDB) for the main URL table, since the access pattern is a simple key lookup
- The key is short_code; the value is long_url plus metadata
- High write volume (1,200/sec) is handled well by Cassandra's write-optimized design
Caching Strategy
Since 80% of traffic goes to 20% of URLs (power law distribution), caching is extremely effective.
Cache design:
- Use Redis with LRU eviction policy
- Cache key: short_code, value: long_url
- Cache hit rate target: 95%+
- TTL: 24 hours (or match URL expiration)
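On the Redis side, LRU eviction is a configuration choice. Typical redis.conf directives (the 16 GB cap is an illustrative number; size it to your hot URL set):

```
# Cap cache memory and evict the least recently used keys when full
maxmemory 16gb
maxmemory-policy allkeys-lru
```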
Lookup flow:
1. Check Redis cache
   → Hit: return long_url with a 302 redirect (< 1 ms)
   → Miss: query the database, cache the result, redirect (< 10 ms)
With 95% cache hit rate:
- 115,000 reads/sec × 0.95 = 109,250 served from cache
- 115,000 × 0.05 = 5,750 DB queries/sec (easily handled)
Handling Expiration
// When creating a URL
async function createShortUrl(longUrl, expiresAt) {
  const shortCode = generateShortCode(longUrl);
  await db.insert({
    short_code: shortCode,
    long_url: longUrl,
    expires_at: expiresAt
  });
  if (expiresAt) {
    // Set the Redis TTL (whole seconds) to match the URL's expiration
    const ttlSeconds = Math.max(1, Math.floor((expiresAt - Date.now()) / 1000));
    await redis.set(shortCode, longUrl, { EX: ttlSeconds });
  } else {
    await redis.set(shortCode, longUrl);
  }
  return shortCode;
}

// When resolving a URL
async function resolveUrl(shortCode) {
  // Check the cache first
  let longUrl = await redis.get(shortCode);
  if (!longUrl) {
    const record = await db.findOne({ short_code: shortCode });
    if (!record || (record.expires_at && record.expires_at < new Date())) {
      throw new Error('URL not found or expired');
    }
    longUrl = record.long_url;
    await redis.set(shortCode, longUrl, { EX: 3600 });
  }
  // Track the click asynchronously (don't block the redirect)
  trackClick(shortCode).catch(console.error);
  return longUrl;
}
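The fire-and-forget `trackClick` call can be sketched as a thin wrapper around an event publisher that never propagates failures into the redirect path (`publish` stands in for a Kafka producer send; the factory shape is illustrative):

```javascript
// Build a trackClick function around any async publisher. Analytics
// failures are logged and swallowed so redirects are never blocked.
function makeClickTracker(publish) {
  return async function trackClick(shortCode) {
    const event = {
      short_code: shortCode,
      timestamp: new Date().toISOString()
    };
    try {
      await publish('url_clicks', event);
    } catch (err) {
      console.error('click tracking failed:', err.message);
    }
  };
}
```

Because the caller invokes it as `trackClick(shortCode).catch(...)` without awaiting, the redirect returns immediately regardless of how slow the analytics pipeline is.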
Analytics (Bonus Deep Dive)
Track click analytics without impacting redirect latency:
[Redirect Service]
        │ async
        ▼
[Kafka Topic: url_clicks]
        │
        ▼
[Analytics Consumer]
        │
        ▼
[ClickHouse / Data Warehouse]
Each click event:
{
  "short_code": "abc1234",
  "timestamp": "2024-01-15T10:30:00Z",
  "ip_address": "1.2.3.4",
  "user_agent": "Mozilla/5.0...",
  "referer": "https://google.com"
}
Final Architecture
┌─────────────────────────────────────────┐
│                DNS / CDN                │
└─────────────────────┬───────────────────┘
                      │
┌─────────────────────┴───────────────────┐
│              Load Balancer              │
│            (HAProxy / NGINX)            │
└─────────────────────┬───────────────────┘
                      │
     ┌────────────────┼──────────────┐
     │                │              │
┌────┴─────┐    ┌─────┴────┐   ┌─────┴────┐
│URL Write │    │ URL Read │   │ URL Read │
│ Service  │    │ Service  │   │ Service  │
└────┬─────┘    └─────┬────┘   └─────┬────┘
     │                │              │
     │          ┌─────┴──────────────┴────┐
     │          │  Redis Cluster (Cache)  │
     │          └─────┬───────────────────┘
     │                │ miss
┌────┴────────────────┴───────────────────┐
│            Cassandra Cluster            │
│          (URL Key-Value Store)          │
└─────────────────────┬───────────────────┘
                      │
            ┌─────────┴────────┐
            │      Kafka       │
            │  (click events)  │
            └─────────┬────────┘
                      │
            ┌─────────┴────────┐
            │    ClickHouse    │
            │   (analytics)    │
            └──────────────────┘
Interview Follow-up Questions
Q: How would you handle 10× traffic growth?
Add more URL service instances behind the load balancer, expand the Redis cluster, and add Cassandra nodes. The system is horizontally scalable at every layer.
Q: How do you prevent abuse (spam/phishing)?
Add URL validation against a blocklist, rate limit by IP/user, and integrate with safe browsing APIs (Google Safe Browsing).
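The rate-limiting piece can be sketched as a token bucket, in-memory here for clarity (a real deployment would back this with Redis so all service instances share state; the class and its parameters are illustrative):

```javascript
// Token bucket: each client gets `capacity` tokens, refilled at
// `refillPerSec`. A request is allowed if a whole token is available.
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now; // injectable clock, useful for testing
    this.buckets = new Map(); // clientId -> { tokens, last }
  }

  allow(clientId) {
    const t = this.now();
    const b = this.buckets.get(clientId) ?? { tokens: this.capacity, last: t };
    // Refill proportionally to elapsed time, capped at capacity
    b.tokens = Math.min(
      this.capacity,
      b.tokens + ((t - b.last) / 1000) * this.refillPerSec
    );
    b.last = t;
    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(clientId, b);
    return allowed;
  }
}
```

Keyed by IP for anonymous traffic and by user ID for authenticated API calls, this caps how fast any one client can mint short URLs.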
Q: How would you add custom domains?
Allow users to CNAME their domain to your service, store the domain in the URL record, and handle routing in the DNS layer.
Q: How do you handle the global distribution?
Deploy the read service in multiple regions (US, EU, Asia), replicate Cassandra across regions, and use anycast DNS to route users to the nearest datacenter.