Design Twitter's News Feed
Twitter's feed system is a canonical system design problem because it centers on an interesting architectural challenge: the fan-out problem. When someone with 50M followers tweets, how do you efficiently deliver that tweet to all of them?
Step 1: Clarify Requirements
Functional requirements:
- Users can post tweets (text, images, videos)
- Users can follow other users
- Home timeline shows tweets from followed users in reverse-chronological order
- Tweets can have likes, replies, and retweets
Non-functional requirements:
- 300M DAU
- 500M tweets/day
- Read-heavy: 300B timeline reads/day vs. 500M writes
- P99 latency for home timeline: < 200ms
- System should be highly available
- Eventual consistency is acceptable (slight delays are OK)
Step 2: Estimate Scale
Writes: 500M tweets/day = ~6,000 tweets/sec (peak: ~15,000/sec)
Reads: 300B timeline reads/day = ~3.5M reads/sec (peak: ~7M/sec)
Read/Write ratio: 600:1 (extremely read-heavy)
Storage:
- Tweet: user_id (8B) + text (280 bytes) + timestamp (8B) + media_url (256B) ≈ 600B
- 500M tweets/day × 600B = 300 GB/day
- 10 years: ~1 PB (tweets are small but there are a lot of them)
Following relationships:
- 300M users × avg 200 follows = 60B follow relationships
- 60B × 16 bytes (follower_id + following_id) = ~1 TB
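These back-of-envelope figures are easy to sanity-check in code. A quick script (pure arithmetic, no external services; every input is an assumption already stated above):

```javascript
// Sanity-check the capacity estimates above.
const SECONDS_PER_DAY = 86_400;

const tweetsPerSec = Math.round(500e6 / SECONDS_PER_DAY);  // ~5,800/sec average
const readsPerSec = Math.round(300e9 / SECONDS_PER_DAY);   // ~3.5M/sec average
const readWriteRatio = 300e9 / 500e6;                      // 600:1

const bytesPerTweet = 8 + 280 + 8 + 256;                   // ~550B, rounded up to 600B
const storagePerDayGB = (500e6 * 600) / 1e9;               // 300 GB/day
const tenYearsPB = (storagePerDayGB * 365 * 10) / 1e6;     // ~1.1 PB

const followEdges = 300e6 * 200;                           // 60B edges
const followStoreTB = (followEdges * 16) / 1e12;           // ~1 TB

console.log({ tweetsPerSec, readsPerSec, readWriteRatio, tenYearsPB, followStoreTB });
```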
Step 3: Core Design Decisions
The Fan-Out Problem
This is the central challenge of Twitter's architecture.
Scenario: User A has 50M followers. They tweet. How do 50M people see it?
You have two strategies:
Option A: Fan-Out on Write (Push Model)
When a tweet is posted, immediately write to each follower's timeline cache.
User A tweets
      │
      ▼
Tweet Service
      │
      ▼
Fan-Out Worker
      │
      ├──► Write to follower 1's timeline cache
      ├──► Write to follower 2's timeline cache
      ├──► Write to follower 3's timeline cache
      │    ...
      └──► Write to follower 50M's timeline cache
Pros:
- Home timeline reads are O(1): just read from cache
- Very fast reads (< 1ms)
Cons:
- Writing a celebrity's tweet requires 50M cache writes, which is expensive and slow
- Storage: 300M users × 100 cached tweets × 600 bytes = 18 TB of cache
- Wasted storage for inactive users
Option B: Fan-Out on Read (Pull Model)
When a user opens their timeline, query tweets from all accounts they follow.
User B opens timeline
      │
      ▼
Timeline Service
      │
      ├──► Fetch tweets from user 1's tweet store
      ├──► Fetch tweets from user 2's tweet store
      ├──► Fetch tweets from user 3's tweet store
      │    ... (for each of user B's 200 follows)
      └──► Merge, sort, return top 20
Pros:
- No fan-out cost on write
- No wasted storage for inactive users
Cons:
- Timeline generation requires N queries (N = number of follows), which is slow
- Hard to maintain low latency for users who follow 1000 accounts
Twitter's Actual Approach: Hybrid
Twitter uses a hybrid strategy based on account type:
User posts tweet
      │
      ▼
Is user a celebrity (>10k followers)?
      │
      ├── YES: Store tweet only in tweet store
      │        (don't fan out to all followers)
      │
      └── NO: Fan out to all followers' timeline caches
              (write tweet_id to each follower's timeline)

User opens timeline
      │
      ▼
Read timeline cache (pre-built for non-celebrity follows)
      │
      ▼
Fetch any celebrity tweets from their tweet stores
      │
      ▼
Merge + sort
      │
      ▼
Return to user
This works because:
- Most people have < 10k followers, so fan-out is cheap
- Most people don't follow many celebrities, so the merge step is small
- Celebrities are the exception, not the rule
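At read time, the hybrid approach boils down to merging two already-sorted lists: the pre-built timeline and the followed celebrities' recent tweets. A sketch of that merge (a two-pointer merge over descending, time-sortable tweet IDs; the function name is illustrative, not Twitter's actual code):

```javascript
// Merge the pre-built timeline (non-celebrity tweets) with celebrity
// tweets fetched at read time. Both inputs are sorted newest-first by
// time-sortable tweet ID, so a single linear merge yields the page.
function mergeTimelines(cachedTweetIds, celebrityTweetIds, limit = 20) {
  const out = [];
  let i = 0, j = 0;
  while (out.length < limit &&
         (i < cachedTweetIds.length || j < celebrityTweetIds.length)) {
    const a = cachedTweetIds[i], b = celebrityTweetIds[j];
    if (b === undefined || (a !== undefined && a > b)) {
      out.push(a); i++;   // cached tweet is newer (larger ID)
    } else {
      out.push(b); j++;   // celebrity tweet is newer
    }
  }
  return out;
}
```

Because the merge touches at most `limit` elements per input, it stays cheap even for users who follow many celebrities.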
Step 4: Detailed Architecture
Data Model
-- Users table
CREATE TABLE users (
    user_id        BIGINT PRIMARY KEY,
    username       VARCHAR(50) UNIQUE,
    bio            TEXT,
    follower_count BIGINT,
    created_at     TIMESTAMP
);

-- Tweets table
CREATE TABLE tweets (
    tweet_id   BIGINT PRIMARY KEY,  -- Snowflake ID (time-sortable)
    user_id    BIGINT NOT NULL,
    content    TEXT,
    media_url  TEXT,
    like_count BIGINT DEFAULT 0,
    created_at TIMESTAMP,
    INDEX(user_id, created_at DESC)
);

-- Follows table
CREATE TABLE follows (
    follower_id  BIGINT NOT NULL,
    following_id BIGINT NOT NULL,
    created_at   TIMESTAMP,
    PRIMARY KEY(follower_id, following_id),
    INDEX(following_id, follower_id)
);
Why Snowflake IDs for tweets?
- Time-sortable: tweet IDs are monotonically increasing with time
- No need to sort by timestamp; sorting by ID gives chronological order
- Distributed: can be generated without a central counter
- Compact: 64-bit integer
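A minimal sketch of how such an ID can be built and decomposed (the 41/10/12-bit layout and the 2010 epoch follow the commonly described Snowflake design; treat the details as illustrative):

```javascript
// Snowflake-style 64-bit ID: 41 bits of milliseconds since a custom
// epoch, 10 bits of worker ID, 12 bits of per-millisecond sequence.
// BigInt is used because the result exceeds Number.MAX_SAFE_INTEGER.
const EPOCH = 1288834974657n; // Twitter's published Snowflake epoch (Nov 2010)

function makeSnowflake(timestampMs, workerId, sequence) {
  return ((BigInt(timestampMs) - EPOCH) << 22n) |
         (BigInt(workerId) << 12n) |
         BigInt(sequence);
}

function snowflakeTimestamp(id) {
  return Number((id >> 22n) + EPOCH); // recover creation time in ms
}
```

Because the timestamp occupies the high bits, comparing two IDs numerically is the same as comparing their creation times.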
Timeline Cache (Redis)
Each user's home timeline is stored as a sorted set in Redis:
// Key: timeline:{user_id}
// Value: sorted set of tweet_ids, score = timestamp

// Fan-out: add tweet to each follower's timeline
async function fanOut(tweetId, authorId, timestamp) {
  const followers = await getFollowers(authorId);
  // Process in batches of 100
  for (const batch of chunk(followers, 100)) {
    await Promise.all(batch.map(async followerId => {
      // 'NX': don't overwrite if already present (ioredis-style arguments)
      await redis.zadd(`timeline:${followerId}`, 'NX', timestamp, tweetId);
      // Keep only the 800 most recent tweets per follower's timeline
      await redis.zremrangebyrank(`timeline:${followerId}`, 0, -801);
    }));
  }
}
// Read: get home timeline
async function getHomeTimeline(userId, cursor = 0, limit = 20) {
  const tweetIds = await redis.zrevrange(
    `timeline:${userId}`,
    cursor,
    cursor + limit - 1
  );
  // Fetch actual tweet data (could be another cache layer)
  const tweets = await Promise.all(tweetIds.map(id => getTweet(id)));
  return tweets.filter(Boolean);
}
Fan-Out Service
class FanOutService {
  async processTweet(tweet) {
    const { tweetId, userId, timestamp } = tweet;
    const followerCount = await getUserFollowerCount(userId);
    if (followerCount < 10000) {
      // Regular user: fan out immediately
      await this.fanOutToFollowers(tweetId, userId, timestamp);
    } else {
      // Celebrity: just store in the tweet store; the timeline service
      // merges their tweets in at read time. Optionally still fan out to
      // a limited set (e.g., "super followers" or verified accounts).
      await tweetStore.save(tweet);
    }
  }
  async fanOutToFollowers(tweetId, userId, timestamp) {
    let cursor = null;
    // Paginate through followers
    while (true) {
      const { followers, nextCursor } = await getFollowersBatch(userId, cursor);
      await Promise.all(followers.map(followerId =>
        redis.zadd(`timeline:${followerId}`, timestamp, tweetId)
      ));
      if (!nextCursor) break;
      cursor = nextCursor;
    }
  }
}
Full System Architecture
┌─────────────────────────────────────────┐
│              Load Balancer              │
└────────┬───────────────────────┬────────┘
         │                       │
┌────────▼───────┐      ┌────────▼───────┐
│   Tweet API    │      │  Timeline API  │
│    Service     │      │    Service     │
└────────┬───────┘      └────────┬───────┘
         │                       │
┌────────▼───────┐      ┌────────▼───────┐
│ Message Queue  │      │ Timeline Cache │
│    (Kafka)     │      │(Redis Cluster) │
└────────┬───────┘      └────────▲───────┘
         │                       │
┌────────▼───────┐               │ (merge)
│    Fan-Out     │               │
│    Workers     ├───────────────┘
└────────┬───────┘
         │
┌────────▼────────────────────────┐
│           Tweet Store           │
│    (Cassandra, time-series)     │
└────────┬────────────────────────┘
         │
┌────────▼────────────────────────┐
│     User/Follow Graph Store     │
│       (MySQL / Graph DB)        │
└─────────────────────────────────┘
Interview Follow-up Questions
Q: How do you handle users with 100M followers?
For mega-celebrities (100M+ followers), fan-out is simply too expensive. Their tweets are not pre-pushed to anyone's timeline. Instead, the timeline service identifies celebrities among the user's follows and fetches their recent tweets at read time, merging with the pre-built timeline.
Q: How do you handle the "thundering herd" problem when a celebrity tweets?
Use a cache with a short TTL for celebrity timeline fetches. Add jitter to prevent all users from invalidating their cache at the same moment. Process fan-outs through an async queue rather than synchronously.
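A sketch of the jitter idea (the 20% jitter ratio is an arbitrary choice):

```javascript
// Jittered TTL: instead of a fixed expiry (which makes every reader of a
// celebrity timeline miss the cache at the same moment), each entry
// expires at base +/- up to `jitterRatio` of the base, spreading refills
// over time.
function jitteredTtlSeconds(baseSeconds, jitterRatio = 0.2) {
  const jitter = (Math.random() * 2 - 1) * jitterRatio * baseSeconds;
  return Math.round(baseSeconds + jitter);
}

// e.g. redis.setex(key, jitteredTtlSeconds(60), payload)  // expires in 48-72s
```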
Q: How would you implement the "trending topics" feature?
Maintain a real-time count of hashtags over a sliding window (e.g., the last hour). Use Redis sorted sets: ZINCRBY trending 1 #hashtag on each use; at read time, ZREVRANGE trending 0 9 returns the top 10. Refresh periodically and apply geographic filtering.
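An in-memory sketch of the same idea (per-minute buckets standing in for per-minute Redis sorted sets; the class and method names are illustrative):

```javascript
// Bucketed sliding-window hashtag counter: counts go into per-minute
// buckets, and "trending" sums the buckets inside the window. In Redis
// this maps to one sorted set per minute (ZINCRBY) combined over the
// window at read time.
class TrendingCounter {
  constructor(windowMinutes = 60) {
    this.windowMinutes = windowMinutes;
    this.buckets = new Map(); // minute -> Map(hashtag -> count)
  }

  record(hashtag, nowMs) {
    const minute = Math.floor(nowMs / 60_000);
    if (!this.buckets.has(minute)) this.buckets.set(minute, new Map());
    const bucket = this.buckets.get(minute);
    bucket.set(hashtag, (bucket.get(hashtag) || 0) + 1);
  }

  top(n, nowMs) {
    const cutoff = Math.floor(nowMs / 60_000) - this.windowMinutes;
    const totals = new Map();
    for (const [minute, bucket] of this.buckets) {
      if (minute <= cutoff) { this.buckets.delete(minute); continue; } // expire old buckets
      for (const [tag, count] of bucket) {
        totals.set(tag, (totals.get(tag) || 0) + count);
      }
    }
    return [...totals.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
  }
}
```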
Q: How do you keep tweet counts (likes, retweets) accurate at scale?
Count updates are high volume, so don't write to the tweet record on every like. Instead, buffer increments in a separate counter store (Redis) and sync to the primary DB asynchronously. Exact counters can be sharded across keys; where an approximate distinct count is enough (e.g., unique viewers of a tweet), probabilistic structures like HyperLogLog keep memory constant at massive scale.
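A sketch of the write-behind counter (an in-memory Map stands in for the Redis counter store, and the flush would run on a timer in practice; all names are illustrative):

```javascript
// Write-behind counters: increments accumulate as per-tweet deltas in a
// fast store, and a periodic flush applies each accumulated delta to the
// primary DB in a single write, instead of one DB write per like.
class CounterBuffer {
  constructor(flushToDb) {
    this.deltas = new Map();    // tweetId -> pending like-count delta
    this.flushToDb = flushToDb; // async (tweetId, delta) => void
  }

  increment(tweetId, by = 1) {
    this.deltas.set(tweetId, (this.deltas.get(tweetId) || 0) + by);
  }

  async flush() {
    const pending = this.deltas;
    this.deltas = new Map();    // swap first so concurrent increments aren't lost
    for (const [tweetId, delta] of pending) {
      await this.flushToDb(tweetId, delta); // one DB write per tweet, not per like
    }
  }
}
```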