Design Twitter's News Feed

Twitter's feed system is a canonical system design problem because it has an interesting architectural challenge: the fan-out problem. When someone with 50M followers tweets, how do you efficiently deliver that tweet to all followers?

Step 1: Clarify Requirements

Functional requirements:

  • Users can post tweets (text, images, videos)
  • Users can follow other users
  • Home timeline shows tweets from followed users in reverse-chronological order
  • Tweets can have likes, replies, and retweets

Non-functional requirements:

  • 300M DAU
  • 500M tweets/day
  • Read-heavy: 300B timeline reads/day vs. 500M writes/day
  • P99 latency for home timeline: < 200ms
  • System should be highly available
  • Eventual consistency is acceptable (slight delays are OK)

Step 2: Estimate Scale

Writes: 500M tweets/day = ~6,000 tweets/sec (peak: ~15,000/sec)
Reads: 300B timeline reads/day = ~3.5M reads/sec (peak: ~7M/sec)

Read/Write ratio: 300B / 500M = ~600:1 (extremely read-heavy)

Storage:
- Tweet: user_id (8 B) + text (280 B) + timestamp (8 B) + media_url (256 B) = 552 B, rounded up to ≈ 600 B for overhead
- 500M tweets/day × 600 B = 300 GB/day
- 10 years: ~1 PB (tweets are small but there are a lot of them)

Following relationships:
- 300M users × avg 200 follows = 60B follow relationships
- 60B × 16 bytes (follower_id + following_id) = ~1 TB
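
The arithmetic above can be reproduced with a short script. This is only a sketch: every input is a figure stated in this section, and the variable names are illustrative.

```javascript
// Back-of-envelope estimates, using only the figures stated above.
const SECONDS_PER_DAY = 86400;

const tweetsPerDay = 500e6;
const timelineReadsPerDay = 300e9;
const bytesPerTweet = 600;       // rounded per-tweet size from above
const users = 300e6;
const avgFollows = 200;
const bytesPerFollowEdge = 16;   // follower_id + following_id

const estimates = {
  writesPerSec: tweetsPerDay / SECONDS_PER_DAY,
  readsPerSec: timelineReadsPerDay / SECONDS_PER_DAY,
  readWriteRatio: timelineReadsPerDay / tweetsPerDay,
  tweetBytesPerDay: tweetsPerDay * bytesPerTweet,
  tweetBytesPer10Years: tweetsPerDay * bytesPerTweet * 365 * 10,
  followEdges: users * avgFollows,
  followBytes: users * avgFollows * bytesPerFollowEdge,
};

console.log(estimates);
```

Running it gives roughly 5,800 writes/sec, 3.5M reads/sec, a read/write ratio of 600, about 1.1 PB of tweets over 10 years, and just under 1 TB for the follow graph.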

Step 3: Core Design Decisions

The Fan-Out Problem

This is the central challenge of Twitter's architecture.

Scenario: User A has 50M followers. They tweet. How do 50M people see it?

You have two strategies:


Option A: Fan-Out on Write (Push Model)

When a tweet is posted, immediately write to each follower's timeline cache.

User A tweets
    │
    ▼
Tweet Service
    │
    ▼
Fan-Out Worker
    │
    ├──► Write to follower 1's timeline cache
    ├──► Write to follower 2's timeline cache
    ├──► Write to follower 3's timeline cache
    │    ...
    └──► Write to follower 50M's timeline cache

Pros:

  • Home timeline reads are O(1): just read from cache
  • Very fast reads (< 1ms)

Cons:

  • Writing a celebrity's tweet requires 50M cache writes, which is expensive and slow
  • Storage: 300M users × 100 cached tweets × 600 bytes = 18 TB of cache
  • Wasted storage for inactive users
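
To put rough numbers on these costs, here is a small sketch. The cluster write throughput (`clusterWritesPerSec`) is an assumed figure chosen only for illustration; the other inputs come from this section. The last line shows how much smaller the cache gets if each entry holds an 8-byte tweet ID instead of a full 600-byte tweet.

```javascript
// Cost of fan-out on write. `clusterWritesPerSec` is an ASSUMED aggregate
// cache write throughput, not a measured number.
function fanOutSeconds(followers, clusterWritesPerSec) {
  return followers / clusterWritesPerSec;
}

// Total cache size for per-user timelines.
function timelineCacheBytes(users, entriesPerTimeline, bytesPerEntry) {
  return users * entriesPerTimeline * bytesPerEntry;
}

// One celebrity tweet to 50M followers at an assumed 1M writes/sec
console.log(fanOutSeconds(50e6, 1e6));
// Caching full tweets (600 B each): 1.8e13 bytes, i.e. 18 TB
console.log(timelineCacheBytes(300e6, 100, 600));
// Caching only tweet IDs (8 B each): 2.4e11 bytes, i.e. 240 GB
console.log(timelineCacheBytes(300e6, 100, 8));
```

Under that assumed throughput, a single celebrity tweet occupies the whole cache cluster for about 50 seconds, which is why pure push does not scale to the largest accounts.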

Option B: Fan-Out on Read (Pull Model)

When a user opens their timeline, query tweets from all accounts they follow.

User B opens timeline
    │
    ▼
Timeline Service
    │
    ├──► Fetch tweets from user 1's tweet store
    ├──► Fetch tweets from user 2's tweet store
    ├──► Fetch tweets from user 3's tweet store
    │    ... (for each of user B's 200 follows)
    └──► Merge, sort, return top 20

Pros:

  • No fan-out cost on write
  • No wasted storage for inactive users

Cons:

  • Timeline generation requires N queries (N = number of follows), so every read is slow
  • Hard to maintain low latency for users who follow 1,000 or more accounts
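
The pull model can be sketched in memory as follows. `tweetsByAuthor` is a hypothetical stand-in for the per-user tweet store, and tweet IDs are assumed to be time-sortable numbers. A real service would issue the per-author fetches as parallel queries.

```javascript
// Fan-out on read, modelled in memory.
// `tweetsByAuthor`: authorId -> that author's tweets, newest first (hypothetical store).
function buildTimelineOnRead(followedIds, tweetsByAuthor, limit = 20) {
  const candidates = [];
  for (const authorId of followedIds) {
    const recent = tweetsByAuthor.get(authorId) ?? [];
    // Only the newest `limit` tweets per author can reach the final page.
    candidates.push(...recent.slice(0, limit));
  }
  // Newest first, relying on time-sortable IDs.
  candidates.sort((a, b) => b.id - a.id);
  return candidates.slice(0, limit);
}

const tweetsByAuthor = new Map([
  ["alice", [{ id: 105, text: "a2" }, { id: 101, text: "a1" }]],
  ["bob", [{ id: 104, text: "b2" }, { id: 102, text: "b1" }]],
  ["carol", [{ id: 103, text: "c1" }]],
]);

const page = buildTimelineOnRead(["alice", "bob", "carol"], tweetsByAuthor, 3);
console.log(page.map((t) => t.id)); // [ 105, 104, 103 ]
```

The work per read grows with the number of followed accounts, which is the latency problem described in the cons above.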

Twitter's Actual Approach: Hybrid

Twitter uses a hybrid strategy based on account type:

User posts tweet
    │
    ▼
Is user a celebrity (>10k followers)?
    │
    ├── YES: Store tweet only in tweet store
    │        (don't fan-out to all followers)
    │
    └── NO:  Fan-out to all followers' timeline caches
             (write tweet_id to each follower's timeline)

User opens timeline
    │
    ▼
Read timeline cache (pre-built for non-celebrity follows)
    │
    ▼
Fetch any celebrity tweets from their tweet stores
    │
    ▼
Merge + sort
    │
    ▼
Return to user

This works because:

  • Most people have < 10k followers → fan-out is cheap
  • Most people don't follow many celebrities → the merge step is small
  • Celebrities are the exception, not the rule
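
A minimal sketch of the two decision points in the hybrid flow, assuming tweet IDs are time-sortable. The function names are illustrative, and plain numbers stand in for IDs; real 64-bit IDs would need BigInt or strings in JavaScript.

```javascript
const CELEBRITY_THRESHOLD = 10000; // follower cutoff used in this section

// Write path: celebrity tweets skip fan-out.
// Treating exactly 10,000 followers as a celebrity is a boundary choice made here.
function isCelebrity(followerCount) {
  return followerCount >= CELEBRITY_THRESHOLD;
}

// Read path: merge the pre-built (pushed) IDs with IDs pulled at read time
// from each followed celebrity, dropping duplicates.
function mergeHybridTimeline(cachedIds, celebrityIdLists, limit = 20) {
  const merged = new Set(cachedIds);
  for (const ids of celebrityIdLists) {
    for (const id of ids) merged.add(id);
  }
  // Newest first, relying on time-sortable IDs.
  return [...merged].sort((a, b) => b - a).slice(0, limit);
}

const timeline = mergeHybridTimeline([105, 103, 101], [[104, 102], [106]], 4);
console.log(timeline); // [ 106, 105, 104, 103 ]
```

The merge cost scales with the number of celebrities a user follows, not with the number of followers a celebrity has, which is what keeps both paths cheap.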

Step 4: Detailed Architecture

Data Model

-- Users table
CREATE TABLE users (
    user_id        BIGINT PRIMARY KEY,
    username       VARCHAR(50) UNIQUE,
    bio            TEXT,
    follower_count BIGINT,
    created_at     TIMESTAMP
);

-- Tweets table
CREATE TABLE tweets (
    tweet_id   BIGINT PRIMARY KEY, -- Snowflake ID (time-sortable)
    user_id    BIGINT NOT NULL,
    content    TEXT,
    media_url  TEXT,
    like_count BIGINT DEFAULT 0,
    created_at TIMESTAMP,
    INDEX(user_id, created_at DESC)
);

-- Follows table
CREATE TABLE follows (
    follower_id  BIGINT NOT NULL,
    following_id BIGINT NOT NULL,
    created_at   TIMESTAMP,
    PRIMARY KEY(follower_id, following_id),
    INDEX(following_id, follower_id)
);

Why Snowflake IDs for tweets?

  • Time-sortable: the high bits are a millisecond timestamp, so IDs increase with time (strictly within one generator, roughly across generators)
  • No need to sort by timestamp: just sort by ID
  • Distributed: can be generated without a central counter
  • Compact: 64-bit integer
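
A Snowflake-style generator can be sketched as follows. The bit layout (41 bits of milliseconds, 10 bits of worker ID, 12 bits of sequence) and the epoch constant are assumptions made for this example, and `createIdGenerator` and `timestampOf` are illustrative names. BigInt is used because the IDs exceed JavaScript's safe integer range.

```javascript
// Snowflake-style 64-bit IDs: [timestamp | worker | sequence] (assumed layout).
const EPOCH_MS = 1288834974657n; // assumed custom epoch, in Unix milliseconds
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n;

function createIdGenerator(workerId, now = () => Date.now()) {
  const worker = BigInt(workerId);
  if (worker < 0n || worker >= 1n << WORKER_BITS) {
    throw new RangeError("workerId out of range");
  }
  let lastMs = -1n;
  let sequence = 0n;

  return function nextId() {
    let ms = BigInt(now());
    if (ms < lastMs) throw new Error("clock moved backwards");
    if (ms === lastMs) {
      sequence = (sequence + 1n) & MAX_SEQUENCE;
      if (sequence === 0n) {
        // Sequence exhausted for this millisecond: wait for the next one.
        while (ms <= lastMs) ms = BigInt(now());
      }
    } else {
      sequence = 0n;
    }
    lastMs = ms;
    return ((ms - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) |
      (worker << SEQUENCE_BITS) |
      sequence;
  };
}

// Recover the creation time (Unix ms) from an ID.
function timestampOf(id) {
  return Number((id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS);
}

const nextId = createIdGenerator(7);
const a = nextId();
const b = nextId();
console.log(a < b); // true: later IDs from one worker compare greater
```

Each worker needs only its own clock and a unique worker ID, so no central counter is involved.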

Timeline Cache (Redis)

Each user's home timeline is stored as a sorted set in Redis:

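// Key: timeline:{user_id}
// Value: sorted set of tweet_ids, score = timestamp
// Assumes an ioredis-style client (`redis`) and a getFollowers() helper.

const TIMELINE_MAX = 800; // Keep only the 800 most recent tweets per timeline

// Split an array into fixed-size batches
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Fan-out: add tweet to each follower's timeline
async function fanOut(tweetId, authorId, timestamp) {
  const followers = await getFollowers(authorId);

  // Process in batches of 100
  for (const batch of chunk(followers, 100)) {
    await Promise.all(batch.map(async (followerId) => {
      const key = `timeline:${followerId}`;
      // NX: don't overwrite if already present
      await redis.zadd(key, 'NX', timestamp, tweetId);
      // Trim the follower's timeline (not the author's) to the newest entries
      await redis.zremrangebyrank(key, 0, -(TIMELINE_MAX + 1));
    }));
  }
}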

// Read: get home timeline
// `cursor` is a zero-based offset into the sorted set; it defaults to the first page.
async function getHomeTimeline(userId, cursor = 0, limit = 20) {
  const tweetIds = await redis.zrevrange(
    `timeline:${userId}`,
    cursor,
    cursor + limit - 1
  );

  // Fetch actual tweet data (could be another cache layer)
  const tweets = await Promise.all(tweetIds.map(id => getTweet(id)));
  // Drop tweets that were deleted after being fanned out
  return tweets.filter(Boolean);
}

Fan-Out Service

class FanOutService {
  async processTweet(tweet) {
    const { tweetId, userId, timestamp } = tweet;

    // Persist every tweet first: timelines hold only IDs, so reads need the store
    await tweetStore.save(tweet);

    const followerCount = await getUserFollowerCount(userId);
    if (followerCount < 10000) {
      // Regular user: fan-out immediately
      await this.fanOutToFollowers(tweetId, userId, timestamp);
    }
    // Celebrity: no fan-out; the timeline service merges their tweets at read time.
    // Optionally still fan-out to a limited set (e.g., "super followers" or verified accounts).
  }

  async fanOutToFollowers(tweetId, userId, timestamp) {
    let cursor = null;

    // Paginate through followers
    while (true) {
      const { followers, nextCursor } = await getFollowersBatch(userId, cursor);

      await Promise.all(followers.map(followerId =>
        redis.zadd(`timeline:${followerId}`, timestamp, tweetId)
      ));

      if (!nextCursor) break;
      cursor = nextCursor;
    }
  }
}

Full System Architecture

      ┌────────────────────────────────────────┐
      │             Load Balancer              │
      └───────┬───────────────────────┬────────┘
              │                       │
     ┌────────▼──────┐       ┌────────▼──────┐
     │   Tweet API   │       │ Timeline API  │
     │    Service    │       │    Service    │
     └────────┬──────┘       └────────┬──────┘
              │                       │
┌─────────────▼──────┐      ┌─────────▼───────┐
│   Message Queue    │      │ Timeline Cache  │
│      (Kafka)       │      │ (Redis Cluster) │
└─────────────┬──────┘      └─────────────────┘
              │                       ▲
     ┌────────▼──────┐                │ (merge)
     │    Fan-Out    │                │
     │    Workers    │────────────────┘
     └────────┬──────┘
              │
┌─────────────▼──────────────────┐
│          Tweet Store           │
│    (Cassandra, time-series)    │
└─────────────┬──────────────────┘
              │
┌─────────────▼──────────────────┐
│    User/Follow Graph Store     │
│       (MySQL / Graph DB)       │
└────────────────────────────────┘

Interview Follow-up Questions

Q: How do you handle users with 100M followers?

For mega-celebrities (100M+ followers), fan-out is simply too expensive. Their tweets are not pre-pushed to anyone's timeline. Instead, the timeline service identifies celebrities among the user's follows and fetches their recent tweets at read time, merging with the pre-built timeline.

Q: How do you handle the "thundering herd" problem when a celebrity tweets?

Use a cache with a short TTL for celebrity timeline fetches, so that many followers reading at once share one fetch from the tweet store. Add jitter to the TTL so cached entries do not all expire at the same moment. Process fan-outs through an async queue rather than synchronously.
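
TTL jitter can be sketched in a few lines. The function name and the 20% default spread are assumptions for illustration; the returned value would be passed as the expiry when the cache entry is written.

```javascript
// TTL with jitter: spread expiry times so cached entries written together
// do not all expire together. `jitterRatio` is an illustrative default.
function jitteredTtl(baseTtlSeconds, jitterRatio = 0.2, random = Math.random) {
  const spread = baseTtlSeconds * jitterRatio;
  // Uniform in [base - spread, base + spread)
  const ttl = baseTtlSeconds - spread + random() * 2 * spread;
  return Math.max(1, Math.round(ttl));
}

// With a 30-second base, each entry gets a TTL between 24 and 36 seconds.
console.log(jitteredTtl(30));
```

The `random` parameter exists only so the function can be tested deterministically.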

Q: How would you implement the "trending topics" feature?

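Maintain a real-time count of hashtags over a sliding window (e.g., a 1-hour window). Use Redis sorted sets: ZINCRBY trending 1 #hashtag. A single sorted set only accumulates, so to make the window slide, keep one sorted set per time bucket (for example, per minute), let old buckets expire, and combine the live buckets with ZUNIONSTORE. At read time, ZREVRANGE trending 0 9 returns the top 10. Refresh periodically and apply geographic filtering.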
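
The sliding window can be modelled in memory as one bucket of counts per minute. This is a sketch under that assumption, with `TrendingCounter` as an illustrative name; in Redis, each bucket would be its own sorted set.

```javascript
// Sliding-window hashtag counts, modelled in memory.
// One bucket per minute; the window is the most recent `windowMinutes` buckets.
class TrendingCounter {
  constructor(windowMinutes = 60) {
    this.windowMinutes = windowMinutes;
    this.buckets = new Map(); // minute index -> Map(hashtag -> count)
  }

  record(hashtag, timestampMs) {
    const minute = Math.floor(timestampMs / 60000);
    if (!this.buckets.has(minute)) this.buckets.set(minute, new Map());
    const bucket = this.buckets.get(minute);
    bucket.set(hashtag, (bucket.get(hashtag) ?? 0) + 1);
  }

  top(k, nowMs) {
    const currentMinute = Math.floor(nowMs / 60000);
    const oldest = currentMinute - this.windowMinutes + 1;
    const totals = new Map();
    for (const [minute, bucket] of this.buckets) {
      if (minute < oldest) {
        this.buckets.delete(minute); // bucket fell out of the window
        continue;
      }
      if (minute > currentMinute) continue;
      for (const [tag, count] of bucket) {
        totals.set(tag, (totals.get(tag) ?? 0) + count);
      }
    }
    // Highest count first; ties broken alphabetically for stable output.
    return [...totals.entries()]
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, k)
      .map(([tag, count]) => ({ tag, count }));
  }
}

const trending = new TrendingCounter(60);
trending.record("#launch", 0);
trending.record("#launch", 1000);
trending.record("#news", 30 * 60000);
console.log(trending.top(10, 59 * 60000)); // #launch (2) first, then #news (1)
```

Once the clock moves past the first hour, the minute-0 bucket drops out and `#launch` stops counting toward the total.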

Q: How do you keep tweet counts (likes, retweets) accurate at scale?

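Count updates are high volume, so don't write to the tweet record directly on every like or retweet. Instead, write increments to a separate counter store (Redis) and sync to the primary DB asynchronously. Where an approximate count of distinct items is acceptable (for example, unique viewers), a probabilistic sketch such as HyperLogLog, which builds on the Flajolet-Martin approach, keeps memory bounded; exact tallies should still come from the durable store.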
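
The buffered counter pattern can be sketched as follows. The in-memory `Map` stands in for the Redis counter store, and `persistBatch` is a hypothetical async function that writes a batch of deltas to the primary DB.

```javascript
// Write-behind counters: increments accumulate in a fast store and are
// flushed to the primary DB in batches instead of one write per like.
class BufferedCounters {
  constructor(persistBatch) {
    this.pending = new Map();         // tweetId -> delta not yet persisted
    this.persistBatch = persistBatch; // hypothetical: async (entries) => void
  }

  increment(tweetId, delta = 1) {
    this.pending.set(tweetId, (this.pending.get(tweetId) ?? 0) + delta);
  }

  // Persist all pending deltas; returns how many tweets were updated.
  async flush() {
    if (this.pending.size === 0) return 0;
    const batch = this.pending;
    this.pending = new Map(); // new increments go to a fresh buffer
    try {
      await this.persistBatch([...batch.entries()]);
    } catch (err) {
      // Put the deltas back so a failed flush does not lose counts.
      for (const [id, delta] of batch) this.increment(id, delta);
      throw err;
    }
    return batch.size;
  }
}

const likeCounters = new BufferedCounters(async (entries) => {
  console.log("persisting", entries);
});
likeCounters.increment("tweet-1");
likeCounters.increment("tweet-1");
likeCounters.increment("tweet-2");
likeCounters.flush(); // in production this would run on a timer
```

The trade-off is that the primary DB lags the counter store by up to one flush interval, which fits the eventual-consistency requirement stated earlier.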