Design a Chat System (WhatsApp)

WhatsApp handles 100 billion messages per day. Building a real-time chat system is a rich system design problem that touches on WebSockets, message queues, delivery guarantees, and storage at scale.

Step 1: Requirements

Functional:

One-on-one messaging
Group messaging (up to 500 members)
Message delivery status (sent ✓, delivered ✓✓, read ✓✓ in blue)
Online/offline indicators
Message history

Non-functional:

2B users, 500M DAU
100B messages/day
Messages must be delivered exactly once, in order
Low latency (< 100ms delivery between users in the same region)
High availability (messages should queue if recipient is offline)

Step 2: Scale Estimation

Messages: 100B/day = ~1.2M messages/sec

Message size:
- Text: ~300 bytes average
- With metadata (sender, recipient, timestamp, message_id): ~500 bytes

Storage:
- 100B × 500 bytes/day = 50 TB/day
- 5 years: ~90 PB (need massive distributed storage)
- Images: ~300 KB average, assume 10% of messages have images
  - 10B × 300 KB = 3 PB/day (need separate media storage)

Connections:
- 500M active users
- Each user maintains 1 WebSocket connection
- 500M concurrent WebSocket connections across the system
- Assume each chat server handles 100k connections
- Need: 5,000 chat servers

Step 3: Core Architecture

Connection Management

Traditional HTTP (polling) won't work — too much latency and server load. You need persistent connections.

Options:

Short polling: Client asks "any new messages?" every N seconds → too much overhead
Long polling: Client asks, server holds until message arrives → better but still inefficient
WebSocket (chosen): Bidirectional persistent connection → ideal for real-time messaging

Client A ──── WebSocket ──── Chat Server 1
Client B ──── WebSocket ──── Chat Server 2

Problem: A and B are connected to different servers. How does a message from A reach B?

Solution: Message queue + service discovery

Client A → Chat Server 1 → Kafka → Chat Server 2 → Client B

Message Flow (One-on-One)

1. User A sends message to User B
   A sends: { to: "userB", content: "Hello!", client_msg_id: "abc123" }

2. Chat Server 1 (A's server):
   a. Assigns a server-side message_id (Snowflake ID)
   b. Persists message to database with status: "sent"
   c. Acknowledges to A: { status: "sent", server_msg_id: "..." }
   d. Publishes to Kafka topic: messages/{userB_server_id}

3. Chat Server 2 (B's server):
   a. Consumes message from Kafka
   b. Updates status: "delivered"
   c. Pushes message to B via WebSocket
   d. B's client acknowledges receipt

4. If B is offline:
   - Message stays in Kafka / message store
   - Push notification sent via APNs/FCM
   - When B comes online, Chat Server delivers queued messages

Data Model

-- Messages table (Cassandra — high write throughput)
CREATE TABLE messages (
  conversation_id  UUID,
  message_id       BIGINT,  -- Snowflake ID (time-sortable)
  sender_id        UUID,
  content          TEXT,
  media_url        TEXT,
  message_type     ENUM('text', 'image', 'video', 'audio'),
  status           ENUM('sent', 'delivered', 'read'),
  created_at       TIMESTAMP,
  PRIMARY KEY ((conversation_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Conversations table
CREATE TABLE conversations (
  conversation_id   UUID PRIMARY KEY,
  type              ENUM('direct', 'group'),
  name              TEXT,  -- for groups
  created_by        UUID,
  created_at        TIMESTAMP,
  last_message_at   TIMESTAMP
);

-- Conversation members
CREATE TABLE conversation_members (
  conversation_id  UUID,
  user_id          UUID,
  joined_at        TIMESTAMP,
  last_read_at     TIMESTAMP,
  PRIMARY KEY ((conversation_id), user_id)
);

-- User presence
CREATE TABLE user_presence (
  user_id         UUID PRIMARY KEY,
  is_online       BOOLEAN,
  last_seen_at    TIMESTAMP,
  server_id       VARCHAR(50)  -- which chat server they're connected to
);

Why Cassandra?

Write-optimized: handles 1.2M writes/sec efficiently
Partition by conversation_id: all messages for a conversation are co-located
Sort by message_id: natural chronological ordering
Horizontal scaling: add nodes as storage grows

Chat Server Implementation

class ChatServer {
  constructor(serverId) {
    this.serverId = serverId;
    this.connections = new Map(); // userId → WebSocket
    this.kafka = new KafkaConsumer(`messages.${serverId}`);
    this.kafka.on('message', this.deliverMessage.bind(this));
  }

  async handleConnection(ws, userId) {
    this.connections.set(userId, ws);

    // Mark user online and update server assignment
    await this.presenceService.setOnline(userId, this.serverId);

    // Deliver any queued messages
    await this.deliverQueuedMessages(userId);

    ws.on('message', (data) => this.handleMessage(ws, userId, data));
    ws.on('close', () => this.handleDisconnect(userId));
  }

  async handleMessage(ws, senderId, rawData) {
    const { conversationId, content, clientMsgId } = JSON.parse(rawData);

    // Generate server-side message ID (Snowflake)
    const messageId = generateSnowflakeId();

    // Persist to Cassandra
    await this.db.saveMessage({
      messageId, conversationId, senderId, content, status: 'sent'
    });

    // Acknowledge to sender
    ws.send(JSON.stringify({ type: 'ack', clientMsgId, messageId }));

    // Deliver to recipients
    const recipients = await this.getConversationMembers(conversationId);
    for (const recipientId of recipients) {
      if (recipientId === senderId) continue;
      await this.sendToUser(recipientId, { messageId, senderId, content });
    }
  }

  async sendToUser(userId, message) {
    // Check if user is on this server
    const localWs = this.connections.get(userId);
    if (localWs) {
      localWs.send(JSON.stringify(message));
      return;
    }

    // Find which server the user is on
    const serverInfo = await this.presenceService.getUserServer(userId);

    if (serverInfo?.isOnline) {
      // Route via Kafka to the correct server
      await this.kafka.publish(`messages.${serverInfo.serverId}`, message);
    } else {
      // User is offline — send push notification
      await this.pushNotificationService.send(userId, message);
    }
  }

  async deliverMessage(message) {
    // Message arrived from Kafka (routed from another server)
    const ws = this.connections.get(message.recipientId);
    if (ws) {
      ws.send(JSON.stringify(message));
    }
  }

  async handleDisconnect(userId) {
    this.connections.delete(userId);
    await this.presenceService.setOffline(userId);
  }
}

Message Delivery Guarantees

Exactly-once delivery is hard in distributed systems. WhatsApp achieves it with:

Client-assigned message ID (idempotency key): Prevent duplicates if client retries
Server-assigned Snowflake ID: Canonical message ordering
Delivery acknowledgments: Client ACKs when it receives the message

Message states:
PENDING → SENT → DELIVERED → READ

- PENDING: Stored on client, not yet sent
- SENT (✓): Server received and persisted
- DELIVERED (✓✓): Recipient's device received
- READ (✓✓ blue): Recipient opened the conversation

Group Messaging

For groups with 500 members, fan-out becomes expensive.

Options:

Write to each member's inbox: Simple but expensive for large groups
Shared group mailbox: All members read from a shared stream

WhatsApp's approach (hybrid):

Small groups (< 100): Fan-out to each member's inbox
Large groups: Shared group mailbox + members poll when they open the chat

async function sendGroupMessage(senderId, groupId, message) {
  const members = await getGroupMembers(groupId);

  if (members.length < 100) {
    // Fan-out approach
    await Promise.all(members.map(memberId =>
      sendToUser(memberId, { ...message, groupId })
    ));
  } else {
    // Shared mailbox approach
    await groupMailbox.append(groupId, message);
    // Send push notification to all members
    await pushNotify(members, { groupId, preview: message.content.substring(0, 50) });
  }
}

Full Architecture

                    ┌────────────────────────────────────────┐
                    │          Load Balancer (L4/TCP)         │
                    └──────┬──────────────────────────┬───────┘
                           │                          │
               ┌───────────▼──────┐      ┌───────────▼──────┐
               │  Chat Server 1   │      │  Chat Server 2   │
               │  (WebSocket)     │      │  (WebSocket)     │
               └───────────┬──────┘      └───────────┬──────┘
                           │                          │
               ┌───────────▼──────────────────────────▼──────┐
               │              Apache Kafka                    │
               │        (Message Routing Bus)                 │
               └───────────────────────┬──────────────────────┘
                                       │
              ┌────────────────────────┼────────────────────────┐
              │                        │                        │
    ┌─────────▼────────┐    ┌──────────▼──────┐    ┌───────────▼────────┐
    │  Message Store   │    │  Presence Store  │    │  Notification      │
    │  (Cassandra)     │    │  (Redis)         │    │  Service           │
    └──────────────────┘    └─────────────────┘    │  (APNs / FCM)      │
                                                   └────────────────────┘
              │
    ┌─────────▼────────┐
    │  Media Store     │
    │  (S3 / CDN)      │
    └──────────────────┘

Interview Follow-up Questions

Q: How do you handle message ordering with multiple devices?

Use Snowflake IDs (time-ordered) as message IDs. All devices sync by polling "give me messages with ID > last_seen_id". The monotonically increasing IDs ensure consistent ordering across devices.

Q: How do you handle end-to-end encryption?

Use the Signal Protocol. Keys are generated on-device. The server stores only encrypted blobs. Servers cannot read message content. Key exchange happens during the initial connection setup.

Q: How would you scale to 5B users?

Shard the chat servers by geography (US, EU, Asia). Route users to the nearest datacenter. Use a global message routing layer (based on Anycast DNS + regional Kafka clusters) to deliver cross-region messages.

Q: How do you recover from a chat server crash?

Chat servers are stateless — connection info is stored in Redis (presence + server assignment). On crash, clients reconnect to any available server. The new server reads the user's server assignment and picks up where the previous server left off.

Step 1: Requirements​

Step 2: Scale Estimation​

Step 3: Core Architecture​

Connection Management​

Message Flow (One-on-One)​

Data Model​

Chat Server Implementation​

Message Delivery Guarantees​

Group Messaging​

Full Architecture​

Interview Follow-up Questions​