Design Netflix / Video Streaming
Netflix streams to 240M subscribers in 190+ countries. Video streaming is uniquely challenging because content is large (GB to TB per title), bandwidth is expensive, and users expect instant playback with no buffering.
Requirements
Functional:
- Upload and transcode videos
- Stream videos to users on different devices and network conditions
- Search and browse content catalog
- Recommendation system
- Resume playback across devices
Non-functional:
- 240M subscribers, 100M+ hours streamed daily
- < 2 second startup time for videos
- Support adaptive bitrate streaming (buffer-free on slow connections)
- 99.99% availability
Scale Estimation
Content library: ~15,000 titles
Each title stored at 5 resolutions × 3 codecs = 15 versions
Average movie file: ~5 GB → 15 versions = ~75 GB per title
Total storage: 15,000 × 75 GB = ~1.1 PB (encoded versions alone, before CDN replication)
Streaming:
- 100M hours/day ÷ 24 hours = ~4.2M concurrent streams on average (peaks run higher)
- Average bitrate: 8 Mbps
- Bandwidth: 4.2M × 8 Mbps = ~33 Tbps on average
This is why Netflix accounts for roughly 15% of global downstream internet traffic.
Architecture Overview
Netflix architecture has three major components:
1. Content Ingestion Pipeline
Content Creator uploads master file
    │
    ▼
[Upload Service] → S3 (raw storage)
    │
    ▼
[Transcoding Service]
    ├── Encode to H.264 (broad device support)
    ├── Encode to H.265/HEVC (50% better compression)
    ├── Encode to AV1 (best compression, newer devices)
    │
    ├── Resolution variants: 4K, 1080p, 720p, 480p, 360p
    │
    └── Split into 4-second chunks (for ABR streaming)
    │
    ▼
[CDN Upload] → Distribute chunks to 1,000+ CDN edge nodes
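The transcoding fan-out (codecs × resolutions, then chunking) can be sketched as a job enumerator. `buildRenditions` and its output shape are illustrative assumptions, not Netflix's actual pipeline API:

```javascript
// Enumerate every encode job for one title: 3 codecs x 5 resolutions.
const CODECS = ['h264', 'h265', 'av1'];
const RESOLUTIONS = ['4k', '1080p', '720p', '480p', '360p'];

function buildRenditions(titleId) {
  const jobs = [];
  for (const codec of CODECS) {
    for (const resolution of RESOLUTIONS) {
      // Each rendition is later split into 4-second chunks for ABR
      jobs.push({ titleId, codec, resolution, chunkSeconds: 4 });
    }
  }
  return jobs; // 15 encode jobs per title
}
```

Each job can then be fanned out to a worker pool, which is what makes the 15-versions-per-title math above tractable.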
2. Adaptive Bitrate Streaming (ABR)
The secret to buffer-free streaming is serving different quality levels based on network conditions.
How ABR works:
1. Client downloads a manifest file (an HLS M3U8 playlist or a DASH MPD):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
/movie/360p/chunk_000.ts
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
/movie/720p/chunk_000.ts
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
/movie/1080p/chunk_000.ts
2. Client measures the download speed of each chunk as it arrives
3. If bandwidth > 8 Mbps → switch to 1080p for the next chunk
   If bandwidth is 3-8 Mbps → switch to 720p
   If bandwidth < 3 Mbps → drop to 360p
4. Switches happen at chunk boundaries, so the user doesn't notice quality changes
class ABRController {
  constructor() {
    this.tracks = [];           // quality variants, sorted by bandwidth
    this.currentTrackIndex = 0;
    this.bufferLength = 0;      // seconds of video buffered ahead
  }

  async loadManifest(url) {
    const manifest = await fetch(url).then(r => r.text());
    this.tracks = parseM3U8(manifest)
      .sort((a, b) => a.bandwidth - b.bandwidth);
    return this.tracks;
  }

  selectQuality(downloadSpeed, bufferLength) {
    // If the buffer is low, be conservative: drop quality to refill faster
    if (bufferLength < 10) {
      return Math.max(0, this.currentTrackIndex - 1);
    }
    // Find the highest quality that fits the available bandwidth,
    // using 80% of the measured speed to account for variance
    const targetBandwidth = downloadSpeed * 0.8;
    let selectedIndex = 0;
    for (let i = 0; i < this.tracks.length; i++) {
      if (this.tracks[i].bandwidth <= targetBandwidth) {
        selectedIndex = i;
      }
    }
    return selectedIndex;
  }

  async downloadNextChunk() {
    const track = this.tracks[this.currentTrackIndex];
    const chunkUrl = track.nextChunkUrl();
    const start = performance.now();
    const response = await fetch(chunkUrl);
    const chunk = await response.arrayBuffer();
    const elapsed = (performance.now() - start) / 1000;       // seconds
    const downloadSpeed = (chunk.byteLength * 8) / elapsed;   // bits/sec, from actual bytes
    this.currentTrackIndex = this.selectQuality(downloadSpeed, this.bufferLength);
    return chunk;
  }
}
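The controller above assumes a `parseM3U8` helper. A minimal sketch that extracts only the `#EXT-X-STREAM-INF` variants from a master playlist (far from a full HLS parser):

```javascript
// Parse a master playlist into { bandwidth, url } variant entries.
// Handles only #EXT-X-STREAM-INF lines, not the full HLS spec.
function parseM3U8(text) {
  const lines = text.split('\n').map(l => l.trim());
  const tracks = [];
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(/^#EXT-X-STREAM-INF:.*BANDWIDTH=(\d+)/);
    if (m) {
      // The URI for a variant is the line immediately following its tag
      tracks.push({ bandwidth: parseInt(m[1], 10), url: lines[i + 1] });
    }
  }
  return tracks;
}
```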
3. Content Delivery Network (CDN)
Netflix's CDN (Open Connect) is the key to its performance:
User in London requests "Stranger Things"
    │
    ▼
DNS resolves to nearest Open Connect Appliance (OCA)
    │
    ▼
OCA (inside ISP data centers, e.g., BT, Virgin Media)
    │
    ├── Cache hit? → Serve directly (a few ms away)
    └── Cache miss? → Pull from Netflix origin → cache → serve
Netflix actually ships OCA hardware to ISPs for free. By co-locating content inside ISPs, they avoid expensive backbone bandwidth and reduce latency dramatically.
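The steering decision can be sketched as a preference order: an OCA inside the user's ISP, then a regional node, then origin. The data shapes (`isp`, `region`, `hasTitle`) are illustrative assumptions, not Open Connect's real control plane:

```javascript
// Pick the best server for a client: in-ISP OCA > regional OCA > origin.
function selectServer(client, ocas) {
  const inIsp = ocas.find(o => o.isp === client.isp && o.hasTitle);
  if (inIsp) return inIsp;                                   // best case: co-located in the ISP
  const regional = ocas.find(o => o.region === client.region && o.hasTitle);
  return regional || { id: 'origin' };                       // last resort: Netflix origin
}
```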
Cache warming: Popular titles are pre-pushed to edge nodes overnight (off-peak hours):
async function warmCacheForNewRelease(titleId) {
  const popularRegions = await getRegionsByPopularity();
  const allEdgeNodes = await getEdgeNodesForRegions(popularRegions);

  // Pre-push the top 3 quality levels for the first 30 minutes of content
  const files = await generateWarmingList(titleId, {
    qualities: ['1080p', '720p', '480p'],
    durationSeconds: 1800, // 30 minutes
  });

  await Promise.all(allEdgeNodes.map(node => node.prefetch(files)));
}
Video Encoding Pipeline
Netflix processes thousands of hours of content per day.
Upload (raw 4K ProRes master)
    │
    ▼
[Scene Analysis] → Detect dark scenes, action sequences, credits
    │
    ▼
[Per-Scene Encoding] → Allocate more bits to complex scenes
    │   ("per-title encoding" + "per-shot encoding")
    ▼
[Quality Validation] → VMAF score check (perceptual quality metric)
    │
    ▼
[Manifest Generation] → Create HLS/DASH playlists
    │
    ▼
[CDN Distribution] → Push to edge nodes
Netflix's innovation: Per-Title/Per-Shot Encoding
Instead of encoding every title at fixed bitrates, Netflix analyzes each title and determines the optimal bitrate for each quality level. A simple animated show needs fewer bits than an action film with complex scenes.
Simple animation (BoJack Horseman):
  1080p: 2 Mbps (instead of the standard 8 Mbps)
Complex live action (Stranger Things):
  1080p: 8 Mbps
Result: better quality at lower bandwidth consumption
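The per-title idea can be sketched as ladder construction: for each resolution, keep the cheapest candidate encode whose measured VMAF clears a quality target. The candidate shape and the default target of 93 are illustrative assumptions, not Netflix's actual thresholds:

```javascript
// Build a per-title bitrate ladder from trial encodes scored with VMAF.
// candidates: [{ resolution, bitrateKbps, vmaf }, ...]
function buildLadder(candidates, targetVmaf = 93) {
  const ladder = {};
  for (const c of candidates) {
    if (c.vmaf < targetVmaf) continue;            // quality gate failed
    const current = ladder[c.resolution];
    // Keep the cheapest encode that still clears the quality bar
    if (!current || c.bitrateKbps < current.bitrateKbps) {
      ladder[c.resolution] = c;
    }
  }
  return ladder;
}
```

For a simple animated title, the 2 Mbps trial encode clears the bar and wins over the 8 Mbps one, which is exactly the saving described above.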
Data Model
-- Content catalog
CREATE TABLE titles (
  title_id     UUID PRIMARY KEY,
  title        VARCHAR(200),
  description  TEXT,
  release_year INT,
  genres       TEXT[],        -- PostgreSQL array type
  rating       VARCHAR(10),
  duration_ms  BIGINT,
  created_at   TIMESTAMP
);
-- Video assets
CREATE TABLE video_assets (
  asset_id     UUID PRIMARY KEY,
  title_id     UUID REFERENCES titles(title_id),
  codec        VARCHAR(10) CHECK (codec IN ('h264', 'h265', 'av1')),
  resolution   VARCHAR(20),   -- '1080p', '720p', etc.
  bitrate      INT,
  manifest_url TEXT,
  cdn_prefix   TEXT,
  created_at   TIMESTAMP
);
-- Playback state (for resume feature)
CREATE TABLE playback_state (
  user_id     UUID,
  title_id    UUID,
  position_ms BIGINT,          -- playback position in milliseconds
  device_id   UUID,
  updated_at  TIMESTAMP,
  PRIMARY KEY (user_id, title_id)
);

-- Watch history
CREATE TABLE watch_history (
  user_id         UUID,
  title_id        UUID,
  watched_at      TIMESTAMP,
  percent_watched FLOAT,       -- 0.0 to 1.0
  PRIMARY KEY (user_id, title_id, watched_at)
);
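The `playback_state` table supports a simple last-write-wins resume flow across devices. A sketch with an in-memory Map standing in for the database; the heartbeat cadence and helper names are assumptions:

```javascript
// Track resume positions: clients report position periodically, and the
// newest update wins regardless of which device sent it.
class PlaybackTracker {
  constructor() {
    this.state = new Map(); // "userId:titleId" -> { positionMs, updatedAt }
  }

  report(userId, titleId, positionMs, updatedAt) {
    const key = `${userId}:${titleId}`;
    const prev = this.state.get(key);
    if (prev && prev.updatedAt > updatedAt) return prev; // ignore stale writes
    const row = { positionMs, updatedAt };
    this.state.set(key, row);
    return row;
  }

  resumePosition(userId, titleId) {
    const row = this.state.get(`${userId}:${titleId}`);
    return row ? row.positionMs : 0; // start from the beginning if never watched
  }
}
```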
Recommendation System
Netflix's recommendation system drives 80% of content discovery.
Collaborative Filtering: "Users who watched X also watched Y"
Content-Based Filtering: "This is similar to titles you liked"
Matrix Factorization: Learn latent factors from watch history
Data pipeline:
Watch history โ Feature extraction โ Model training โ Predictions โ A/B test โ Serve
// Simplified recommendation service
class RecommendationService {
  async getRecommendations(userId, limit = 20) {
    // Check cached recommendations first
    const cached = await redis.get(`recs:${userId}`);
    if (cached) return JSON.parse(cached);

    // Get user's watch history and ratings
    const watchHistory = await getWatchHistory(userId);
    const userPreferences = await getUserPreferences(userId);

    // Get pre-computed recommendations from ML model
    const recommendations = await mlService.predict({
      userId,
      watchHistory: watchHistory.slice(0, 100), // most recent 100 items
      preferences: userPreferences,
      context: { time: new Date().getHours(), device: 'tv' }
    });

    // Filter out already-watched content
    const watchedIds = new Set(watchHistory.map(w => w.titleId));
    const filtered = recommendations
      .filter(r => !watchedIds.has(r.titleId))
      .slice(0, limit);

    // Cache for 1 hour
    await redis.setex(`recs:${userId}`, 3600, JSON.stringify(filtered));
    return filtered;
  }
}
Interview Follow-up Questions
Q: How do you handle startup time?
Pre-buffer the first few seconds of the video before starting playback. Choose a lightweight codec and low bitrate for the initial segment, then switch up quality as the buffer fills.
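The ramp-up can be sketched as a tiny policy function; the 5-second buffer threshold and one-rung-per-chunk ramp are illustrative assumptions:

```javascript
// Fast-start quality policy: first chunk at the lowest bitrate so playback
// begins quickly, then ramp up one quality rung per chunk as the buffer grows.
function startupQuality(chunkIndex, bufferSeconds, maxIndex) {
  if (chunkIndex === 0) return 0;        // first chunk: lowest quality, fastest arrival
  if (bufferSeconds < 5) return 0;       // buffer still filling: stay low
  return Math.min(chunkIndex, maxIndex); // ramp up, capped at the top rung
}
```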
Q: How do you implement DRM (Digital Rights Management)?
Use industry standards: Widevine (Google/Chrome), FairPlay (Apple), PlayReady (Microsoft). Encrypt content keys, deliver via license server with user authentication. Encrypted Media Extensions (EME) in browsers handle decryption.
Q: How do you detect and prevent account sharing?
Track login location, device fingerprints, and concurrent stream count. Flag accounts with > N concurrent streams from different locations. Challenge with re-authentication when suspicious patterns are detected.
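The concurrent-stream limit can be sketched with a per-account session set. Heartbeat-based session expiry is simplified to an explicit `stop()` here, and the in-memory Map stands in for a shared store:

```javascript
// Enforce a per-account concurrent stream limit.
class StreamLimiter {
  constructor(maxStreams) {
    this.maxStreams = maxStreams;
    this.active = new Map(); // accountId -> Set of streaming deviceIds
  }

  start(accountId, deviceId) {
    const devices = this.active.get(accountId) || new Set();
    // A device already streaming may continue; new devices hit the limit
    if (!devices.has(deviceId) && devices.size >= this.maxStreams) return false;
    devices.add(deviceId);
    this.active.set(accountId, devices);
    return true;
  }

  stop(accountId, deviceId) {
    this.active.get(accountId)?.delete(deviceId);
  }
}
```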
Q: How would you scale the transcoding service?
Use a job queue (e.g., AWS SQS) to distribute transcoding jobs. Each job processes one "chunk" of video. Use spot instances for cost efficiency. Scale workers up during new content releases and down during off-peak hours.
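The chunk-level queue can be sketched with an in-memory array standing in for SQS; the job shape and 4-second default chunk size are illustrative:

```javascript
// Split a title into per-chunk transcode jobs; workers pull jobs and
// report completion, and the title is done when every chunk finishes.
class TranscodeQueue {
  constructor() {
    this.jobs = [];
    this.done = 0;
    this.total = 0;
  }

  enqueueTitle(titleId, durationSec, chunkSec = 4) {
    this.total = Math.ceil(durationSec / chunkSec);
    for (let i = 0; i < this.total; i++) {
      this.jobs.push({ titleId, chunkIndex: i, startSec: i * chunkSec });
    }
  }

  pull() {
    return this.jobs.shift();        // a worker takes the next job
  }

  complete() {
    this.done++;
    return this.done === this.total; // true once the whole title is transcoded
  }
}
```

Because each job covers only a few seconds of video, a spot-instance worker being reclaimed loses little work, which is what makes spot pricing viable here.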