
System Design HLD Example: Video Streaming (YouTube/Netflix)

A practical interview-ready HLD for a video streaming platform with adaptive bitrate and CDN delivery.

Abstract Algorithms · 27 min read

TLDR: A video streaming platform is two separate systems stitched together: a batch-oriented transcoding pipeline that converts raw uploads into multi-resolution HLS/DASH segments, and a real-time delivery network of CDN edge nodes that serve those segments with sub-100 ms latency worldwide. Nail the boundary between them and the rest of the design falls into place.


📖 The Scale Problem: 500 Hours of Video Per Minute

Every minute, users upload 500 hours of video to YouTube. Netflix streams to 238 million subscribers. The hard part is not storing the bytes; object storage handles that. The real challenge is everything in between: accepting a raw 4K MOV file, re-encoding it into five resolutions and two streaming formats, distributing those segments to 200+ CDN edge locations worldwide, and then serving the right resolution to a viewer on a slow 3G phone without a single rebuffer.

Three separate hard problems are hiding inside that one sentence:

  • Ingestion at write time: how do you accept gigabyte-scale uploads reliably from any device or connection quality?
  • Processing at encode time: transcoding 4K video is CPU-intensive and takes minutes; how do you scale it without blocking uploads?
  • Delivery at read time: how do you serve millions of concurrent streams from a file that lives in one origin bucket?

This walkthrough designs the HLD that solves all three. By the end you will know why uploads are chunked, why transcoding runs as an asynchronous DAG of worker jobs, why CDN edge nodes pre-warm popular content, and why every video player switches bitrates dynamically based on available bandwidth.

Actors

| Actor | Role |
|---|---|
| Creator | Uploads raw video; provides title, description, thumbnail |
| Viewer | Streams video on web, mobile, or smart TV |
| Upload Service | Accepts chunked uploads; coordinates multipart assembly |
| Transcoding Workers | CPU-heavy jobs; encode raw video into HLS/DASH segments |
| CDN Edge Node | Caches and serves video segments; handles 95%+ of playback traffic |
| Metadata Service | Stores video metadata, channel info, search index |
| Recommendation Engine | Generates personalised watch-next lists |

🔍 Functional & Non-Functional Requirements

In Scope

  • Upload: chunked, resumable upload of raw video files up to 256 GB
  • Transcoding: async pipeline produces 360p, 480p, 720p, 1080p, and 4K renditions in HLS and DASH formats
  • Storage: raw video in object storage; transcoded segments in object storage; metadata in relational DB
  • Streaming: CDN-delivered adaptive bitrate streaming with < 2 s startup time
  • Search: full-text search across title, description, and transcript
  • View Counting: eventually consistent global view counter; real-time approximate counter

Out of Scope (v1 boundary)

  • Live streaming (RTMP ingest, real-time low-latency delivery)
  • DRM licensing and Widevine/FairPlay key exchange
  • In-video chapters, captions auto-generation, and AI content moderation
  • Ad insertion and monetisation pipeline
  • Real-time comments and Super Chat

Non-Functional Requirements

| Dimension | Target | Why It Matters |
|---|---|---|
| Upload throughput | 500+ hours/min ingested globally | Sustained write load from creators |
| Playback availability | 99.99% | Rebuffering is visible and costly to user retention |
| Startup latency | < 2 s to first frame | Industry benchmark for streaming UX |
| Transcoding SLA | 720p ready within 5 min of upload; 4K within 30 min | Creator feedback loop |
| Concurrent streams | 50M+ simultaneous viewers | Super Bowl spike sizing |
| Storage | 1 EB total; growing at ~200 PB/month (see capacity estimate below) | Object storage at scale |

⚙️ Capacity Estimation: Translating YouTube-Scale Numbers Into Architecture Choices

Back-of-envelope math directly shapes component sizing decisions. Use these numbers as interview anchors.

Write path (upload + transcode):

  • 500 hours of video uploaded per minute = ~8.3 hours/second
  • Average raw video bitrate: 8 Mbps (1080p source)
  • Ingest bandwidth: 8.3 × 3600 × 8 Mbps ÷ 8 ≈ 30 GB/s aggregate ingestion
  • Transcoding: each minute of 1080p video → ~5 minutes of CPU time per rendition; 6 renditions → 30 CPU-minutes per video-minute uploaded; at 30,000 video-minutes uploaded per minute, that is ~900,000 CPU-minutes per minute, i.e. on the order of 900K vCPUs sustained for transcoding alone

Read path (streaming):

  • 1 billion hours viewed per day ÷ 24 hours/day ≈ 41.7 million average concurrent streams
  • Average stream bitrate: 2 Mbps (mix of 480p/720p)
  • Egress bandwidth: 41.7M × 2 Mbps ≈ 83 Tbps total egress, almost entirely served by CDN, not origin

Storage:

  • 500 hours/min × 60 min/hr × 24 hr/day = 720,000 hours of raw video/day
  • Average compressed raw: 3 GB/hour → 2.16 PB/day raw ingestion
  • After transcoding to 6 renditions: approximately 3× storage multiplier → ~6.5 PB/day added to storage
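These estimates are easy to sanity-check in a few lines; the constants below mirror the stated assumptions (500 hours/min uploaded, 8 Mbps raw bitrate, 3 GB/hour compressed raw, 3× rendition multiplier):

```python
# Sanity-check of the capacity estimates above, using the stated assumptions.
HOURS_PER_MIN = 500
RAW_BITRATE_MBPS = 8
COMPRESSED_GB_PER_HOUR = 3
RENDITION_MULTIPLIER = 3

# Ingest bandwidth: hours-of-video/sec * sec/hour * Mbps, then bits -> gigabytes
hours_per_sec = HOURS_PER_MIN / 60                                   # ~8.3
ingest_gb_per_sec = hours_per_sec * 3600 * RAW_BITRATE_MBPS / 8 / 1000
print(f"ingest: {ingest_gb_per_sec:.0f} GB/s")                       # ~30 GB/s

# Storage: daily raw ingest, then the post-transcode footprint
hours_per_day = HOURS_PER_MIN * 60 * 24                              # 720,000
raw_pb_per_day = hours_per_day * COMPRESSED_GB_PER_HOUR / 1_000_000
print(f"raw: {raw_pb_per_day:.2f} PB/day")                           # 2.16 PB/day
print(f"with renditions: {raw_pb_per_day * RENDITION_MULTIPLIER:.1f} PB/day")
```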

Key architectural implication: read bandwidth outnumbers write bandwidth by more than two orders of magnitude. CDN caching is not optional; it is the entire delivery strategy. Origin servers would collapse under raw playback load.


🎯 Design Goals Specific to Video Streaming

These are system-specific goals, not generic best practices:

  1. Separate the write path from the read path entirely. Upload and transcoding never touch CDN delivery code. They are decoupled by an async message queue.
  2. Adaptive bitrate prevents buffering for 56% of viewers who experience variable-bandwidth connections (mobile, shared WiFi). The system must produce multiple renditions, not just one.
  3. Transcoding is the only CPU-bound bottleneck. Everything else is I/O-bound. Scale transcoding workers independently from upload gateways and streaming services.
  4. CDN pre-warming for trending content reduces origin load by 40-60% during viral spikes. The system must detect trending videos early and push segments to edge nodes proactively.
  5. View counts are approximate and eventually consistent. Strong consistency at 50M concurrent viewers is neither necessary nor achievable. Redis counters + periodic flush to the database is the correct trade-off.

📊 High-Level Architecture: The Six Components That Make Streaming Work

The architecture divides cleanly into two planes: the upload/processing plane (left side of the diagram) and the delivery/playback plane (right side). The message queue in the middle is the decoupling boundary that makes each plane independently scalable.

The diagram below shows the complete data flow: a creator's raw upload travels through the Upload Service → Object Storage → Transcoding Pipeline, and emerges as HLS/DASH segments pushed to both the origin bucket and CDN. A viewer's play request never touches any of those components; it goes directly to the CDN edge node.

graph TD
    A([Creator Client]) -->|chunked upload| B[Upload Service]
    B -->|multipart PUT| C[(Raw Video Store\nS3 / GCS)]
    C -->|upload complete event| D[Message Queue\nKafka]
    D -->|job dispatch| E[Transcoding Worker Pool\nFFmpeg DAG]
    E -->|HLS/DASH segments| F[(Segment Store\nS3 / GCS)]
    F -->|origin pull| G[CDN Origin Shield]
    G -->|edge replication| H[CDN Edge Nodes\n200+ PoPs]

    I([Viewer Client]) -->|stream request| H
    H -->|cache miss| G
    G -->|segment fetch| F

    B -->|metadata write| J[(Metadata DB\nPostgreSQL)]
    E -->|transcode complete| J
    J <-->|index sync| K[Search Service\nElasticsearch]
    J -->|view event| L[(Redis Cache\nCounters & Hot Metadata)]

Key takeaway: Raw video bytes and playback bytes never share a network path. The Segment Store is the handoff point between the two planes.


🧠 Deep Dive: Upload, Transcoding, CDN, ABR, and Metadata

The Internals: How the Six Components Interact at Runtime

At runtime, the upload/processing plane and the delivery plane share only two things: the Segment Store (S3 buckets) and the Metadata DB. Every other component is isolated. This separation is intentional: it prevents a transcoding surge from affecting playback latency, and a CDN traffic spike from affecting upload throughput.

State transitions for a video object follow this lifecycle:

| State | Trigger | Owner |
|---|---|---|
| UPLOADING | Creator calls /upload/init | Upload Service |
| TRANSCODING | video.uploaded Kafka event consumed | Transcoding Orchestrator |
| READY | All rendition workers complete + manifest generated | Transcoding Orchestrator |
| FAILED | Any worker hits retry limit | Dead Letter Queue handler |

The message queue (Kafka) is the only durable state boundary between services. If the Transcoding Orchestrator crashes mid-job, Kafka's consumer group offset ensures the job is retried; no state is lost.
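The lifecycle above maps naturally onto a small transition guard. This is a sketch: the state names come from the table, the helper itself is illustrative:

```python
# Transition guard for the video lifecycle; any other change is rejected.
VALID_TRANSITIONS = {
    "UPLOADING":   {"TRANSCODING", "FAILED"},
    "TRANSCODING": {"READY", "FAILED"},
    "READY":       set(),   # terminal in v1
    "FAILED":      set(),   # terminal; the DLQ handler owns retries
}

def transition(current: str, nxt: str) -> str:
    """Refuse illegal state changes instead of silently corrupting the row."""
    if nxt not in VALID_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt

state = "UPLOADING"
state = transition(state, "TRANSCODING")
state = transition(state, "READY")
```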

Performance Analysis: Throughput, Latency, and Where Each Tier Breaks

| Tier | Normal Throughput | Bottleneck Trigger | Failure Mode |
|---|---|---|---|
| Upload Service | 30 GB/s aggregate | Network saturation per pod | Client timeout; partial upload stuck |
| Transcoding Workers | ~900K vCPUs sustained | Queue depth > worker capacity | Job backlog; creator SLA breach |
| Metadata DB | 50K reads/s (with replicas) | Connection exhaustion | Slow queries on unpaginated view_events table |
| CDN Edge | ~80 Tbps egress | Cache miss storm on viral video | Origin shield overload without pre-warming |
| Redis | 1M+ INCR/s per shard | Hot key on single viral video ID | Counter saturation; shard with video_id suffix |

Latency budget for video startup (viewer clicks play → first frame renders):

DNS resolution:          ~5 ms
CDN TCP connect:         ~15 ms (nearest PoP)
Metadata fetch:          ~20 ms (Redis hit)
master.m3u8 fetch:       ~10 ms (CDN cache hit)
Segment 0 fetch (720p):  ~30 ms (CDN cache hit, 1 MB segment)
Player decode + render:  ~200 ms
─────────────────────────────
Total startup:           ~280 ms  (< 2 s target ✅)

The 2-second SLA is dominated by player decode and first-segment fetch. If segment 0 is a CDN cache miss (cold start), add 150-300 ms for the origin pull, still comfortably within budget with an origin shield.

Video Upload and Ingestion: Chunked, Resumable, Parallel

Uploading a 10 GB raw video file over a single HTTP connection routinely fails on real-world networks. The upload service therefore uses resumable chunked upload, identical in concept to S3 multipart upload:

  1. Client calls POST /upload/init → server returns an uploadId
  2. Client splits the file into 5-10 MB chunks and issues parallel PUT /upload/{uploadId}/chunk/{n} requests
  3. On connection drop, the client resumes from the last acknowledged chunk index (stored in Redis: upload:progress:{uploadId})
  4. After all chunks are confirmed, the client calls POST /upload/{uploadId}/complete → server triggers S3 CompleteMultipartUpload
  5. Upload Service emits a video.uploaded event to Kafka with {videoId, rawS3Key, creatorId}

This design means a 10 GB upload tolerates network interruptions transparently, chunks upload in parallel (a 3-5× throughput improvement over sequential upload), and origin assembly happens atomically only once all chunks arrive.
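The client-side bookkeeping behind steps 1-5 can be sketched in a few lines. Network calls are omitted; the acked set stands in for the upload:progress:{uploadId} key the text keeps in Redis, and all names are illustrative:

```python
# Chunking and resume bookkeeping for a resumable upload client.
CHUNK_SIZE = 10 * 1000 * 1000  # 10 MB, matching the worked example later on

def split_into_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """(offset, length) pairs covering the whole file; the last chunk may be short."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]

def chunks_to_resume(total_chunks: int, acked: set[int]) -> list[int]:
    """Chunk indices still missing after a dropped connection."""
    return [i for i in range(total_chunks) if i not in acked]

chunks = split_into_chunks(1_200_000_000)              # a 1.2 GB upload -> 120 chunks
remaining = chunks_to_resume(len(chunks), acked={0, 1, 2})
# Re-PUT only `remaining`, then call /upload/{uploadId}/complete
```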

Transcoding Pipeline: A DAG of Encoding Jobs

Transcoding is the most computationally expensive operation in the system. A single 2-hour 4K source file at 60 fps can take 2-4 hours of CPU time to encode into all renditions. The pipeline must be:

  • Asynchronous: never block the upload response on transcoding
  • Parallelisable: encode 720p, 1080p, and 4K simultaneously on different workers
  • DAG-shaped: some steps depend on others (e.g., thumbnail extraction runs after the first 30 seconds are decoded)

The pipeline stages are:

[Raw S3 Object]
       │
       ▼
[Pre-processing Worker]
  - Validate codec, container, duration
  - Extract thumbnail at t=5s
  - Detect scene boundaries
       │
       ├──▶ [360p Encoder Worker]  → segment_360p/  (HLS)
       ├──▶ [480p Encoder Worker]  → segment_480p/  (HLS)
       ├──▶ [720p Encoder Worker]  → segment_720p/  (HLS + DASH)
       ├──▶ [1080p Encoder Worker] → segment_1080p/ (HLS + DASH)
       └──▶ [4K Encoder Worker]    → segment_2160p/ (HLS + DASH)
                    │
                    ▼
          [Manifest Generator]
          - Writes master.m3u8 (HLS)
          - Writes manifest.mpd (DASH)
          - Updates metadata DB: status = READY

Each encoder worker is a stateless pod that reads from S3, runs FFmpeg, and writes segments back to S3. Workers are scaled horizontally by pulling from a transcoding_jobs queue in SQS or Kafka. Priority queues route paid creator uploads to faster workers.
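The fan-out/fan-in shape of this DAG (one job per rendition, manifest generation only after every rendition completes) can be sketched as follows. encode() is a stub standing in for the real FFmpeg work, and all names are illustrative:

```python
# Fan-out/fan-in skeleton of the transcoding step.
from concurrent.futures import ThreadPoolExecutor

RENDITIONS = ["360p", "480p", "720p", "1080p", "2160p"]

def encode(video_id: str, rendition: str) -> str:
    # Real worker: read raw from S3, run FFmpeg, write segments back to S3
    return f"{video_id}/{rendition}/index.m3u8"

def transcode(video_id: str) -> dict:
    # Fan-out: one encoding job per rendition, run concurrently
    with ThreadPoolExecutor(max_workers=len(RENDITIONS)) as pool:
        playlists = list(pool.map(lambda r: encode(video_id, r), RENDITIONS))
    # Fan-in: only once every rendition playlist exists does status flip
    return {"status": "READY",
            "master": f"{video_id}/master.m3u8",
            "renditions": playlists}

result = transcode("abc123")
```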

CDN and Video Delivery: Why Origin Never Sees Viewer Traffic

A CDN works by replicating content from an origin server (your S3 bucket) to edge nodes (servers co-located in ISP networks worldwide). When a viewer in Mumbai requests segment 720p_seg_042.ts, the request goes to the nearest CDN PoP, typically under 20 ms away, rather than to a data center in Virginia.

Origin shield is an intermediate caching layer between edge nodes and origin. Without it, a cache miss on 200 edge nodes for a newly published video would fire 200 simultaneous origin requests. Origin shield collapses those into one request. Netflix uses a three-tier hierarchy: edge → regional shield → origin.
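The request-collapsing behaviour a shield provides is essentially a single-flight cache. This in-memory sketch (not a CDN implementation; names are illustrative) shows concurrent misses for one segment sharing a single origin fetch:

```python
# Single-flight cache: N concurrent misses for one key -> one origin fetch.
import threading

class OriginShield:
    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch
        self.cache, self.inflight = {}, {}
        self.lock = threading.Lock()
        self.origin_hits = 0              # how many requests reached origin

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
            event = self.inflight.get(key)
            leader = event is None
            if leader:                    # first miss: this caller fetches
                event = self.inflight[key] = threading.Event()
        if leader:
            value = self.origin_fetch(key)
            with self.lock:
                self.cache[key] = value
                self.origin_hits += 1
                del self.inflight[key]
            event.set()
            return value
        event.wait()                      # everyone else blocks on the leader
        return self.cache[key]

shield = OriginShield(lambda k: f"bytes:{k}")
threads = [threading.Thread(target=shield.get, args=("seg_000.ts",))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 20 concurrent edge misses, exactly one origin fetch
```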

Pre-warming for trending content: when a video accumulates >10K views/hour, the system proactively pushes its first 5 minutes of segments to all edge nodes. This eliminates origin load during the steepest part of the viral curve.

Adaptive Bitrate Streaming: How the Player Decides Which Quality to Use

ABR is the mechanism that prevents buffering. The player downloads an HLS master playlist (master.m3u8) that lists every available rendition:

#EXTM3U
#EXT-X-VERSION:3

#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
720p/index.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=16000000,RESOLUTION=3840x2160
2160p/index.m3u8
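Selecting a rendition from such a playlist amounts to parsing the BANDWIDTH attributes and picking the highest tier that fits the player's bandwidth estimate. A minimal sketch, with an illustrative 1.4 headroom factor:

```python
# Parse BANDWIDTH attributes from a master playlist and pick a rendition.
import re

MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8"""

def parse_master(text: str) -> list[tuple[int, str]]:
    """(bandwidth, playlist URI) pairs; the URI follows each STREAM-INF line."""
    lines = text.strip().splitlines()
    return [(int(m.group(1)), lines[i + 1])
            for i, line in enumerate(lines)
            if (m := re.search(r"BANDWIDTH=(\d+)", line))]

def pick_rendition(renditions: list[tuple[int, str]],
                   estimated_bps: int, headroom: float = 1.4):
    """Highest rendition whose bitrate * headroom fits the estimate; else the lowest."""
    fitting = [r for r in renditions if r[0] * headroom <= estimated_bps]
    return max(fitting) if fitting else min(renditions)

chosen = pick_rendition(parse_master(MASTER), estimated_bps=5_000_000)
# chosen is the 720p tier: 1080p would need 5 Mbps * 1.4 of headroom
```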

The player selects a starting rendition, then re-evaluates every 4-second segment based on the buffer health algorithm:

if buffer_level < 10s:
    switch DOWN to next lower rendition
elif buffer_level > 30s AND estimated_bandwidth > current_rendition_bitrate * 1.4:
    switch UP to next higher rendition
else:
    stay at current rendition

This simple heuristic is why video quality drops gracefully on mobile instead of freezing. Segment duration (4-6 seconds) controls the granularity of switching: shorter segments allow faster adaptation but increase HTTP request overhead.
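A runnable version of that heuristic, using the same thresholds (10 s and 30 s of buffer, 1.4 bandwidth headroom) and an illustrative bitrate ladder:

```python
# Buffer-health rendition switching, matching the pseudocode above.
LADDER_BPS = [800_000, 2_000_000, 5_000_000, 16_000_000]  # 360p..2160p

def next_rendition(idx: int, buffer_s: float, estimated_bps: float) -> int:
    if buffer_s < 10 and idx > 0:
        return idx - 1                # starving: step down one rung
    if (buffer_s > 30 and idx + 1 < len(LADDER_BPS)
            and estimated_bps > LADDER_BPS[idx] * 1.4):
        return idx + 1                # healthy buffer + headroom: step up
    return idx                        # otherwise hold

# Re-evaluated once per downloaded segment, i.e. every 4-6 s of content
```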

Metadata and Search: Keeping Video Info Fast and Searchable

Video metadata (title, description, tags, channel, duration, view count) lives in PostgreSQL. Search lives in Elasticsearch, kept in sync via a Kafka consumer that subscribes to video.published events and indexes the document.

Recommendations use a separate offline pipeline (collaborative filtering, watch history embeddings) that writes pre-computed lists to Redis. The Metadata Service reads from cache first; PostgreSQL is the write authority.

Write Path: Step-by-Step Upload Flow

  1. Creator opens upload dialog → client calls POST /upload/init with file metadata
  2. Server creates a videos row (status = UPLOADING), returns uploadId
  3. Client uploads chunks in parallel; Upload Service assembles them into a raw S3 object
  4. Client calls POST /upload/{uploadId}/complete → Upload Service finalises assembly and emits video.uploaded to Kafka
  5. Transcoding Orchestrator consumes the event → fans out encoding jobs to the worker queue
  6. Each worker encodes one rendition, writes segments to S3 under {videoId}/{rendition}/
  7. Manifest Generator assembles master.m3u8 and manifest.mpd after all workers complete
  8. Orchestrator updates the metadata DB: status = READY, manifest_url, thumbnail_url
  9. Search indexer consumes the video.published event → writes the document to Elasticsearch

sequenceDiagram
    participant C as Creator Client
    participant US as Upload Service
    participant S3 as Raw Video Store
    participant MQ as Kafka
    participant TW as Transcoding Workers
    participant DB as Metadata DB

    C->>US: POST /upload/init
    US->>DB: INSERT videos (status=UPLOADING)
    US-->>C: {uploadId}
    C->>US: PUT /upload/{uploadId}/chunk/0..N (parallel)
    US->>S3: CompleteMultipartUpload
    US->>MQ: video.uploaded event
    MQ->>TW: dispatch encoding jobs
    TW->>S3: write HLS/DASH segments
    TW->>MQ: transcode.complete
    MQ->>DB: UPDATE videos (status=READY, manifest_url)

Read Path: Step-by-Step Streaming Flow

  1. Viewer clicks play → client calls GET /videos/{id} → Metadata Service returns manifest URL and video metadata from Redis/PostgreSQL
  2. Player fetches master.m3u8 from CDN edge node
  3. Player selects a starting rendition (typically 480p) → fetches 480p/index.m3u8 (segment list)
  4. Player requests segment 0 (480p_seg_000.ts) from CDN edge
  5. CDN edge hit: segment served from edge cache in < 20 ms
  6. Player buffers 3 segments (~12 s) before rendering the first frame → startup time < 2 s
  7. Every 4 s, the player evaluates buffer health and switches rendition up or down if needed
  8. View event fires after 30 s of watch time → POST /events/view → Kafka → Redis INCR counter

🗄️ Data Model: Videos, Channels, Users, and View Counts

CREATE TABLE users (
    user_id       UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    username      TEXT        UNIQUE NOT NULL,
    email         TEXT        UNIQUE NOT NULL,
    channel_id    UUID,
    created_at    TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE channels (
    channel_id    UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    owner_id      UUID        NOT NULL REFERENCES users(user_id),
    name          TEXT        NOT NULL,
    subscriber_count BIGINT   DEFAULT 0,
    created_at    TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE videos (
    video_id      UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    channel_id    UUID        NOT NULL REFERENCES channels(channel_id),
    title         TEXT        NOT NULL,
    description   TEXT,
    status        TEXT        NOT NULL DEFAULT 'UPLOADING',
                              -- UPLOADING | TRANSCODING | READY | FAILED
    duration_s    INT,
    raw_s3_key    TEXT,
    manifest_url  TEXT,       -- CDN URL to master.m3u8
    thumbnail_url TEXT,
    uploaded_at   TIMESTAMPTZ DEFAULT now(),
    published_at  TIMESTAMPTZ
);

CREATE TABLE view_events (
    event_id      UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    video_id      UUID        NOT NULL REFERENCES videos(video_id),
    viewer_id     UUID,       -- NULL for anonymous
    watched_s     INT,        -- seconds watched in this session
    device_type   TEXT,       -- mobile | desktop | tv
    occurred_at   TIMESTAMPTZ DEFAULT now()
);

-- Materialised view counter, flushed from Redis every 5 minutes
CREATE TABLE video_view_counts (
    video_id      UUID        PRIMARY KEY REFERENCES videos(video_id),
    view_count    BIGINT      DEFAULT 0,
    last_updated  TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_videos_channel_id    ON videos(channel_id);
CREATE INDEX idx_videos_published_at  ON videos(published_at DESC) WHERE status = 'READY';
CREATE INDEX idx_view_events_video_id ON view_events(video_id, occurred_at DESC);

NoSQL complement: Video segments themselves are never stored in the relational DB. The manifest_url column is a pointer to the HLS manifest in S3/CDN, which lists segment URLs. This separation is critical: the DB stores metadata; object storage stores content.


⚡ Cache Design: Redis Patterns for Hot Video Metadata

Redis serves three distinct caching roles in the video platform:

| Cache Key Pattern | Type | TTL | Purpose |
|---|---|---|---|
| video:meta:{videoId} | Hash | 10 min | Video title, thumbnail URL, duration, channel name |
| video:views:{videoId} | String (counter) | No TTL | Real-time view counter; flushed to DB every 5 min |
| trending:videos:{region} | Sorted Set | 1 hour | Top 100 trending video IDs sorted by velocity |
| channel:meta:{channelId} | Hash | 30 min | Channel name, subscriber count, avatar URL |
| rec:home:{userId} | List | 15 min | Pre-computed recommendation list per user |

View counter pattern: Redis INCR is atomic and handles 1M+ increments/second on a single instance. A background job flushes counters to PostgreSQL every 5 minutes using GETSET (read the current value and reset it to 0 atomically):

# On every qualified view event (30s+ watched):
INCR video:views:{videoId}

# Background flush job (every 5 min):
count = GETSET video:views:{videoId} 0
UPDATE video_view_counts SET view_count = view_count + count WHERE video_id = ?
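The same flow, sketched with in-memory stand-ins for Redis and the database so it runs end-to-end here; production code would issue INCR/GETSET against Redis and a batched SQL UPDATE:

```python
# In-memory model of the INCR + atomic-reset flush pattern.
import threading

class ViewCounter:
    def __init__(self):
        self._counts: dict[str, int] = {}   # plays the role of Redis
        self._db: dict[str, int] = {}       # plays the role of video_view_counts
        self._lock = threading.Lock()       # Redis INCR is atomic; a lock emulates that

    def record_view(self, video_id: str) -> None:
        with self._lock:                    # hot path: increment only, never touch the DB
            self._counts[video_id] = self._counts.get(video_id, 0) + 1

    def flush(self) -> None:
        """GETSET-style drain: read-and-reset atomically, then add into the DB."""
        with self._lock:
            drained, self._counts = self._counts, {}
        for video_id, n in drained.items():
            self._db[video_id] = self._db.get(video_id, 0) + n

counter = ViewCounter()
for _ in range(3):
    counter.record_view("abc123")
counter.flush()                             # DB gains 3; live counter resets
```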

Trending detection: a sliding-window job computes view velocity (views per hour) for all videos and updates the trending:videos:{region} sorted set every 15 minutes. CDN pre-warm jobs subscribe to this set.
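The velocity job itself is a top-N selection over a one-hour window of per-video counts. A sketch with illustrative numbers; production would ZADD the result into the sorted set:

```python
# Top-N view-velocity selection over a one-hour window.
import heapq

def top_by_velocity(views_last_hour: dict[str, int], n: int = 100) -> list[tuple[str, int]]:
    """(video_id, views/hour) pairs, highest velocity first."""
    return heapq.nlargest(n, views_last_hour.items(), key=lambda kv: kv[1])

window = {"vid_a": 52_000, "vid_b": 1_200, "vid_c": 310_000}
trending = top_by_velocity(window, n=2)
# trending -> [("vid_c", 310000), ("vid_a", 52000)]
```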


⚖️ Trade-offs and Failure Modes: Bottlenecks at Video-Streaming Scale

| Component | Bottleneck | Scaling Strategy |
|---|---|---|
| Upload Service | Network bandwidth per pod | Horizontal scaling behind ALB; direct-to-S3 multipart upload to bypass the app server for data |
| Transcoding Workers | CPU (encoding is compute-bound) | Autoscale worker pool on queue depth; GPU instances for H.265/AV1; spot instances for cost |
| Metadata DB (PostgreSQL) | Read query throughput | Read replicas for video page loads; connection pooling via PgBouncer; partition view_events by month |
| Search (Elasticsearch) | Index lag on viral uploads | Priority Kafka consumer for trending channels; shard by channel_id |
| CDN | Cache miss storm on viral upload | Origin shield collapses parallel edge misses into a single origin fetch; pre-warm pipeline for trending videos |
| View Counting | Write amplification at 50M concurrent viewers | Redis INCR (never write to PostgreSQL on the hot path); periodic batch flush |
| Recommendation Service | Staleness of pre-computed lists | Rebuild lists on publish and on view-count threshold crossing; short TTL for active users |

🛠️ FFmpeg: How It Powers Video Transcoding

FFmpeg is the open-source multimedia framework that runs at the heart of every major video platform's transcoding pipeline. YouTube, Facebook, Twitter, and Twitch all use it (directly or via wrappers) to encode video into web-deliverable formats.

What FFmpeg does in this pipeline:

  • Decodes the raw source file (any codec: H.264, HEVC, ProRes, DNxHD)
  • Re-encodes to the target resolution and bitrate ladder
  • Segments the output into 4-second HLS .ts files
  • Generates the index.m3u8 segment playlist for each rendition

Minimal HLS segmentation command used by a transcoding worker:

ffmpeg \
  -i input.mp4 \
  -vf scale=1280:720 \
  -c:v libx264 \
  -b:v 2000k \
  -c:a aac \
  -b:a 128k \
  -hls_time 4 \
  -hls_playlist_type vod \
  -hls_segment_filename "segment_%03d.ts" \
  720p/index.m3u8

This single command produces segment_000.ts, segment_001.ts, … and a complete 720p/index.m3u8 playlist. Running one such command per rendition in parallel, on separate worker pods, is exactly how the transcoding DAG executes.
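A worker can build the per-rendition argument vectors programmatically. This sketch only constructs the commands (bitrate values are illustrative), leaving execution via subprocess to the worker so it stays runnable without FFmpeg installed:

```python
# Build per-rendition FFmpeg argument vectors matching the command above.
def hls_command(src: str, width: int, height: int,
                vbitrate: str, name: str) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}",
        "-c:v", "libx264", "-b:v", vbitrate,
        "-c:a", "aac", "-b:a", "128k",
        "-hls_time", "4",
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{name}/segment_%03d.ts",
        f"{name}/index.m3u8",
    ]

LADDER = [(640, 360, "800k", "360p"), (1280, 720, "2000k", "720p"),
          (1920, 1080, "5000k", "1080p"), (3840, 2160, "16000k", "2160p")]
commands = [hls_command("input.mp4", *rung) for rung in LADDER]
# One subprocess per command; one worker pod per rendition in production
```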

AV1 is the next-generation codec (roughly 40% better compression than H.264 at equal quality) but requires 10-20× more CPU time to encode. Most platforms reserve AV1 for their most-watched content, where the encode cost is amortised over millions of views, and use H.264/H.265 for the initial fast renditions.

For a full FFmpeg deep-dive covering codec selection, two-pass encoding, and hardware acceleration, see the companion post linked in Related Posts.



🌍 Real-World Applications: How YouTube and Netflix Deploy This Architecture in Practice

Understanding how the two largest streaming platforms diverge on the same HLD is the most instructive case study available.

YouTube handles the widest variety of source material in the world: 500 hours/min from millions of creators using phones, webcams, and professional cameras. YouTube's key adaptations:

  • Variable bitrate ladder: shorter videos (< 5 min) are encoded at fewer renditions to reduce transcoding cost; longer videos always get 4K.
  • ABR is client-side: the YouTube player runs a proprietary controller built on ideas like BOLA (Buffer Occupancy based Lyapunov Algorithm) rather than a simple threshold heuristic, jointly optimising rebuffer rate and quality switches.
  • Colossus + Bigtable: video metadata and chunk indices live in Google's internal distributed file system; external HLD approximates this with S3 + PostgreSQL.

Netflix serves a smaller catalog (~17,000 titles) but at extreme fidelity (Dolby Vision, Atmos). Netflix's key adaptations:

  • Open Connect: Netflix operates its own CDN, with 17,000+ physical appliances embedded directly inside ISP networks in 1,000+ locations worldwide. This eliminates third-party CDN egress costs entirely and gives Netflix direct control over cache warming.
  • Per-title encoding: instead of a fixed bitrate ladder, Netflix uses a convex hull optimisation algorithm that analyses each title's scene complexity and generates a custom bitrate-resolution ladder. An animated film needs far fewer bits per pixel than a dark action scene.
  • Studio-to-stream pipeline: raw files arrive from studios as ProRes 4K masters; Netflix's transcoding clusters run 24/7 to produce 1,200+ rendition files per title (every combination of resolution, bitrate, codec, language, and subtitle track).

| Dimension | YouTube | Netflix |
|---|---|---|
| CDN | Third-party + Google edge PoPs | Proprietary Open Connect appliances |
| Source variety | 500 hrs/min, any codec | ~17K titles, studio masters |
| Bitrate ladder | Fixed per resolution tier | Per-title convex hull optimisation |
| ABR algorithm | BOLA-style (Lyapunov-based) | Client-side + server-side hybrid |
| Transcoding | Cloud workers (GCP) | On-premise + cloud hybrid |

🧭 Architecture Decision Guide: Choosing the Right Component

| Situation | Recommendation |
|---|---|
| Use HLS/DASH over RTMP | For VoD playback. RTMP is a live ingest protocol with no concept of multi-rendition manifests or CDN segment caching. HLS/DASH are the only formats with ABR and broad device support. |
| Use async transcoding over sync | Always for VoD. Sync transcoding blocks the upload response for 30+ minutes on a 2-hour video. Async queue + worker pool decouples creator UX from encoding latency. |
| Use CDN edge over proxying from origin | For all segment delivery. Origin servers cannot serve tens of terabits per second. CDN edge nodes are the architecture, not an optimisation. |
| Use Redis INCR over DB INSERT for view counts | When write rate > 10K/s. At 50M concurrent viewers firing view events, a direct PostgreSQL write would exceed connection and IOPS limits. Redis handles 1M+ atomic increments/second. |
| Split transcoding workers by rendition | Always. Encoding 4K and 360p in the same job means the 360p rendition (needed in minutes) waits for 4K (which needs 30 min). Separate worker pools with separate priority queues solve this. |
| Avoid storing video bytes in the Metadata DB | The DB stores manifest URLs only. Storing segments in a relational DB creates a hard ceiling on throughput and storage cost that object storage does not have. |

🧪 Practical Example: Tracing a Single Video from Camera to Viewer

Walk through a concrete upload: a creator uploads a 10-minute 1080p cooking tutorial recorded on an iPhone.

Upload phase (T+0 to T+45 seconds):

  1. iPhone YouTube app calls POST /upload/init → server returns uploadId = abc123
  2. App splits the 1.2 GB MP4 into 120 × 10 MB chunks; uploads 5 chunks in parallel
  3. Upload Service writes each chunk to S3 multipart; acknowledges each with HTTP 200
  4. At T+40 s, all 120 chunks acknowledged; app calls POST /upload/abc123/complete
  5. Upload Service calls S3 CompleteMultipartUpload; emits video.uploaded to Kafka

Transcoding phase (T+45 s to T+5 min for 720p; T+45 s to T+25 min for 4K):

  1. Transcoding Orchestrator consumes Kafka event; dispatches 6 encoding jobs to worker pool
  2. 360p and 480p workers finish in ~2 min; 720p worker finishes in ~4 min
  3. Metadata DB updated: status = PARTIALLY_READY, manifest_url written with 360p/480p/720p available
  4. Creator's Studio dashboard shows "Processing: 720p available" at T+5 min
  5. 1080p worker completes at T+15 min; 4K at T+25 min; status → READY

First viewer plays the video (T+6 min):

  1. Viewer clicks the video thumbnail → GET /videos/{videoId} → metadata from Redis (cache hit, 2 ms)
  2. Player fetches master.m3u8 from CDN → cache miss (the video is minutes old) → CDN pulls from origin → caches the manifest
  3. Player selects 720p (bandwidth estimate 5 Mbps) → fetches 720p/index.m3u8
  4. Player downloads segments 0, 1, 2 (12 s of buffer); renders the first frame ~280 ms after the click
  5. At T+35 s of watch time, the view event fires → INCR video:views:{videoId} in Redis

One hour later (viral moment โ€” 50K concurrent viewers):

  1. CDN hit rate for 720p segments: 99.8% (all cached at nearest PoP)
  2. Trending job detects the view velocity → adds the video to the trending:videos:US sorted set
  3. CDN pre-warm job pushes all segments of first 5 minutes to all North American PoPs
  4. Origin receives ~200 requests/hour total (only for cache misses on long-tail segments)
  5. Redis video:views:{videoId} accumulates; background job flushes 50K views to PostgreSQL every 5 min

📚 Lessons Learned from YouTube-Scale Streaming

  1. Never proxy video bytes through application servers. Every Upload Service and Download Service should redirect clients to pre-signed S3/CDN URLs. Application servers die immediately under video-scale bandwidth; their job is to orchestrate URLs, not move bytes.

  2. Transcoding is a first-class distributed system. It has its own queue, its own worker autoscaler, its own retry logic (a worker crash mid-encode must not lose the job), and its own priority tiers. Treating it as a simple background job is the most common design mistake.

  3. Ship 720p first, 4K later. Viewers start watching within minutes of upload. The transcoding pipeline must produce a "fast path" rendition (480p or 720p) quickly, then continue encoding higher renditions asynchronously. This is a priority queue problem, not just a speed problem.

  4. CDN hit rate is your #1 cost lever. At ~80 Tbps of egress, a 1% improvement in CDN hit rate saves millions of dollars per year in origin data transfer costs. Cache key design, segment duration tuning, and pre-warming are all worth engineering investment.

  5. View counts don't need to be exact. No viewer notices whether a video has 1,024,819 or 1,025,000 views. Eventual consistency via Redis + periodic flush is the correct trade-off. Strong consistency here wastes database write capacity that is needed for real user-facing operations.

  6. Origin shield is mandatory, not optional. Without it, publishing a video to 200 edge PoPs causes 200 simultaneous S3 GETs on first cache miss. With origin shield, it collapses to one. This is not an optimisation โ€” it is an architectural requirement above 10M concurrent viewers.


📌 TLDR Summary & Key Takeaways: The Five Decisions That Define Video Streaming Architecture

  • Chunked resumable upload solves the reliability problem for large file ingestion across any network quality.
  • Async transcoding DAG with worker pools decouples encoding latency from upload responsiveness and scales independently.
  • HLS/DASH multi-rendition segmentation is the prerequisite for adaptive bitrate: without multiple renditions, ABR is impossible.
  • CDN with origin shield + pre-warming shifts 99% of egress traffic to edge nodes, making origin servers irrelevant to playback latency.
  • Redis counters with periodic DB flush solve the high-write view counting problem without sacrificing scale or consistency for non-critical data.

One-liner to remember: Upload pipeline writes segments; CDN edge nodes read them. Those two paths never cross.


📝 Practice Quiz

  1. A creator uploads a 2-hour raw video. Why does the Upload Service use chunked multipart upload rather than a single HTTP PUT?

    • A) Single PUT is not supported by S3
    • B) Chunked upload enables parallel transfer, resumable recovery on failure, and avoids timeouts on large files
    • C) Chunked upload is required for HLS segmentation
    • D) To avoid paying per-byte CDN fees on upload

    Correct Answer: B

  2. Your transcoding pipeline takes 45 minutes to produce a 4K rendition, but viewers expect to watch within 2 minutes of upload. What is the correct architectural fix?

    • A) Use faster CPU instances to reduce total encode time to 2 minutes
    • B) Block the upload response until at least 720p is ready
    • C) Implement a priority fast-path: encode 480p or 720p first, then queue higher renditions asynchronously
    • D) Use RTMP instead of HLS to avoid transcoding entirely

    Correct Answer: C

  3. A viral video gets 5 million views in the first hour. Your origin S3 bucket starts returning 503 errors. Which component should have prevented this?

    • A) A larger PostgreSQL read replica for video metadata
    • B) CDN origin shield collapsing parallel edge cache misses to a single origin fetch
    • C) More Upload Service pods behind the load balancer
    • D) Redis sorted set for trending videos

    Correct Answer: B

  4. Why do video streaming platforms use HLS or DASH instead of progressively downloading a single MP4 file?

    • A) MP4 is not supported by modern browsers
    • B) HLS/DASH segment playlists allow adaptive bitrate switching per 4-second segment, which MP4 download cannot do
    • C) HLS and DASH use less storage than MP4
    • D) MP4 cannot be stored in S3

    Correct Answer: B

  5. Open-ended: Your product team wants real-time exact view counts displayed to creators within 1 second of each view. How would you redesign the view counting system, and what trade-offs would you accept? Consider write throughput, consistency, and cost in your answer.


