System Design HLD Example: Video Streaming (YouTube/Netflix)
A practical interview-ready HLD for a video streaming platform with adaptive bitrate and CDN delivery.
TLDR: A video streaming platform is two separate systems stitched together: a batch-oriented transcoding pipeline that converts raw uploads into multi-resolution HLS/DASH segments, and a real-time delivery network of CDN edge nodes that serve those segments with sub-100 ms latency worldwide. Nail the boundary between them and the rest of the design falls into place.
The Scale Problem: 500 Hours of Video Per Minute
Every minute, users upload 500 hours of video to YouTube. Netflix streams to 238 million subscribers. The hard part is not storing the bytes; object storage handles that. The real challenge is everything in between: accepting a raw 4K MOV file, re-encoding it into multiple resolutions and two streaming formats, distributing those segments to 200+ CDN edge locations worldwide, and then serving the right resolution to a viewer on a slow 3G phone without a single rebuffer.
Three separate hard problems are hiding inside that one sentence:
- Ingestion at write time: how do you accept gigabyte-scale uploads reliably from any device or connection quality?
- Processing at encode time: transcoding 4K video is CPU-intensive and takes minutes; how do you scale it without blocking uploads?
- Delivery at read time: how do you serve millions of concurrent streams from a file that lives in one origin bucket?
This walkthrough designs the HLD that solves all three. By the end you will know why uploads are chunked, why transcoding runs as an asynchronous DAG of worker jobs, why CDN edge nodes pre-warm popular content, and why every video player switches bitrates dynamically based on available bandwidth.
Actors
| Actor | Role |
| Creator | Uploads raw video; provides title, description, thumbnail |
| Viewer | Streams video on web, mobile, or smart TV |
| Upload Service | Accepts chunked uploads; coordinates multipart assembly |
| Transcoding Workers | CPU-heavy jobs; encode raw video into HLS/DASH segments |
| CDN Edge Node | Caches and serves video segments; handles 95%+ of playback traffic |
| Metadata Service | Stores video metadata, channel info, search index |
| Recommendation Engine | Generates personalised watch-next lists |
Functional & Non-Functional Requirements
In Scope
- Upload: chunked, resumable upload of raw video files up to 256 GB
- Transcoding: async pipeline produces 360p, 480p, 720p, 1080p, and 4K renditions in HLS and DASH formats
- Storage: raw video in object storage; transcoded segments in object storage; metadata in a relational DB
- Streaming: CDN-delivered adaptive bitrate streaming with < 2 s startup time
- Search: full-text search across title, description, and transcript
- View Counting: eventually consistent global view counter; real-time approximate counter
Out of Scope (v1 boundary)
- Live streaming (RTMP ingest, real-time low-latency delivery)
- DRM licensing and Widevine/FairPlay key exchange
- In-video chapters, captions auto-generation, and AI content moderation
- Ad insertion and monetisation pipeline
- Real-time comments and Super Chat
Non-Functional Requirements
| Dimension | Target | Why It Matters |
| Upload throughput | 500+ hours/min ingested globally | Sustained write load from creators |
| Playback availability | 99.99% | Rebuffering is visible and costly to user retention |
| Startup latency | < 2 s to first frame | Industry benchmark for streaming UX |
| Transcoding SLA | 720p ready within 5 min of upload; 4K within 30 min | Creator feedback loop |
| Concurrent streams | 50M+ simultaneous viewers | Super Bowl spike sizing |
| Storage | 1 EB total; growing at ~10 PB/month | Object storage at scale |
Capacity Estimation: Translating YouTube-Scale Numbers Into Architecture Choices
Back-of-envelope math directly shapes component sizing decisions. Use these numbers as interview anchors.
Write path (upload + transcode):
- 500 hours of video uploaded per minute = ~8.3 hours/second
- Average raw video bitrate: 8 Mbps (1080p source)
- Ingest bandwidth: 8.3 h/s × 3,600 s/h × 8 Mbps ÷ 8 bits/byte ≈ 30 GB/s aggregate ingestion
- Transcoding: each minute of 1080p video takes ~5 minutes of CPU time per rendition; 6 renditions → 30 CPU-minutes per video-minute. At 30,000 video-minutes uploaded per minute, that is ~900,000 CPU-minutes of work arriving each minute, i.e. on the order of 900K vCPUs sustained for transcoding alone
Read path (streaming):
- 1 billion hours viewed per day ÷ 24 hours ≈ 41.7M concurrent viewers on average; use ~11.6M sustained streams as a deliberately low steady-state anchor for the egress math below (scale proportionally for peaks)
- Average stream bitrate: 2 Mbps (mix of 480p/720p)
- Egress bandwidth: 11.6M × 2 Mbps ≈ 23 Tbps total egress, almost entirely served by the CDN, not origin
Storage:
- 500 hours/min × 60 min/h × 24 h/day = 720,000 hours of raw video per day
- Average compressed raw: 3 GB/hour → ~2.16 PB/day of raw ingestion
- After transcoding to 6 renditions: roughly a 3× storage multiplier → ~6.5 PB/day added to storage
Key architectural implication: the read-to-write ratio is roughly 1,000:1. CDN caching is not optional; it is the entire delivery strategy. Origin servers would collapse under raw playback load.
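These anchors are quick to sanity-check in code. A small sketch using only the constants from the bullets above (pure arithmetic, no external services):

```python
# Sanity-check of the back-of-envelope numbers above; constants come straight
# from the bullets in this section.
def capacity():
    video_sec_per_sec = (500 / 60) * 3600        # 500 h/min -> 30,000 s of video per second
    ingest_gb_per_s = video_sec_per_sec * 8 / 8 / 1000   # 8 Mbps raw; bits -> GB
    egress_tbps = 11.6e6 * 2 / 1e6               # 11.6M streams at 2 Mbps average
    raw_pb_per_day = 500 * 60 * 24 * 3 / 1e6     # 3 GB per hour of compressed raw
    total_pb_per_day = raw_pb_per_day * 3        # ~3x multiplier across renditions
    return ingest_gb_per_s, egress_tbps, raw_pb_per_day, total_pb_per_day
```

Running it reproduces the section's anchors: ~30 GB/s ingest, ~23 Tbps egress, ~2.16 PB/day raw and ~6.5 PB/day total.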
Design Goals Specific to Video Streaming
These are system-specific goals, not generic best practices:
- Separate the write path from the read path entirely. Upload and transcoding never touch CDN delivery code. They are decoupled by an async message queue.
- Adaptive bitrate prevents buffering for 56% of viewers who experience variable-bandwidth connections (mobile, shared WiFi). The system must produce multiple renditions, not just one.
- Transcoding is the only CPU-bound bottleneck. Everything else is I/O-bound. Scale transcoding workers independently from upload gateways and streaming services.
- CDN pre-warming for trending content reduces origin load by 40-60% during viral spikes. The system must detect trending videos early and push segments to edge nodes proactively.
- View counts are approximate and eventually consistent. Strong consistency at 50M concurrent viewers is neither necessary nor achievable. Redis counters + periodic flush to the database is the correct trade-off.
High-Level Architecture: The Six Components That Make Streaming Work
The architecture divides cleanly into two planes: the upload/processing plane (left side of the diagram) and the delivery/playback plane (right side). The message queue in the middle is the decoupling boundary that makes each plane independently scalable.
The diagram below shows the complete data flow: a creator's raw upload travels through the Upload Service → Object Storage → Transcoding Pipeline, and emerges as HLS/DASH segments pushed to both the origin bucket and the CDN. A viewer's play request never touches any of those components; it goes directly to the CDN edge node.
graph TD
A([Creator Client]) -->|chunked upload| B[Upload Service]
B -->|multipart PUT| C[(Raw Video Store\nS3 / GCS)]
C -->|upload complete event| D[Message Queue\nKafka]
D -->|job dispatch| E[Transcoding Worker Pool\nFFmpeg DAG]
E -->|HLS/DASH segments| F[(Segment Store\nS3 / GCS)]
F -->|origin pull| G[CDN Origin Shield]
G -->|edge replication| H[CDN Edge Nodes\n200+ PoPs]
I([Viewer Client]) -->|stream request| H
H -->|cache miss| G
G -->|segment fetch| F
B -->|metadata write| J[(Metadata DB\nPostgreSQL)]
E -->|transcode complete| J
J <-->|index sync| K[Search Service\nElasticsearch]
J -->|view event| L[(Redis Cache\nCounters & Hot Metadata)]
Key takeaway: Raw video bytes and playback bytes never share a network path. The Segment Store is the handoff point between the two planes.
Deep Dive: Upload, Transcoding, CDN, ABR, and Metadata
The Internals: How the Six Components Interact at Runtime
At runtime, the upload/processing plane and the delivery plane share only two things: the Segment Store (S3 buckets) and the Metadata DB. Every other component is isolated. This separation is intentional: it prevents a transcoding surge from affecting playback latency, and a CDN traffic spike from affecting upload throughput.
State transitions for a video object follow this lifecycle:
| State | Trigger | Owner |
| UPLOADING | Creator calls /upload/init | Upload Service |
| TRANSCODING | video.uploaded Kafka event consumed | Transcoding Orchestrator |
| READY | All rendition workers complete + manifest generated | Transcoding Orchestrator |
| FAILED | Any worker hits retry limit | Dead Letter Queue handler |
The message queue (Kafka) is the only durable state boundary between services. If the Transcoding Orchestrator crashes mid-job, Kafka redelivers the uncommitted messages to another consumer in the group (offsets are committed only after a job completes), so no work is lost.
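One way to make those lifecycle transitions concrete is an explicit transition table, so a redelivered event is an idempotent no-op rather than a corruption. This is a hypothetical sketch: the state and event names mirror the table above, but the function itself is illustrative.

```python
# Hypothetical sketch: the video lifecycle as an explicit transition table.
# An unknown (state, event) pair leaves the state unchanged, which makes
# duplicate Kafka deliveries harmless.
TRANSITIONS = {
    ("UPLOADING", "video.uploaded"): "TRANSCODING",
    ("TRANSCODING", "transcode.complete"): "READY",
    ("TRANSCODING", "transcode.failed"): "FAILED",
}

def apply_event(state: str, event: str) -> str:
    """Return the next state, or the current state for duplicate/stale events."""
    return TRANSITIONS.get((state, event), state)
```

A replayed video.uploaded against a video that is already READY simply returns READY.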
Performance Analysis: Throughput, Latency, and Where Each Tier Breaks
| Tier | Normal Throughput | Bottleneck Trigger | Failure Mode |
| Upload Service | 30 GB/s aggregate | Network saturation per pod | Client timeout; partial upload stuck |
| Transcoding Workers | ~900K vCPUs sustained | Queue depth > worker capacity | Job backlog; creator SLA breach |
| Metadata DB | 50K reads/s (with replicas) | Connection exhaustion | Slow queries on unpaginated view_events table |
| CDN Edge | 23 Tbps egress | Cache miss storm on viral video | Origin shield overload without pre-warming |
| Redis | 1M+ INCR/s per shard | Hot key on single viral video ID | Counter saturation; shard with video_id suffix |
Latency budget for video startup (from the viewer clicking play to the first frame rendering):
DNS resolution: ~5 ms
CDN TCP connect: ~15 ms (nearest PoP)
Metadata fetch: ~20 ms (Redis hit)
master.m3u8 fetch: ~10 ms (CDN cache hit)
Segment 0 fetch (720p): ~30 ms (CDN cache hit, 1 MB segment)
Player decode + render: ~200 ms
-----------------------------
Total startup: ~280 ms (well under the 2 s target)
The 2-second SLA is dominated by player decode and the first-segment fetch. If segment 0 is a CDN cache miss (cold start), add 150-300 ms for the origin pull; still within budget thanks to the origin shield.
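The budget is simple addition, but it is worth writing down because the cold-start case must also clear the SLA. A minimal check, with figures taken from the table above:

```python
# The startup budget above as checkable arithmetic. Figures are from the table;
# 300 ms is the worst-case cold-start penalty for an origin pull.
budget_ms = {
    "dns": 5, "tcp_connect": 15, "metadata": 20,
    "manifest": 10, "segment0": 30, "decode_render": 200,
}
warm_start = sum(budget_ms.values())      # every fetch is a CDN cache hit
cold_start = warm_start + 300             # segment 0 misses and pulls from origin

assert warm_start == 280                  # matches the table's total
assert cold_start < 2000                  # still inside the 2 s SLA
```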
Video Upload and Ingestion: Chunked, Resumable, Parallel
Uploading a 10 GB raw video file over a single HTTP connection routinely fails on real-world networks. The upload service therefore uses resumable chunked upload, identical in concept to S3 multipart upload:
- Client calls POST /upload/init; the server returns an uploadId
- Client splits the file into 5-10 MB chunks and issues parallel PUT /upload/{uploadId}/chunk/{n} requests
- On connection drop, the client resumes from the last acknowledged chunk index (stored in Redis: upload:progress:{uploadId})
- After all chunks are confirmed, the client calls POST /upload/{uploadId}/complete; the server triggers S3 CompleteMultipartUpload
- Upload Service emits a video.uploaded event to Kafka with {videoId, rawS3Key, creatorId}
This design means a 10 GB upload tolerates network interruptions transparently, chunks upload in parallel (a 3-5× throughput improvement over sequential transfer), and assembly at the origin happens atomically only once every chunk has arrived.
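The client-side loop behind those steps can be sketched in a few lines. Everything here is illustrative: the send callback stands in for the PUT /upload/{uploadId}/chunk/{n} request, and a real client would persist the acknowledged set (the Redis upload:progress:{uploadId} key) so a reconnect resumes where it left off.

```python
# Illustrative sketch of the resumable-chunk client loop.
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, inside the 5-10 MB range above

def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Slice the file into fixed-size chunks; the last one may be short."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def upload(chunks, send, acked=None):
    """Retry until every chunk index is acknowledged, surviving transient failures."""
    acked = set() if acked is None else acked
    while len(acked) < len(chunks):
        for n, chunk in enumerate(chunks):
            if n not in acked and send(n, chunk):
                acked.add(n)
    return acked  # caller can now POST /upload/{uploadId}/complete
```

Passing a previously saved acked set into upload() is what makes the transfer resumable rather than restartable.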
Transcoding Pipeline: A DAG of Encoding Jobs
Transcoding is the most computationally expensive operation in the system. A single 2-hour 4K source file at 60 fps can take 2-4 hours of CPU time to encode into all renditions. The pipeline must be:
- Asynchronous: never block the upload response on transcoding
- Parallelisable: encode 720p, 1080p, and 4K simultaneously on different workers
- DAG-shaped: some steps depend on others (e.g., thumbnail extraction runs after the first 30 seconds are decoded)
The pipeline stages are:
[Raw S3 Object]
      │
      ▼
[Pre-processing Worker]
  - Validate codec, container, duration
  - Extract thumbnail at t=5s
  - Detect scene boundaries
      │
      ├──▶ [360p Encoder Worker]  → segment_360p/  (HLS)
      ├──▶ [480p Encoder Worker]  → segment_480p/  (HLS)
      ├──▶ [720p Encoder Worker]  → segment_720p/  (HLS + DASH)
      ├──▶ [1080p Encoder Worker] → segment_1080p/ (HLS + DASH)
      ├──▶ [4K Encoder Worker]    → segment_2160p/ (HLS + DASH)
      │
      ▼
[Manifest Generator]
  - Writes master.m3u8 (HLS)
  - Writes manifest.mpd (DASH)
  - Updates metadata DB: status = READY
Each encoder worker is a stateless pod that reads from S3, runs FFmpeg, and writes segments back to S3. Workers are scaled horizontally by pulling from a transcoding_jobs queue in SQS or Kafka. Priority queues route paid creator uploads to faster workers.
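The orchestrator's fan-out step is little more than building one job message per rendition and routing it to a priority queue. A sketch under assumed names (the queue names, job fields, and paid-creator flag are illustrative, not a real API):

```python
# Sketch of the orchestrator's fan-out: one job message per rendition, routed
# to a priority queue so fast-path renditions meet the 5-minute SLA first.
RENDITIONS = [
    # (name, height, formats) -- the ladder from the pipeline diagram above
    ("360p", 360, ["hls"]),
    ("480p", 480, ["hls"]),
    ("720p", 720, ["hls", "dash"]),
    ("1080p", 1080, ["hls", "dash"]),
    ("2160p", 2160, ["hls", "dash"]),
]

def fan_out(video_id, raw_s3_key, paid_creator=False):
    """Build the job messages published after a video.uploaded event."""
    jobs = []
    for name, height, formats in RENDITIONS:
        jobs.append({
            "video_id": video_id,
            "input": raw_s3_key,
            "rendition": name,
            "formats": formats,
            "output_prefix": f"{video_id}/{name}/",
            # renditions <= 720p (or any paid-creator job) take the fast queue
            "queue": "transcode-fast" if height <= 720 or paid_creator else "transcode-bulk",
        })
    return jobs
```

Each job is independent, which is what lets the worker pool scale horizontally on queue depth alone.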
CDN and Video Delivery: Why Origin Never Sees Viewer Traffic
A CDN works by replicating content from an origin server (your S3 bucket) to edge nodes (servers co-located in ISP networks worldwide). When a viewer in Mumbai requests segment 720p_seg_042.ts, the request goes to the nearest CDN PoP, typically under 20 ms away, rather than to a data center in Virginia.
Origin shield is an intermediate caching layer between edge nodes and origin. Without it, a cache miss on 200 edge nodes for a newly published video would fire 200 simultaneous origin requests. Origin shield collapses those into one request. Netflix uses a three-tier hierarchy: edge → regional shield → origin.
Pre-warming for trending content: when a video accumulates >10K views/hour, the system proactively pushes its first 5 minutes of segments to all edge nodes. This eliminates origin load during the steepest part of the viral curve.
Adaptive Bitrate Streaming: How the Player Decides Which Quality to Use
ABR is the mechanism that prevents buffering. The player downloads an HLS master playlist (master.m3u8) that lists every available rendition:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=16000000,RESOLUTION=3840x2160
2160p/index.m3u8
The player selects a starting rendition, then re-evaluates every 4-second segment based on the buffer health algorithm:
if buffer_level < 10s:
switch DOWN to next lower rendition
elif buffer_level > 30s AND estimated_bandwidth > current_rendition_bitrate * 1.4:
switch UP to next higher rendition
else:
stay at current rendition
This simple heuristic is why video quality drops gracefully on mobile instead of freezing. Segment duration (4-6 seconds) controls the granularity of switching: shorter segments allow faster adaptation but increase HTTP request overhead.
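The heuristic above can be made runnable. The 10 s / 30 s thresholds and the 1.4× headroom factor come straight from the pseudocode; the ladder is the bitrate list from the master playlist. One caveat worth flagging: a production player would usually test estimated bandwidth against the next rung's bitrate, while the article's rule (kept here) tests against the current one.

```python
# The buffer-health heuristic above as runnable code.
LADDER_KBPS = [800, 2000, 5000, 16000]  # 360p, 720p, 1080p, 2160p from master.m3u8

def next_rendition(idx: int, buffer_s: float, est_kbps: float) -> int:
    """Return the ladder index to use for the next 4-second segment."""
    if buffer_s < 10:
        return max(idx - 1, 0)                     # switch DOWN
    if buffer_s > 30 and est_kbps > LADDER_KBPS[idx] * 1.4:
        return min(idx + 1, len(LADDER_KBPS) - 1)  # switch UP
    return idx                                     # hold
```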
Metadata and Search: Keeping Video Info Fast and Searchable
Video metadata (title, description, tags, channel, duration, view count) lives in PostgreSQL. Search lives in Elasticsearch, kept in sync via a Kafka consumer that subscribes to video.published events and indexes the document.
Recommendations use a separate offline pipeline (collaborative filtering, watch history embeddings) that writes pre-computed lists to Redis. The Metadata Service reads from cache first; PostgreSQL is the write authority.
Write Path: Step-by-Step Upload Flow
- Creator opens the upload dialog → client calls POST /upload/init with file metadata
- Server creates a videos row (status = UPLOADING) and returns an uploadId
- Client uploads chunks in parallel; Upload Service assembles them into the raw S3 object
- Client calls POST /upload/{uploadId}/complete → Upload Service emits video.uploaded to Kafka
- Transcoding Orchestrator consumes the event → fans out encoding jobs (one per rendition) to the worker queue
- Each worker encodes one rendition, writing segments to S3 under {videoId}/{rendition}/
- Manifest Generator assembles master.m3u8 and manifest.mpd after all workers complete
- Orchestrator updates the metadata DB: status = READY, manifest_url, thumbnail_url
- Search indexer consumes the video.published event → writes the document to Elasticsearch
sequenceDiagram
participant C as Creator Client
participant US as Upload Service
participant S3 as Raw Video Store
participant MQ as Kafka
participant TW as Transcoding Workers
participant DB as Metadata DB
C->>US: POST /upload/init
US->>DB: INSERT videos (status=UPLOADING)
US-->>C: {uploadId}
C->>US: PUT /upload/{uploadId}/chunk/0..N (parallel)
US->>S3: CompleteMultipartUpload
US->>MQ: video.uploaded event
MQ->>TW: dispatch encoding jobs
TW->>S3: write HLS/DASH segments
TW->>MQ: transcode.complete
MQ->>DB: UPDATE videos (status=READY, manifest_url)
Read Path: Step-by-Step Streaming Flow
- Viewer clicks play → client calls GET /videos/{id}; Metadata Service returns the manifest URL and video metadata from Redis/PostgreSQL
- Player fetches master.m3u8 from the CDN edge node
- Player selects a starting rendition (typically 480p) and fetches 480p/index.m3u8 (the segment list)
- Player requests segment 0 (480p_seg_000.ts) from the CDN edge
- CDN cache hit: the segment is served from edge cache in < 20 ms
- Player buffers 3 segments (~12 s of video) before rendering the first frame; startup time stays < 2 s
- Every 4 s, the player evaluates buffer health and switches rendition up or down if needed
- A view event fires after 30 s of watch time → POST /events/view → Kafka → Redis INCR counter
Data Model: Videos, Channels, Users, and View Counts
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username TEXT UNIQUE NOT NULL,
email TEXT UNIQUE NOT NULL,
channel_id UUID,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE channels (
channel_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
owner_id UUID NOT NULL REFERENCES users(user_id),
name TEXT NOT NULL,
subscriber_count BIGINT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE videos (
video_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
channel_id UUID NOT NULL REFERENCES channels(channel_id),
title TEXT NOT NULL,
description TEXT,
status TEXT NOT NULL DEFAULT 'UPLOADING',
-- UPLOADING | TRANSCODING | READY | FAILED
duration_s INT,
raw_s3_key TEXT,
manifest_url TEXT, -- CDN URL to master.m3u8
thumbnail_url TEXT,
uploaded_at TIMESTAMPTZ DEFAULT now(),
published_at TIMESTAMPTZ
);
CREATE TABLE view_events (
event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
video_id UUID NOT NULL REFERENCES videos(video_id),
viewer_id UUID, -- NULL for anonymous
watched_s INT, -- seconds watched in this session
device_type TEXT, -- mobile | desktop | tv
occurred_at TIMESTAMPTZ DEFAULT now()
);
-- Materialised view counter, flushed from Redis every 5 minutes
CREATE TABLE video_view_counts (
video_id UUID PRIMARY KEY REFERENCES videos(video_id),
view_count BIGINT DEFAULT 0,
last_updated TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_videos_channel_id ON videos(channel_id);
CREATE INDEX idx_videos_published_at ON videos(published_at DESC) WHERE status = 'READY';
CREATE INDEX idx_view_events_video_id ON view_events(video_id, occurred_at DESC);
NoSQL complement: video segments themselves are never stored in the relational DB. The manifest_url column is a pointer to the HLS manifest in S3/CDN, which lists segment URLs. This separation is critical: the DB stores metadata; object storage stores content.
Cache Design: Redis Patterns for Hot Video Metadata
Redis serves three distinct caching roles in the video platform:
| Cache Key Pattern | Type | TTL | Purpose |
| video:meta:{videoId} | Hash | 10 min | Video title, thumbnail URL, duration, channel name |
| video:views:{videoId} | String (counter) | No TTL | Real-time view counter; flushed to DB every 5 min |
| trending:videos:{region} | Sorted Set | 1 hour | Top 100 trending video IDs sorted by velocity |
| channel:meta:{channelId} | Hash | 30 min | Channel name, subscriber count, avatar URL |
| rec:home:{userId} | List | 15 min | Pre-computed recommendation list per user |
View counter pattern: Redis INCR is atomic and handles 1M+ increments/second on a single instance. A background job flushes counters to PostgreSQL every 5 minutes using GETSET (atomically read the current value and reset it to 0):
# On every qualified view event (30s+ watched):
INCR video:views:{videoId}
# Background flush job (every 5 min):
count = GETSET video:views:{videoId} 0
UPDATE video_view_counts SET view_count = view_count + count WHERE video_id = ?
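The pattern above is easiest to see with a stand-in for Redis. The dict-backed class below mimics just INCR and GETSET so the sketch stays self-contained; real code would use a Redis client's incr()/getset() and run the UPDATE against video_view_counts.

```python
# The INCR + GETSET flush pattern with an in-memory stand-in for Redis.
class FakeRedis:
    def __init__(self):
        self.kv = {}

    def incr(self, key):              # atomic in real Redis
        self.kv[key] = self.kv.get(key, 0) + 1
        return self.kv[key]

    def getset(self, key, value):     # read old value and overwrite in one step
        old = self.kv.get(key, 0)
        self.kv[key] = value
        return old

def record_view(r, video_id):
    """Hot path: one increment per qualified view, never a DB write."""
    r.incr(f"video:views:{video_id}")

def flush(r, db_counts, video_id):
    """Background job, every 5 minutes: drain the counter into the durable table."""
    delta = r.getset(f"video:views:{video_id}", 0)
    db_counts[video_id] = db_counts.get(video_id, 0) + delta
```

Because GETSET returns the old value while resetting the key, increments that land between the read and the DB write are never lost; they simply ride along in the next flush.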
Trending detection: a sliding-window job computes view velocity (views per hour) for all videos and updates the trending:videos:{region} sorted set every 15 minutes. CDN pre-warm jobs subscribe to this set.
Trade-offs and Failure Modes: Bottlenecks at Video-Streaming Scale
| Component | Bottleneck | Scaling Strategy |
| Upload Service | Network bandwidth per pod | Horizontal scaling behind ALB; direct-to-S3 multipart upload to bypass app server for data |
| Transcoding Workers | CPU (encoding is compute-bound) | Autoscale worker pool based on queue depth; GPU instances for H.265/AV1; spot instances for cost |
| Metadata DB (PostgreSQL) | Read query throughput | Read replicas for video page loads; connection pooling via PgBouncer; partition view_events by month |
| Search (Elasticsearch) | Index lag on viral uploads | Priority Kafka consumer for trending channels; shard by channel_id |
| CDN | Cache miss storm on viral upload | Origin shield collapses parallel edge misses to single origin fetch; pre-warm pipeline for trending videos |
| View Counting | Write amplification at 50M concurrent viewers | Redis INCR (never write to PostgreSQL on hot path); periodic batch flush |
| Recommendation Service | Staleness of pre-computed lists | Rebuild lists on publish + on view-count threshold crossing; short TTL for active users |
FFmpeg: How It Powers Video Transcoding
FFmpeg is the open-source multimedia framework that runs at the heart of every major video platform's transcoding pipeline. YouTube, Facebook, Twitter, and Twitch all use it (directly or via wrappers) to encode video into web-deliverable formats.
What FFmpeg does in this pipeline:
- Decodes the raw source file (any codec: H.264, HEVC, ProRes, DNxHD)
- Re-encodes to the target resolution and bitrate ladder
- Segments the output into 4-second HLS .ts files
- Generates the index.m3u8 segment playlist for each rendition
Minimal HLS segmentation command used by a transcoding worker:
ffmpeg \
-i input.mp4 \
-vf scale=1280:720 \
-c:v libx264 \
-b:v 2000k \
-c:a aac \
-b:a 128k \
-hls_time 4 \
-hls_playlist_type vod \
-hls_segment_filename "segment_%03d.ts" \
720p/index.m3u8
This single command produces segment_000.ts, segment_001.ts, ... and a complete 720p/index.m3u8 playlist. Running one of these commands per rendition in parallel, on separate worker pods, is exactly how the transcoding DAG executes.
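A worker can derive every rendition's invocation from one template. The sketch below mirrors the 720p command above; the 480p bitrate (1200k) is an assumed value, since the master playlist earlier only specifies the other four rungs.

```python
# Sketch: derive per-rendition FFmpeg argv lists from one template.
LADDER = [
    ("360p",  "640x360",   "800k"),
    ("480p",  "854x480",   "1200k"),   # assumed; not listed in the article's playlist
    ("720p",  "1280x720",  "2000k"),
    ("1080p", "1920x1080", "5000k"),
    ("2160p", "3840x2160", "16000k"),
]

def hls_command(src, name, res, vbitrate):
    """Build the argv a worker would hand to subprocess.run()."""
    w, h = res.split("x")
    return ["ffmpeg", "-i", src,
            "-vf", f"scale={w}:{h}",
            "-c:v", "libx264", "-b:v", vbitrate,
            "-c:a", "aac", "-b:a", "128k",
            "-hls_time", "4", "-hls_playlist_type", "vod",
            "-hls_segment_filename", f"{name}/segment_%03d.ts",
            f"{name}/index.m3u8"]
```

Building argv lists rather than shell strings avoids quoting bugs when titles or paths contain spaces.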
AV1 is the next-generation codec (roughly 40% better compression than H.264 at equal quality) but requires 10-20× more CPU time to encode. Most platforms use AV1 for long-tail, cold-storage renditions and H.264/H.265 for the initial fast renditions.
For a full FFmpeg deep-dive covering codec selection, two-pass encoding, and hardware acceleration, see the companion post linked in Related Posts.
Real-World Applications: How YouTube and Netflix Deploy This Architecture in Practice
Understanding how the two largest streaming platforms diverge on the same HLD is the most instructive case study available.
YouTube handles the widest variety of source material in the world: 500 hours/min uploaded by creators using phones, webcams, and professional cameras, drawn from a base of over 2 billion users. YouTube's key adaptations:
- Variable bitrate ladder: shorter videos (< 5 min) are encoded at fewer renditions to reduce transcoding cost; longer videos always get 4K.
- ABR is client-side: the YouTube player runs a buffer-occupancy controller in the family of BOLA (Buffer Occupancy based Lyapunov Algorithm, a published algorithm also implemented in dash.js) rather than a simple threshold heuristic, balancing rebuffer rate against quality-switch frequency.
- Colossus + Bigtable: video metadata and chunk indices live in Google's internal distributed file system; external HLD approximates this with S3 + PostgreSQL.
Netflix serves a smaller catalog (~17,000 titles) but at extreme fidelity (Dolby Vision, Atmos). Netflix's key adaptations:
- Open Connect: Netflix operates its own CDN of 17,000+ physical appliances embedded directly inside ISP networks in 1,000+ locations worldwide. This eliminates third-party CDN egress costs entirely and gives Netflix direct control over cache warming.
- Per-title encoding: instead of a fixed bitrate ladder, Netflix uses a convex hull optimisation algorithm that analyses each title's scene complexity and generates a custom bitrate-resolution ladder. An animated film needs far fewer bits per pixel than a dark action scene.
- Studio-to-stream pipeline: raw files arrive from studios as ProRes 4K masters; Netflix's transcoding clusters run 24/7 to produce 1,200+ rendition files per title (every combination of resolution, bitrate, codec, language, and subtitle track).
| Dimension | YouTube | Netflix |
| CDN | Third-party + Google edge PoPs | Proprietary Open Connect appliances |
| Source variety | 500 hrs/min, any codec | ~17K titles, studio masters |
| Bitrate ladder | Fixed per resolution tier | Per-title convex hull optimisation |
| ABR algorithm | BOLA (Lyapunov-based) | Client-side + server-side hybrid |
| Transcoding | Cloud workers (GCP) | On-premise + cloud hybrid |
Architecture Decision Guide: Choosing the Right Component
| Situation | Recommendation |
| Use HLS/DASH over RTMP | For VoD playback. RTMP is a live ingest protocol; it has no concept of multi-rendition manifests or CDN segment caching. HLS and DASH are the dominant formats with ABR support and broad device reach. |
| Use async transcoding over sync | Always for VoD. Sync transcoding blocks the upload response for 30+ minutes on a 2-hour video. Async queue + worker pool decouples creator UX from encoding latency. |
| Use CDN edge over proxying from origin | For all segment delivery. Origin servers cannot serve 23 Tbps. CDN edge nodes are the architecture, not an optimisation. |
| Use Redis INCR over DB INSERT for view counts | When write rate > 10K/s. At 50M concurrent viewers firing view events, a direct PostgreSQL write would exceed connection and IOPS limits. Redis handles 1M+ atomic increments/second. |
| Split transcoding workers by rendition | Always. Encoding 4K and 360p in the same job means the 360p rendition (needed in minutes) waits for 4K (needs 30 min). Separate worker pools with separate priority queues solve this. |
| Avoid storing video bytes in the Metadata DB | The DB stores manifest URLs only. Storing segments in a relational DB creates a hard ceiling on throughput and storage cost that object storage does not have. |
Practical Example: Tracing a Single Video from Camera to Viewer
Walk through a concrete upload: a creator uploads a 10-minute 1080p cooking tutorial recorded on an iPhone.
Upload phase (T+0 to T+45 seconds):
- iPhone YouTube app calls POST /upload/init → server returns uploadId = abc123
- App splits the 1.2 GB MP4 into 120 × 10 MB chunks; uploads 5 chunks in parallel
- Upload Service writes each chunk to S3 multipart; acknowledges each with HTTP 200
- At T+40 s, all 120 chunks are acknowledged; the app calls POST /upload/abc123/complete
- Upload Service calls S3 CompleteMultipartUpload and emits video.uploaded to Kafka
Transcoding phase (T+45 s to T+5 min for 720p; T+45 s to T+25 min for 4K):
- Transcoding Orchestrator consumes the Kafka event and dispatches encoding jobs (one per rendition) to the worker pool
- 360p and 480p workers finish in ~2 min; the 720p worker finishes in ~4 min
- Metadata DB updated: status stays TRANSCODING, but manifest_url is written with the 360p/480p/720p variants available
- Creator's Studio dashboard shows "Processing: 720p available" at T+5 min
- The 1080p worker completes at T+15 min, 4K at T+25 min; status → READY
First viewer plays the video (T+6 min):
- Viewer clicks the video thumbnail → GET /videos/{videoId} → metadata from Redis (cache hit, 2 ms)
- Player fetches master.m3u8 from the CDN → cache miss (the video is 1 min old) → CDN pulls from origin and caches it
- Player selects 720p (bandwidth estimate 5 Mbps) → fetches 720p/index.m3u8
- Player downloads segments 0, 1, 2 (12 s of buffer); renders the first frame ~280 ms after the click
- At T+35 s of watch time, a view event fires → INCR video:views:{videoId} in Redis
One hour later (viral moment: 50K concurrent viewers):
- CDN hit rate for 720p segments: 99.8% (all cached at the nearest PoPs)
- Trending job detects the view velocity → adds the video to the trending:videos:US sorted set
- CDN pre-warm job pushes all segments of the first 5 minutes to all North American PoPs
- Origin receives ~200 requests/hour total (only cache misses on long-tail segments)
- Redis video:views:{videoId} accumulates; a background job flushes ~50K views to PostgreSQL every 5 min
Lessons Learned from YouTube-Scale Streaming
Never proxy video bytes through application servers. Every Upload Service and Download Service should redirect clients to pre-signed S3/CDN URLs. Application servers die immediately under video-scale bandwidth; their job is to orchestrate URLs, not move bytes.
Transcoding is a first-class distributed system. It has its own queue, its own worker autoscaler, its own retry logic (a worker crash mid-encode must not lose the job), and its own priority tiers. Treating it as a simple background job is the most common design mistake.
Ship 720p first, 4K later. Viewers start watching within minutes of upload. The transcoding pipeline must produce a "fast path" rendition (480p or 720p) quickly, then continue encoding higher renditions asynchronously. This is a priority queue problem, not just a speed problem.
CDN hit rate is your #1 cost lever. At 23 Tbps egress, a 1% improvement in CDN hit rate saves roughly $2M/year in origin data transfer costs. Cache key design, segment duration tuning, and pre-warming are all worth engineering investment.
View counts don't need to be exact. No viewer notices whether a video has 1,024,819 or 1,025,000 views. Eventual consistency via Redis + periodic flush is the correct trade-off. Strong consistency here wastes database write capacity that is needed for real user-facing operations.
Origin shield is mandatory, not optional. Without it, publishing a video to 200 edge PoPs causes 200 simultaneous S3 GETs on first cache miss. With origin shield, it collapses to one. This is not an optimisation โ it is an architectural requirement above 10M concurrent viewers.
TLDR Summary & Key Takeaways: The Five Decisions That Define Video Streaming Architecture
- Chunked resumable upload solves the reliability problem for large file ingestion across any network quality.
- Async transcoding DAG with worker pools decouples encoding latency from upload responsiveness and scales independently.
- HLS/DASH multi-rendition segmentation is the prerequisite for adaptive bitrate; without multiple renditions, ABR is impossible.
- CDN with origin shield + pre-warming shifts 99% of egress traffic to edge nodes, making origin servers irrelevant to playback latency.
- Redis counters with periodic DB flush solve the high-write view counting problem without sacrificing scale or consistency for non-critical data.
One-liner to remember: Upload pipeline writes segments; CDN edge nodes read them. Those two paths never cross.
Practice Quiz
A creator uploads a 2-hour raw video. Why does the Upload Service use chunked multipart upload rather than a single HTTP PUT?
- A) Single PUT is not supported by S3
- B) Chunked upload enables parallel transfer, resumable recovery on failure, and avoids timeouts on large files
- C) Chunked upload is required for HLS segmentation
- D) To avoid paying per-byte CDN fees on upload
Correct Answer: B
Your transcoding pipeline takes 45 minutes to produce a 4K rendition, but viewers expect to watch within 2 minutes of upload. What is the correct architectural fix?
- A) Use faster CPU instances to reduce total encode time to 2 minutes
- B) Block the upload response until at least 720p is ready
- C) Implement a priority fast-path: encode 480p or 720p first, then queue higher renditions asynchronously
- D) Use RTMP instead of HLS to avoid transcoding entirely
Correct Answer: C
A viral video gets 5 million views in the first hour. Your origin S3 bucket starts returning 503 errors. Which component should have prevented this?
- A) A larger PostgreSQL read replica for video metadata
- B) CDN origin shield collapsing parallel edge cache misses to a single origin fetch
- C) More Upload Service pods behind the load balancer
- D) Redis sorted set for trending videos
Correct Answer: B
Why do video streaming platforms use HLS or DASH instead of progressively downloading a single MP4 file?
- A) MP4 is not supported by modern browsers
- B) HLS/DASH segment playlists allow adaptive bitrate switching per 4-second segment, which MP4 download cannot do
- C) HLS and DASH use less storage than MP4
- D) MP4 cannot be stored in S3
Correct Answer: B
Open-ended: Your product team wants real-time exact view counts displayed to creators within 1 second of each view. How would you redesign the view counting system, and what trade-offs would you accept? Consider write throughput, consistency, and cost in your answer.
Related Posts
- System Design: Caching and Asynchronism
- System Design: Message Queues and Event-Driven Architecture
- System Design HLD Example: File Storage and Sync
- System Design: Sharding Strategy
Written by Abstract Algorithms (@abstractalgorithms)