LLD for URL Shortener: Designing TinyURL
How do you turn a long URL into a 7-character string? We explore Base62 encoding, Hash collisions, and database choices.
TLDR: A URL Shortener maps long URLs to short IDs. The core challenge is generating a globally unique, short, collision-free ID at scale. We use Base62 encoding on auto-incrementing database IDs for deterministic, collision-free short codes.
The Locker Room Analogy
A locker room assigns locker numbers. The attendant doesn't pick numbers at random; she hands out the next available sequential number. You get "Locker 5" and can find it instantly.
URL shorteners work the same way: assign a unique sequential integer ID, then encode it in a compact string format.
- Input: `https://www.google.com/search?q=system+design+interview`
- Auto-increment database ID: `125`
- Base62 encode: `125 → cb`
- Short URL: `https://tiny.url/cb`
Why Base62 and Not MD5?
Approach A: Hashing (MD5/SHA)
- Hash the long URL: `MD5("https://...") = 3b94d2a8...`
- Take the first 7 chars: `3b94d2a`
- Problem: two different URLs can produce the same first 7 chars → collision → wrong redirect.
- Collision risk: 7 hex characters give only 16^7 ≈ 268 million possible values, so by the birthday bound a collision becomes more likely than not after only about 20,000 URLs.
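The birthday bound makes that risk concrete. A rough estimate, assuming each 7-hex-char prefix behaves like a uniform random draw (`collision_probability` is an illustrative helper, not from any library):

```python
import math

def collision_probability(n_urls: int, buckets: int) -> float:
    """Birthday-bound approximation: P(at least one collision)
    when n_urls values are drawn uniformly from `buckets` slots."""
    return 1.0 - math.exp(-n_urls * (n_urls - 1) / (2.0 * buckets))

HEX7 = 16 ** 7  # 7 hex chars of an MD5 digest: ~268 million values

# With only ~20,000 stored URLs, a collision is already more likely than not.
p = collision_probability(20_000, HEX7)
```

At billions of URLs the probability is effectively 1, which is why truncated hashing is rejected.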
Approach B: Base62 on Auto-Increment ID (chosen)
- Database auto-increments: ID = 1, 2, 3, …
- Encode: `toBase62(ID)`
- Guarantee: every ID is unique → every short code is unique. Zero collision probability.
Base62 alphabet: a-z (26) + A-Z (26) + 0-9 (10) = 62 characters.
$$62^7 = 3{,}521{,}614{,}606{,}208 \approx 3.5 \text{ trillion}$$
Seven characters give roughly 3.5 trillion combinations: more than enough for any URL shortening service.
```python
CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode(n: int) -> str:
    """Base62-encode a non-negative integer ID."""
    if n == 0:
        return CHARS[0]
    result = []
    while n:
        result.append(CHARS[n % 62])  # least-significant digit first
        n //= 62
    return "".join(reversed(result))

def decode(s: str) -> int:
    """Invert encode(): Base62 string back to the integer ID."""
    n = 0
    for c in s:
        n = n * 62 + CHARS.index(c)
    return n
```
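A quick sanity check on the codec (helpers redefined so the snippet runs standalone), confirming the worked example from the intro:

```python
CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode(n: int) -> str:
    if n == 0:
        return CHARS[0]
    result = []
    while n:
        result.append(CHARS[n % 62])
        n //= 62
    return "".join(reversed(result))

def decode(s: str) -> int:
    n = 0
    for c in s:
        n = n * 62 + CHARS.index(c)
    return n

# The worked example from the intro: ID 125 -> "cb"
assert encode(125) == "cb"
# decode is the exact inverse of encode, for any ID
assert decode(encode(1_000_000)) == 1_000_000
# 7 Base62 characters cover 62**7 = 3,521,614,606,208 (~3.5 trillion) IDs
assert 62 ** 7 == 3_521_614_606_208
```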
System Design: API, Database, and Redirect Flow
API Endpoints
```
POST /api/shorten
  Body:     { "longUrl": "https://..." }
  Response: { "shortUrl": "https://tiny.url/cb", "code": "cb" }

GET /{shortCode}
  Response: HTTP 301/302 redirect to the long URL
```
301 vs 302:
- 301 Permanent: the browser caches the redirect, so the server never sees the request again. Saves bandwidth but loses analytics clicks.
- 302 Temporary: the browser asks the server every time. Preserves click-through analytics. Used by Bitly.
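A minimal sketch of the redirect decision, with in-memory dicts standing in for the database and click counter (`URLS`, `CLICKS`, and `redirect` are illustrative names, not a real framework API). Choosing 302 is exactly what keeps the counter meaningful:

```python
# Hypothetical in-memory stand-ins for the DB and the analytics counter
URLS = {"cb": "https://www.google.com/search?q=system+design+interview"}
CLICKS = {"cb": 0}

def redirect(short_code: str) -> tuple[int, dict]:
    """Return (status, headers) for GET /{short_code}.

    302 means browsers come back on every click, so CLICKS stays accurate.
    With 301 the browser would cache the mapping and this function would
    never run again for that user."""
    long_url = URLS.get(short_code)
    if long_url is None:
        return 404, {}
    CLICKS[short_code] += 1  # only countable because we chose 302
    return 302, {"Location": long_url}

status, headers = redirect("cb")  # status == 302, Location == long URL
```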
Database Schema
```sql
CREATE TABLE url_mapping (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  short_code VARCHAR(10) UNIQUE NOT NULL, -- e.g., "cb"
  long_url TEXT NOT NULL,
  created_by VARCHAR(255),
  created_at TIMESTAMP DEFAULT NOW(),
  expires_at TIMESTAMP, -- optional TTL
  clicks BIGINT DEFAULT 0
);
```
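The write path can be sketched end to end with SQLite standing in for MySQL (`AUTOINCREMENT` is SQLite's spelling of `AUTO_INCREMENT`; `shorten` and the inlined `encode` helper are illustrative, not a library API):

```python
import sqlite3

CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode(n: int) -> str:
    out = []
    while n:
        out.append(CHARS[n % 62])
        n //= 62
    return "".join(reversed(out)) or CHARS[0]

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE url_mapping (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite's AUTO_INCREMENT
    short_code TEXT UNIQUE,
    long_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0)""")

def shorten(long_url: str) -> str:
    # 1. Insert the row; the DB assigns the next sequential ID.
    cur = db.execute("INSERT INTO url_mapping (long_url) VALUES (?)",
                     (long_url,))
    # 2. Base62-encode that ID and store it as the short code.
    code = encode(cur.lastrowid)
    db.execute("UPDATE url_mapping SET short_code = ? WHERE id = ?",
               (code, cur.lastrowid))
    return code

print(shorten("https://example.com/a"))  # ID 1 -> "b"
```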
Redirect Flow
```mermaid
flowchart LR
    A["GET /cb"] --> Cache["Redis Cache\nlookup 'cb'"]
    Cache -->|hit| Redirect["301/302 to long URL"]
    Cache -->|miss| DB["DB: SELECT long_url WHERE short_code='cb'"]
    DB --> Cache2["Write to Redis (TTL=24h)"]
    Cache2 --> Redirect
```
Cache the most popular short codes in Redis with a 24-hour TTL. On a miss, query the DB and warm the cache.
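A cache-aside sketch of that flow, with a plain dict playing the role of Redis (a real deployment would use a Redis client with `SETEX` for the TTL; everything here is a stand-in):

```python
import time

DB = {"cb": "https://www.google.com/search?q=system+design+interview"}

cache = {}          # dict stands in for Redis: code -> (long_url, expiry)
TTL = 24 * 3600     # the 24-hour TTL from the flow above

def resolve(code: str):
    entry = cache.get(code)
    if entry is not None and entry[1] > time.time():
        return entry[0]                    # cache hit: no DB round-trip
    long_url = DB.get(code)                # cache miss: fall back to the DB
    if long_url is not None:
        cache[code] = (long_url, time.time() + TTL)  # warm the cache
    return long_url
```

The first lookup for a code pays the DB cost; every lookup within the next 24 hours is served from memory.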
Scaling the ID Generator
Auto-increment works with a single DB shard. At scale:
| Approach | How | Tradeoff |
| --- | --- | --- |
| DB Auto-Increment | Sequential IDs, single write master | Bottleneck at the DB master |
| Token Range Service | Each app server claims a range (1–1000, 1001–2000, …) | Low contention; predictable ranges |
| Twitter Snowflake | 41-bit timestamp + 10-bit machine ID + 12-bit sequence | 64-bit globally unique, time-sortable, no coordination needed |
| UUID | 128-bit random, no coordination | Too long for a short URL; not sortable |
Token Range Service is the sweet spot for most URL shorteners:
- App server asks "Range Service" for the next batch of 1000 IDs.
- App server assigns IDs locally from its batch.
- Range Service only coordinates batch handoffs, not individual URLs.
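The three steps above can be sketched as follows. `RangeService` and `AppServer` are hypothetical names; a real deployment would persist the service's counter durably (e.g., a DB row or ZooKeeper node) so ranges survive restarts:

```python
import threading

class RangeService:
    """Central coordinator: hands out disjoint batches of 1000 IDs."""
    def __init__(self, batch_size: int = 1000):
        self._next = 1
        self._batch = batch_size
        self._lock = threading.Lock()

    def claim_range(self) -> range:
        with self._lock:  # contended once per batch, not once per URL
            start = self._next
            self._next += self._batch
        return range(start, start + self._batch)

class AppServer:
    """Assigns IDs locally from its claimed batch; refills when exhausted."""
    def __init__(self, svc: RangeService):
        self._svc = svc
        self._ids = iter(svc.claim_range())

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:           # batch exhausted: claim the next one
            self._ids = iter(self._svc.claim_range())
            return next(self._ids)

svc = RangeService()
a, b = AppServer(svc), AppServer(svc)
# a owns 1-1000 and b owns 1001-2000, so their IDs can never collide
```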
Custom Aliases and Collision Handling
Users sometimes want custom codes: tiny.url/my-blog.
- Check if `my-blog` exists in the DB.
- If not: insert `{short_code: "my-blog", long_url: "..."}`.
- If yes: return a "Code already taken" error.
Custom aliases bypass the Base62 auto-increment path and are stored directly.
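One subtlety: a separate check-then-insert leaves a race window between two concurrent requests for the same alias. Letting the database's UNIQUE constraint arbitrate makes the claim atomic. A sketch with SQLite (`claim_alias` is an illustrative name):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE url_mapping (short_code TEXT PRIMARY KEY, long_url TEXT)")

def claim_alias(alias: str, long_url: str) -> bool:
    """Insert the custom alias; the PRIMARY KEY constraint makes the
    existence check and the insert one atomic step."""
    try:
        db.execute("INSERT INTO url_mapping VALUES (?, ?)", (alias, long_url))
        return True
    except sqlite3.IntegrityError:
        return False  # "Code already taken"

claim_alias("my-blog", "https://example.com/blog")   # True: alias claimed
claim_alias("my-blog", "https://other.example")      # False: already taken
```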
Summary
- Base62 on auto-increment ID is collision-free and generates compact 7-char codes for up to 3.5 trillion URLs.
- 301 saves bandwidth but loses analytics; 302 preserves click tracking → use 302 for production analytics.
- Redis cache on the read path eliminates DB hits for popular short codes.
- Token Range Service or Snowflake IDs scale the ID generator beyond a single DB.
Practice Quiz
Why is Base62 on auto-increment ID preferred over taking the first 7 characters of an MD5 hash?
- A) Base62 uses less CPU.
- B) Auto-increment IDs are unique by definition → collision-free; MD5 7-char prefixes can collide for different URLs.
- C) MD5 produces lowercase only.
Answer: B
Your URL shortener uses HTTP 301 for redirects. A user shares a short link. Later you change the destination URL. What problem occurs?
- A) The short code becomes invalid.
- B) Browsers that cached the 301 will keep redirecting to the old destination, bypassing the server.
- C) The DNS record becomes stale.
Answer: B
In the Token Range Service approach, what does each app server receive?
- A) A cryptographic nonce for ID generation.
- B) A range of integer IDs (e.g., 5001–6000) it can assign locally without per-URL coordination.
- C) A replicated copy of the full ID namespace.
Answer: B

Written by
Abstract Algorithms
@abstractalgorithms