All Posts

LLD for URL Shortener: Designing TinyURL

How do you turn a long URL into a 7-character string? We explore Base62 encoding, Hash collisions, and database choices.

Abstract AlgorithmsAbstract Algorithms
ยทยท5 min read
Share
Share on X / Twitter
Share on LinkedIn
Copy link

TLDR: A URL Shortener maps long URLs to short IDs. The core challenge is generating a globally unique, short, collision-free ID at scale. We use Base62 encoding on auto-incrementing database IDs for deterministic, collision-free short codes.


๐Ÿ“– The Locker Room Analogy

A locker room assigns locker numbers. The attendant doesn't randomly pick numbers โ€” she uses the next available sequential number. You get "Locker 5" โ†’ you can find it instantly.

URL shorteners work the same way: assign a unique sequential integer ID, then encode it in a compact string format.

  • Input: https://www.google.com/search?q=system+design+interview
  • Auto-increment database ID: 125
  • Base62 encode 125 โ†’ cb
  • Short URL: https://tiny.url/cb

๐Ÿ”ข Why Base62 and Not MD5?

Approach A โ€” Hashing (MD5/SHA):

  • Hash the long URL: MD5("https://...") = 3b94d2a8...
  • Take first 7 chars: 3b94d2a.
  • Problem: Two different URLs might produce the same first 7 chars โ†’ collision โ†’ wrong redirect.
  • Collision probability with 7 chars from MD5: ~1 in 1.4 billion โ€” sounds small, but with billions of URLs it happens.

Approach B โ€” Base62 on Auto-Increment ID (chosen):

  • Database auto-increments: ID = 1, 2, 3, โ€ฆ
  • Encode: toBase62(ID)
  • Guarantee: Every ID is unique โ†’ every short code is unique. Zero collision probability.

Base62 alphabet: a-z (26) + A-Z (26) + 0-9 (10) = 62 characters.

$$62^7 \approx 3.5 \text{ trillion combinations with 7 characters}$$

More than enough for any URL shortening service.

CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode(n: int) -> str:
    if n == 0: return CHARS[0]
    result = []
    while n:
        result.append(CHARS[n % 62])
        n //= 62
    return "".join(reversed(result))

def decode(s: str) -> int:
    n = 0
    for c in s:
        n = n * 62 + CHARS.index(c)
    return n

โš™๏ธ System Design: API, Database, and Redirect Flow

API Endpoints

POST /api/shorten
Body: { "longUrl": "https://..." }
Response: { "shortUrl": "https://tiny.url/cb", "code": "cb" }

GET /{shortCode}
Response: HTTP 301/302 Redirect to the long URL

301 vs 302:

  • 301 Permanent: Browser caches the redirect. Server never sees the request again. Saves bandwidth but loses analytics clicks.
  • 302 Temporary: Browser always asks the server. Preserves click-through analytics. Used by Bitly.

Database Schema

CREATE TABLE url_mapping (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code VARCHAR(10) UNIQUE NOT NULL,   -- e.g., "cb"
    long_url   TEXT NOT NULL,
    created_by VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,                     -- optional TTL
    clicks     BIGINT DEFAULT 0
);

Redirect Flow

flowchart LR
    A["GET /cb"] --> Cache["Redis Cache\nlookup 'cb'"]
    Cache -->|hit| Redirect["301/302 to long URL"]
    Cache -->|miss| DB["DB: SELECT long_url WHERE short_code='cb'"]
    DB --> Cache2["Write to Redis (TTL=24h)"]
    Cache2 --> Redirect

Cache the most popular short codes in Redis with a 24-hour TTL. On a miss, query the DB and warm the cache.


๐Ÿง  Scaling the ID Generator

Auto-increment works with a single DB shard. At scale:

ApproachHowTradeoff
DB Auto-IncrementSequential IDs, single write masterBottleneck at DB master
Token Range ServiceEach app server claims a range (1โ€“1000, 1001โ€“2000, โ€ฆ)Low contention; predictable ranges
Twitter Snowflake41-bit timestamp + 10-bit machine ID + 12-bit sequence64-bit globally unique, time-sortable, no coordination needed
UUID128-bit random, no coordinationToo long for a short URL; not sortable

Token Range Service is the sweet spot for most URL shorteners:

  1. App server asks "Range Service" for the next batch of 1000 IDs.
  2. App server assigns IDs locally from its batch.
  3. Range Service only coordinates batch handoffs, not individual URLs.

โš–๏ธ Custom Aliases and Collision Handling

Users sometimes want custom codes: tiny.url/my-blog.

  1. Check if my-blog exists in the DB.
  2. If not: insert {short_code: "my-blog", long_url: "..."}.
  3. If yes: return "Code already taken" error.

Custom aliases bypass the Base62 auto-increment path and are stored directly.


๐Ÿ“Œ Summary

  • Base62 on auto-increment ID is collision-free and generates compact 7-char codes for up to 3.5 trillion URLs.
  • 301 saves bandwidth but loses analytics; 302 preserves click tracking โ€” use 302 for production analytics.
  • Redis cache on the read path eliminates DB hits for popular short codes.
  • Token Range Service or Snowflake IDs scale the ID generator beyond a single DB.

๐Ÿ“ Practice Quiz

  1. Why is Base62 on auto-increment ID preferred over taking the first 7 characters of an MD5 hash?

    • A) Base62 uses less CPU.
    • B) Auto-increment IDs are unique by definition โ†’ collision-free; MD5 7-char prefixes can collide for different URLs.
    • C) MD5 produces lowercase only.
      Answer: B
  2. Your URL shortener uses HTTP 301 for redirects. A user shares a short link. Later you change the destination URL. What problem occurs?

    • A) The short code becomes invalid.
    • B) Browsers that cached the 301 will keep redirecting to the old destination, bypassing the server.
    • C) The DNS record becomes stale.
      Answer: B
  3. In the Token Range Service approach, what does each app server receive?

    • A) A cryptographic nonce for ID generation.
    • B) A range of integer IDs (e.g., 5001โ€“6000) it can assign locally without per-URL coordination.
    • C) A replicated copy of the full ID namespace.
      Answer: B

Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms