
System Design HLD Example: Proximity Service (Yelp)

A practical interview-ready HLD for finding nearby restaurants, hotels, and points of interest.

Abstract Algorithms · 26 min read

TLDR: A proximity service (Yelp/Google Places) encodes every business location into a geohash string, indexes it in a B-tree column, and queries the center cell plus its 8 neighbors to guarantee no boundary misses. Redis GEORADIUS caches hot-cell results for <2ms reads. Geographic sharding by geohash prefix keeps write load balanced globally. The hard problems are not the algorithm — they are the cell boundary edge case, the Manhattan hot-cell overload, and cache invalidation on address changes.

You open Yelp and search "coffee near me." Within 100ms, Yelp returns the 10 highest-rated coffee shops within 1 mile of your GPS coordinates, ranked by rating. Behind that response is a spatial index serving 100 million queries per day across 200 million indexed businesses. The naïve approach — a SQL range scan on latitude and longitude columns — fails at this scale: at 200 million businesses a bounding-box query returns tens of thousands of candidates before a single distance calculation, and latency climbs to seconds.

Understanding how to design this system teaches you geospatial indexing, location-aware caching, and geographic sharding — three patterns that appear verbatim in Uber driver matching, Google Maps nearby search, DoorDash delivery zone lookup, and every food delivery app you have used. By the end of this walkthrough you will know why geohash beats raw lat/lon queries, when to use a quadtree instead, and how Redis GEORADIUS eliminates database load for the hottest search cells.

📖 Why a SQL Range Query Can't Find Your Coffee Shop

Actors and Use Cases

| Actor | Role |
|---|---|
| End user | Searches "restaurants near me," views business details, submits reviews |
| Business owner | Creates and updates a business profile (name, address, hours, photos) |
| Platform service | Indexes location data; enforces rate limits; aggregates ratings |

Core user journeys in scope:

  • Nearby search — User sends current lat/lon + radius + category filter; system returns the top-N ranked businesses sorted by rating
  • Business detail — User selects a result; system returns the full profile including rating, hours, and photos served via CDN
  • Add/update business — Owner submits a business; system validates, geocodes the address, persists the record, and updates the spatial index
  • Write review — User submits a 1–5 star rating and text; system updates the running average rating on the business record
  • View photos — User browses business photos delivered from a CDN edge node
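The "write review" journey keeps a running average on the business record rather than recomputing over all reviews. A minimal sketch of that update (the function name and two-decimal rounding are illustrative, not from the source):

```python
def apply_review(rating: float, review_count: int, new_star: int):
    """Fold one 1-5 star review into the stored aggregate rating."""
    total = rating * review_count + new_star   # undo the average, add the new star
    review_count += 1
    return round(total / review_count, 2), review_count

avg, n = apply_review(4.00, 2, 5)  # two reviews averaging 4.0, then a 5-star review
print(avg, n)  # 4.33 3
```

This is the arithmetic behind the `UPDATE businesses SET rating = avg, review_count = count + 1` write shown later in the architecture diagram.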

Out of Scope (v1 boundary)

| Excluded feature | Why it is out of scope |
|---|---|
| Personalized ML ranking (collaborative filtering) | Depends on proximity as a prerequisite; a separate ranking service layer |
| Fraud detection for fake reviews | Separate async pipeline; does not touch the spatial index |
| Real-time traffic or live opening hours | Third-party data enrichment; not part of the location index design |
| Global active-active multi-region writes | Phase 2 evolution; single-region design is sufficient for the interview |

The Core Problem: Two-Dimensional Distance Is Not B-Tree Friendly

A database B-tree index is a one-dimensional sorted structure. A latitude index efficiently finds all rows where lat BETWEEN 37.78 AND 37.80, and a longitude index finds rows where lon BETWEEN -122.42 AND -122.40 — but no single B-tree covers both simultaneously. The query planner picks one index and scans millions of rows in memory to apply the second filter. At 200 million businesses a 1-mile bounding-box query returns roughly 50,000 candidates before any distance calculation, and end-to-end latency climbs to seconds.

The solution is a spatial index — a data structure that maps two-dimensional coordinates into a one-dimensional key while preserving spatial locality. Geohashing is the most widely adopted approach at database scale because it maps cleanly to SQL B-tree indexes and enables geographic sharding without additional infrastructure.

πŸ” Geohashing: How a Location Becomes a Sortable String

Geohash is a geocoding system that recursively divides the Earth into a grid of rectangular cells and encodes each cell as a base32 string. A longer string means a smaller cell and higher coordinate precision.

How encoding works in four steps:

  1. Divide the longitude range [-180, 180] in half. Encode 1 if the target longitude is in the upper half, 0 if in the lower half.
  2. Divide the latitude range [-90, 90] the same way.
  3. Interleave the bits, starting with a longitude bit (longitude and latitude bits alternate).
  4. Convert every 5 bits to a single base32 character using the alphabet 0123456789bcdefghjkmnpqrstuvwxyz.
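The four steps can be sketched from scratch in Python. This is an illustration of the algorithm, not production code — a real service would call a geohash library:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base32 alphabet

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits (longitude first),
    then pack every 5 bits into one base32 character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # even bit positions carry longitude, odd carry latitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid   # upper half -> 1
            else:
                bits.append(0); lon_hi = mid   # lower half -> 0
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))  # downtown San Francisco → 9q8yyk
```

Each extra character narrows the cell by 5 bits, which is exactly why the precision table below halves so quickly.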

Geohash precision reference table:

| Length (chars) | Cell width | Cell height | Practical use |
|---|---|---|---|
| 1 | ±2,500 km | ±2,500 km | Continent-level grouping |
| 4 | ±39 km | ±20 km | City-level grouping |
| 6 | ±1.2 km | ±0.6 km | Neighborhood-level — ideal for a 1-mile radius search |
| 8 | ±38 m | ±19 m | Street-level precision |
| 10 | ±1.2 m | ±0.6 m | Address-level precision |

For a 1-mile radius search, precision 6 is the right choice: each cell is roughly 1.2 km × 0.6 km. Union Square, San Francisco at (37.7874°N, 122.4080°W) encodes to geohash 9q8yyw. Blue Bottle Coffee a few blocks away can land in the adjacent cell 9q8yyq — a different cell with the shared prefix 9q8yy, reflecting their physical proximity.

The critical insight: Two nearby locations share a geohash prefix. A SQL query WHERE geohash6 LIKE '9q8yy%' retrieves all businesses in that neighborhood in a single index scan — turning the two-dimensional problem back into a one-dimensional prefix lookup.

The boundary problem and its fix: A business sitting precisely on a cell boundary has a geohash that only matches one side of the boundary. Querying only the center cell will silently miss it even if it is within the search radius. The fix is always to query the center cell plus its 8 surrounding neighbors — 9 cells total — which guarantees complete coverage for any radius smaller than a single cell width.

Geohash neighborhood grid (center = user location "9q8yyw"):

┌──────────┬──────────┬──────────┐
│ 9q8yyr   │ 9q8yyx   │ 9q8yyz   │
├──────────┼──────────┼──────────┤
│ 9q8yyq   │ 9q8yyw   │ 9q8yyy   │  ← center
├──────────┼──────────┼──────────┤
│ 9q8yym   │ 9q8yyt   │ 9q8yyv   │
└──────────┴──────────┴──────────┘

Querying all 9 cells guarantees that no business just across a cell boundary is missed.

Standard geohash libraries expose a neighbors-style helper that returns the 8 adjacent cells in O(1) time (e.g., python-geohash's neighbors() in Python, or com.github.davidmoten:geo on the JVM).
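The neighbor computation itself can be sketched by decoding a cell into integer latitude/longitude indexes, shifting each index by one, and re-encoding. This is a from-scratch illustration under the same longitude-first bit convention as geohash; the modulo wraparound is a simplification that glosses over pole handling:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def _decode_ints(gh):
    """Return (lat_index, lon_index, lat_bits, lon_bits) for a geohash."""
    lat = lon = lat_bits = lon_bits = 0
    even = True  # geohash interleaving starts with a longitude bit
    for ch in gh:
        idx = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            bit = (idx >> shift) & 1
            if even:
                lon = (lon << 1) | bit; lon_bits += 1
            else:
                lat = (lat << 1) | bit; lat_bits += 1
            even = not even
    return lat, lon, lat_bits, lon_bits

def _encode_ints(lat, lon, lat_bits, lon_bits):
    """Re-interleave integer cell indexes back into a geohash string."""
    bits, li, oi = [], lat_bits, lon_bits
    for i in range(lat_bits + lon_bits):
        if i % 2 == 0:
            oi -= 1; bits.append((lon >> oi) & 1)
        else:
            li -= 1; bits.append((lat >> li) & 1)
    out = []
    for i in range(0, len(bits), 5):
        v = 0
        for b in bits[i:i + 5]:
            v = (v << 1) | b
        out.append(BASE32[v])
    return "".join(out)

def neighbors(gh):
    """The 8 cells surrounding gh (naive wraparound via modulo)."""
    lat, lon, lb, ob = _decode_ints(gh)
    cells = []
    for dlat in (1, 0, -1):
        for dlon in (-1, 0, 1):
            if dlat == dlon == 0:
                continue
            cells.append(_encode_ints((lat + dlat) % (1 << lb),
                                      (lon + dlon) % (1 << ob), lb, ob))
    return cells

print(sorted(neighbors("9q8yyw")))  # the 8 cells around the center cell
```

Library implementations use precomputed lookup tables for the same result instead of a full decode/re-encode round trip.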

βš™οΈ The Three-Step Search: Encode, Expand, Rank

Every proximity search — regardless of whether it runs against PostgreSQL, Elasticsearch, or Redis — resolves in three deterministic steps:

Step 1 — Encode. Convert the user's GPS coordinates to a geohash at the appropriate precision. Precision 6 for a 1-mile search; precision 5 (cell ≈ 5 km × 5 km) for a 5-mile search; precision 4 for a 25-mile city-wide search.

Step 2 — Expand. Compute the 8 neighboring geohash cells using the neighbors() library call. Now you have 9 cells covering the full search area with guaranteed boundary coverage.

Step 3 — Query and rank. Execute a prefix scan against the business index for all 9 cells. Pull back the candidate set (typically a few hundred businesses per cell in a dense city), apply the Haversine formula for exact distance filtering, and sort by rating to produce the final ranked list.
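The distance-filter-and-rank step can be sketched in Python. The candidate rows and the `rank_nearby` name are illustrative; in the real system the candidates come from the 9-cell index scan:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def rank_nearby(user_lat, user_lon, candidates, radius_km=1.6, top_n=10):
    """Exact-distance filter over the over-fetched candidate set, then rating sort."""
    with_dist = [
        {**c, "dist_km": round(haversine_km(user_lat, user_lon, c["lat"], c["lon"]), 2)}
        for c in candidates
    ]
    in_radius = [c for c in with_dist if c["dist_km"] <= radius_km]
    return sorted(in_radius, key=lambda c: -c["rating"])[:top_n]

candidates = [
    {"name": "cafe_a", "lat": 37.7880, "lon": -122.4090, "rating": 4.6},
    {"name": "cafe_b", "lat": 37.7700, "lon": -122.4300, "rating": 4.9},  # ~2.7 km away
]
print([c["name"] for c in rank_nearby(37.7874, -122.4080, candidates)])  # ['cafe_a']
```

Note that cafe_b is dropped despite its higher rating: the exact-distance filter runs before the rating sort, which is why the geohash grid is allowed to over-fetch.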

-- PostgreSQL: 9-cell geohash IN query (runs against a B-tree index on geohash6)
SELECT *
FROM (
    SELECT
        id, name, lat, lon, rating, review_count,
        ( 6371 * acos(
              cos(radians(:userLat)) * cos(radians(lat))
            * cos(radians(lon) - radians(:userLon))
            + sin(radians(:userLat)) * sin(radians(lat))
        )) AS dist_km
    FROM businesses
    WHERE geohash6 IN (
        '9q8yyw', '9q8yyr', '9q8yyx', '9q8yyz',
        '9q8yyq', '9q8yyy', '9q8yym', '9q8yyt', '9q8yyv'
    )
      AND category = :category
      AND is_active = TRUE
) AS candidates
WHERE dist_km <= 1.6    -- 1 mile ≈ 1.6 km
ORDER BY rating DESC
LIMIT 10;

The geohash6 column carries a standard B-tree index. An IN (9 values) clause performs 9 index seeks, each touching only the matching rows for that cell — typically a few hundred rows in a dense urban cell, a handful in rural cells. The query above computes the Haversine distance in SQL for illustration; in production the service typically fetches the candidate set and applies the distance filter in the application layer. Either way the principle is deliberate: SQL Haversine without a spatial index forces a full-table compute, so geohash must restrict the candidate set to a manageable size first.

Expanding the search radius: If the initial 9-cell search returns fewer than 10 results (rural area), expand to a wider geohash prefix (precision 5) and repeat. This "zoom-out" loop runs at most 2–3 times in practice.
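The zoom-out loop can be sketched as a small retry wrapper. Here `search_cells` is a hypothetical callable standing in for the real 9-cell index lookup, and the toy index below is fabricated test data:

```python
def search_with_fallback(search_cells, lat, lon, want=10, precisions=(6, 5, 4)):
    """Retry the 9-cell search at coarser precisions until enough results come back."""
    results = []
    for p in precisions:
        results = search_cells(lat, lon, precision=p)
        if len(results) >= want:
            return results[:want]
    return results  # best effort after the coarsest pass

# Toy stand-in: a rural area returns 1 hit at precision 6, 12 hits at precision 5.
fake_index = {6: ["biz_a"], 5: ["biz_%d" % i for i in range(12)]}
hits = search_with_fallback(
    lambda lat, lon, precision: fake_index.get(precision, []),
    44.0, -72.5, want=10)
print(len(hits))  # 10
```

Because each precision step roughly quintuples the cell edge, the loop terminates after 2–3 passes even in the sparsest regions.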

🧠 Deep Dive: Capacity, Design Goals, and the Data Structures Behind the Index

Capacity Estimation

| Dimension | Assumption | Derivation |
|---|---|---|
| DAU | 100M users | Industry benchmark for a Yelp-scale service |
| Businesses indexed | 200M | Global active businesses including restaurants, hotels, shops |
| Search QPS (steady) | ~11,500 reads/sec | 100M DAU × 10 searches/day ÷ 86,400 s |
| Search QPS (peak) | ~50,000 reads/sec | 5× steady-state burst (lunch, Friday evening) |
| Write QPS | ~100 writes/sec | New businesses + edits; 100:1 read/write ratio |
| Business record size | ~1 KB | Name, address, lat/lon, geohash6, category, rating, review count |
| Total business storage | ~200 GB | 200M × 1 KB |
| Review storage | ~50 TB | Estimated 100B cumulative reviews × ~500 B each |
| Hot search cache | ~10 GB Redis | Top 5% of cells handle 90% of queries (Zipf distribution) |
| Photos storage | ~500 TB | Served from CDN; out of scope for the index design |

Key insight: This is an overwhelmingly read-heavy workload. The search:write ratio is 100:1, which means the right strategy is to optimize the read path relentlessly — caching, read replicas, and a cache-friendly index structure — rather than optimizing write throughput.

Design Goals

| Goal | Measurable target | Architecture decision it drives |
|---|---|---|
| Search latency | p95 < 100ms end-to-end | Geohash B-tree index + Redis cache-first search path |
| Cache hit rate | ≥ 90% for major metro cells | GEORADIUS for hot cells; PostgreSQL replica for cold cells |
| Spatial correctness | Zero boundary misses | Always query 9 cells (center + 8 neighbors) |
| Write latency | p95 < 500ms for new business creation | Async spatial index update decoupled from the primary DB write |
| Availability | 99.9% on the search path | Read replicas; Redis Cluster failover; graceful degradation to DB |
| Geographic scale | Global; all 200M businesses indexed | Shard by LEFT(geohash6, 3) — 32³ possible shard keys |

The Internals: Geohash vs. Quadtree Data Structures

Geohash (B-tree on a CHAR(6) column):

Every business row stores a geohash6 CHAR(6) column. A standard B-tree index on this column makes WHERE geohash6 IN (9 values) an efficient 9-seek operation with O(log N) per seek. This approach slots into any SQL database without extensions and supports geographic sharding by taking the first 3 characters as a shard key — producing 32³ = 32,768 possible shard ranges that naturally distribute global geography without hotspots.

Quadtree (in-memory spatial partition tree):

A quadtree recursively partitions a bounding box into 4 equal quadrants. When a quadrant's business count exceeds a threshold (e.g., 100), it splits further. A proximity search traverses the tree to find all quadrants overlapping the search radius. The key advantage is density-adaptive partitioning: a Manhattan quadrant is a much smaller geographic area than a Montana quadrant because they both hold at most 100 businesses. The critical disadvantage is that quadtrees live in memory — they are hard to shard across machines and expensive to rebuild from persistent storage on restart.

Rule of thumb: Geohash for persistent business data backed by a SQL database; quadtree for ephemeral, high-frequency in-memory data (Uber driver positions update every 4 seconds and the full active-driver dataset fits in one machine's RAM).
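The split-on-overflow behavior described above can be shown with a minimal point quadtree. The capacity, coordinates, and payloads below are illustrative toy data, not a production structure:

```python
class QuadTree:
    """Minimal point quadtree: a leaf splits into 4 quadrants when it
    holds more than CAPACITY points (density-adaptive partitioning)."""
    CAPACITY = 4

    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.points = []       # (x, y, payload) tuples while a leaf
        self.children = None   # four sub-quadrants after a split

    def insert(self, x, y, payload=None):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # point outside this quadrant
        if self.children is None:
            if len(self.points) < self.CAPACITY:
                self.points.append((x, y, payload))
                return True
            self._split()
        return any(c.insert(x, y, payload) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my),
                         QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1)]
        for p in self.points:                      # push points down
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect payloads of all points inside the query rectangle."""
        out = [] if out is None else out
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return out  # query box does not overlap this quadrant
        for (x, y, payload) in self.points:
            if qx0 <= x <= qx1 and qy0 <= y <= qy1:
                out.append(payload)
        for c in (self.children or []):
            c.query(qx0, qy0, qx1, qy1, out)
        return out

tree = QuadTree(-180, -90, 180, 90)  # x = lon, y = lat
for p in [(-122.41, 37.78, "d1"), (-122.40, 37.79, "d2"), (-122.42, 37.77, "d3"),
          (-122.39, 37.80, "d4"), (-122.405, 37.785, "d5"), (0.0, 0.0, "far")]:
    tree.insert(*p)
print(sorted(tree.query(-123, 37, -122, 38)))  # the five SF "drivers"
```

Note how the fifth San Francisco insert forces a split while the lone point near (0, 0) leaves its quadrant untouched — exactly the density adaptivity the text describes.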

| Index type | Storage | Update cost | Shard-friendly | Best for |
|---|---|---|---|---|
| Geohash + B-tree | SQL column | O(1) UPDATE | ✅ Yes — prefix sharding | Business directories (static) |
| Quadtree | In-memory tree | O(log N) re-insert | ❌ Hard | Driver / courier positions (dynamic) |
| PostGIS R-tree | SQL extension | O(log N) | ⚪ With effort | Complex polygon intersections |
| Redis GEORADIUS | In-memory sorted set | O(log N) GEOADD | ⚪ Per-node limit | Hot-path cache (<50M points/node) |

Performance Analysis: Where the Search Gets Slow

| Pressure point | Symptom | Root cause | Mitigation |
|---|---|---|---|
| Dense cell overload (the Manhattan problem) | 9-cell query returns 100K rows | Precision-6 cell is too coarse for dense urban areas | Increase precision to 7 (cell ≈ 150m × 150m) in high-density zones |
| Boundary miss | Known businesses absent from results near cell edges | Searching only the center cell | Always query 9 cells — non-negotiable correctness requirement |
| Cache miss storm | Latency spikes during concerts or sporting events | Thousands of cold searches hit the same DB cells simultaneously | Pre-warm Redis for upcoming events; increase TTL for major metro cells |
| Write amplification during bulk import | Index update latency spikes | Every INSERT updates DB + spatial index synchronously | Batch spatial index updates via async queue; synchronous only for address changes |
| Stale cached results after business closes | Closed businesses appear in search | Cache TTL (10 min) has not expired after business deactivation | Publish a cache invalidation event on every status change; invalidate search:results:<geohash6>:* |

The largest performance win at this scale is not the choice of spatial algorithm — it is the Redis cache layer in front of the database. A single metropolitan geohash cell queried 10,000 times per second can be served from a single Redis key rather than triggering 10,000 separate PostgreSQL index scans.

📊 High-Level Architecture: From Mobile Client to Ranked Results

flowchart TD
    A[Mobile / Web Client] -->|"GET /search?lat&lon&radius&category"| B[API Gateway]
    B --> C[Location Search Service]
    C -->|"GEORADIUS geo:businesses 1mi ASC"| D[Redis Geo Cache]
    D -->|"cache hit ~90%"| E[Ranked Results Response]
    D -->|cache miss| F[Business DB Read Replica]
    F -->|"geohash6 IN 9 cells + category filter"| G[Businesses Table + geohash6 B-tree index]
    G --> C
    C -->|"haversine filter + rating sort + SET cache EX 600"| E
    E -->|JSON response| A

    H[Business Owner] -->|"POST /businesses"| I[Business Service]
    I -->|"validate + geocode to lat/lon"| J[Business DB Primary]
    I -->|"GEOADD geo:businesses lon lat biz_id"| D
    J -->|async domain event| K[Index Updater Service]
    K -->|"UPDATE geohash6 + invalidate cache cell"| J

    L[Reviewer] -->|"POST /reviews/:id"| M[Review Service]
    M -->|"UPDATE businesses SET rating=avg, review_count=count+1"| J
    M -->|"SCAN + DEL search:results keys for affected cells"| D

This diagram shows the read path (top) and write path (bottom) of the proximity service. On the read path, the Location Search Service checks Redis first with GEORADIUS: roughly 90% of searches for popular metropolitan cells return from Redis in under 2ms without touching PostgreSQL. On a cache miss, the service falls back to the Business DB read replica, executes the 9-cell geohash IN query, applies the Haversine filter in the application layer, sorts by rating, caches the result in Redis for 10 minutes, and returns the ranked list. On the write path, new businesses and address updates flow to the PostgreSQL primary through the Business Service, which also issues a GEOADD to keep Redis warm. Review submissions update the aggregate rating on the business record and invalidate the affected cache cells — so rating changes appear in search results within one cache TTL window.

Data Model

-- Businesses table (sharded by LEFT(geohash6, 3) across read replicas)
CREATE TABLE businesses (
    id           UUID             PRIMARY KEY DEFAULT gen_random_uuid(),
    name         VARCHAR(255)     NOT NULL,
    address      TEXT             NOT NULL,
    lat          DOUBLE PRECISION NOT NULL,
    lon          DOUBLE PRECISION NOT NULL,
    geohash6     CHAR(6)          NOT NULL,   -- precision 6 ≈ 1.2km × 0.6km cell
    category     VARCHAR(64)      NOT NULL,
    rating       NUMERIC(3,2)     NOT NULL DEFAULT 0.00,
    review_count INT              NOT NULL DEFAULT 0,
    is_active    BOOLEAN          NOT NULL DEFAULT TRUE,
    created_at   TIMESTAMPTZ      NOT NULL DEFAULT NOW(),
    updated_at   TIMESTAMPTZ      NOT NULL DEFAULT NOW()
);

-- Primary spatial index β€” supports the 9-cell IN query
CREATE INDEX idx_businesses_geohash6          ON businesses (geohash6);
-- Composite index β€” avoids a filter step when category is always provided
CREATE INDEX idx_businesses_geohash6_category ON businesses (geohash6, category);

-- Reviews table
CREATE TABLE reviews (
    id          UUID      PRIMARY KEY DEFAULT gen_random_uuid(),
    business_id UUID      NOT NULL REFERENCES businesses(id),
    user_id     UUID      NOT NULL,
    rating      SMALLINT  NOT NULL CHECK (rating BETWEEN 1 AND 5),
    body        TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_reviews_business ON reviews (business_id, created_at DESC);

Redis key layout:

| Key pattern | Data structure | Value | TTL |
|---|---|---|---|
| geo:businesses | Sorted Set (GEOADD) | Member = business_id; score = geo-encoded lon/lat | No TTL — persistent warm index |
| search:results:<geohash6>:<category> | String (JSON) | Ranked array of business summaries | 10 minutes |
| business:detail:<id> | Hash | Full business profile fields | 30 minutes |

Write Path: Adding a New Business

Business Owner → POST /businesses
  → Business Service
      → Validate required fields; geocode address → (lat, lon)
      → Compute geohash6 = geohash.encode(lat, lon, precision=6)
      → INSERT INTO businesses (id, name, lat, lon, geohash6, category, ...)  [DB Primary]
      → GEOADD geo:businesses lon lat biz_id                                   [Redis]
      → SCAN + DEL search:results:<geohash6>:*                                 [Redis invalidation]
  ← 201 Created { businessId, name }

Read Path: Searching Nearby Restaurants

User → GET /search?lat=37.787&lon=-122.408&radius=1mi&category=coffee
  → Location Search Service
      → userGeohash = geohash.encode(37.787, -122.408, 6)  →  "9q8yyw"
      → Check: GET search:results:9q8yyw:coffee
            Cache HIT  → deserialize + return top-10 in ~2ms total
            Cache MISS →
                → cells = [userGeohash] + neighbors(userGeohash)  (9 cells)
                → SELECT businesses WHERE geohash6 IN (cells) AND category='coffee'
                → Filter by haversine distance ≤ 1.6 km in application layer
                → Sort by rating DESC → take top 10
                → SET search:results:9q8yyw:coffee <json> EX 600
  ← 200 OK [{ id, name, dist_km, rating, review_count }, ...]
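The cache-aside logic of this read path can be sketched with an in-process dict standing in for Redis. `CellCache` and `expensive_query` are illustrative names; a real deployment would use `SET ... EX 600` and SCAN-based invalidation:

```python
import time

class CellCache:
    """Cache-aside store keyed by cell + category, with TTL expiry."""
    def __init__(self, ttl_seconds=600.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1], "hit"
        value = compute()                       # the 9-cell SQL query on a miss
        self.store[key] = (now + self.ttl, value)
        return value, "miss"

    def invalidate_cell(self, geohash6):
        """Drop every cached result for one cell, across all categories."""
        prefix = f"search:results:{geohash6}:"
        for key in [k for k in self.store if k.startswith(prefix)]:
            del self.store[key]

cache = CellCache()
calls = []
def expensive_query():
    calls.append(1)                 # count how often the "database" is hit
    return ["biz_001", "biz_002"]

r1, s1 = cache.get_or_compute("search:results:9q8yyw:coffee", expensive_query)
r2, s2 = cache.get_or_compute("search:results:9q8yyw:coffee", expensive_query)
print(s1, s2, len(calls))  # miss hit 1
```

The second lookup never reaches the query function — the exact effect the 10-minute Redis TTL has on the hot Manhattan cells.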

🌍 Real-World Application: How Yelp, Uber, and Google Maps Solve Proximity

Real-world systems make different spatial index choices based on their specific update patterns and query shapes:

| Company | Index approach | Why this choice |
|---|---|---|
| Yelp | Geohash column + Elasticsearch geo_point | Business data is mostly static; a B-tree on geohash6 handles bulk search; Elasticsearch layers full-text ranking on top |
| Uber (driver matching) | Custom in-memory quadtree | Driver GPS updates every 4 seconds — writing every update to a DB is too expensive; the full active-driver dataset fits in one machine's RAM |
| Google Maps / Places | S2 Geometry (Hilbert curve cells) | S2 cells are near-uniform spherical polygons with no distortion at the poles; Google needs global-scale analytics and precise spherical distance calculations |
| DoorDash / Instacart | PostGIS with R-tree index | Delivery zones are irregular polygons (not circles); PostGIS handles native polygon intersection without manual cell math |
| Airbnb | Elasticsearch geo_distance filter | Accommodation search combines text ranking with geo filtering; Elasticsearch natively combines both in one query |

The common thread: static data at scale → geohash in SQL. Dynamic, high-frequency data → in-memory spatial structure. Complex polygon geometry → PostGIS or S2. Choose the tool that matches the actual query shape and update rate of your data, not the tool with the most impressive name.

βš–οΈ Trade-offs & Failure Modes: Boundary Crossings, Hot Cells, and Stale Indexes

Failure Mode 1: Cell Boundary Miss (The Most Common Bug)

A business located 10 meters across a geohash cell boundary will not appear in a single-cell WHERE geohash6 = 'X' query, even if it is well within the user's search radius. In unit tests this is invisible β€” test data rarely falls exactly on a boundary. In production, real users consistently miss businesses near their location.

Fix: Query 9 cells unconditionally. There is no configuration toggle for this — single-cell proximity queries are incorrect by definition.

Failure Mode 2: The Manhattan Hot-Cell Problem

At precision 6, a single geohash cell in Manhattan or central Tokyo contains thousands of active businesses. A 9-cell query returning 50,000 candidates before the Haversine filter is a latency bomb: the candidate fetch alone can take 200ms, violating the 100ms SLO.

Two mitigations applied together:

  1. Increase geohash precision dynamically for dense areas. Use precision 7 (cell ≈ 150m × 150m) for cells above a business-count threshold (e.g., >500 businesses per cell). The 9-cell query still runs, but each precision-7 cell covers roughly 1/32 of a precision-6 cell's area.
  2. Cache cell results aggressively. In a dense city, the same cell+category combination is queried thousands of times per minute. A 10-minute Redis TTL means the expensive DB scan runs once every 600 seconds, not 600,000 times.
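The first mitigation reduces to a tiny policy function. The 500-business threshold echoes the illustrative cutoff above; in practice the per-cell counts would come from index statistics:

```python
def pick_precision(cell_business_count: int, default: int = 6,
                   dense_threshold: int = 500) -> int:
    """Use precision 7 (smaller cells) when the precision-6 cell is overloaded."""
    return 7 if cell_business_count > dense_threshold else default

print(pick_precision(2400))  # Manhattan-style cell -> 7
print(pick_precision(80))    # typical suburban cell -> 6
```

The search path then encodes the user's location at the returned precision before expanding to the 9 neighbor cells.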

Failure Mode 3: Stale Spatial Index After Business Closes

A business that permanently closes retains its old geohash in the index until invalidated. Users see a closed business as an active search result.

Fix: The Business Service sends a cache invalidation event on every is_active transition, which deletes search:results:<geohash6>:* from Redis. The next query fetches from DB, which now has is_active = FALSE on the row, and the business is excluded from results. Cache freshness within 10 minutes (one TTL window) is the accepted trade-off.

Consistency and Trade-off Summary

| Dimension | Design choice | Rationale |
|---|---|---|
| Search result freshness | Eventual (10-minute cache TTL) | New businesses appear within one TTL window; near-real-time not required |
| Rating consistency | Strong on write (synchronous UPDATE businesses SET rating=…) | Rating is user-visible; stale ratings directly hurt trust |
| Read vs. write scaling | Scale read replicas independently of the primary | 100:1 read/write ratio; replica fan-out absorbs search load with no write path changes |
| Redis vs. PostgreSQL | Redis for hot path; PostgreSQL as source of truth | Redis is ephemeral; PostgreSQL is durable — never treat Redis as the only copy |

🧭 Decision Guide: Geohash, Quadtree, or PostGIS?

| Situation | Recommendation |
|---|---|
| Static business directory with >10M records | Geohash column + B-tree index; shard DB by geohash prefix |
| Frequently moving objects (drivers, couriers, scooters) | In-memory quadtree or S2; avoid writing every GPS ping to a relational DB |
| Delivery zones, service areas, or polygon-shaped regions | PostGIS with R-tree; native polygon intersection without manual cell math |
| Redis as primary hot-path index | GEOADD/GEORADIUS — ideal for ≤50M points per Redis node |
| Starting fresh with <1M businesses | PostGIS — simpler operational story, rich feature set, no custom encoding logic |
| Dense urban cells causing latency overload | Increase geohash precision from 6 → 7 for affected cells; combine with Redis cell caching |
| Need to combine proximity with full-text search | Elasticsearch geo_point + full-text query in a single request |
| Avoid | Raw lat/lon range scans without a spatial index at >1M rows |

🧪 Practical Example: Searching "Coffee Near Union Square, SF"

A user at Union Square, San Francisco (37.7874°N, 122.4080°W) opens Yelp and taps "Coffee, 1 mile." Here is the full resolution trace:

Step 1 — Encode user location:

import pygeohash as geohash

user_geohash = geohash.encode(37.7874, -122.4080, precision=6)
# user_geohash = "9q8yyw"

Step 2 — Expand to 9 cells:

neighbors = geohash.neighbors("9q8yyw")   # helper name/return shape varies by library
all_cells = ["9q8yyw"] + list(neighbors)
# all_cells = ["9q8yyw", "9q8yyr", "9q8yyx", "9q8yyz",
#              "9q8yyq", "9q8yyy", "9q8yym", "9q8yyt", "9q8yyv"]
# These 9 cells cover roughly 6.5 km² around Union Square

Step 3a — Redis fast path (cache hit, ~90% of requests):

GEORADIUS geo:businesses -122.4080 37.7874 1 mi
          ASC COUNT 50 WITHCOORD WITHDIST
-- Returns 48 business IDs sorted by distance
-- Application re-ranks top 50 by rating → returns top 10
-- Total latency: ~1.8ms

Step 3b — Database fallback (cache miss, ~10% of requests):

SELECT * FROM (
    SELECT id, name, lat, lon, rating,
           (6371 * acos(cos(radians(37.7874)) * cos(radians(lat))
                 * cos(radians(lon) - radians(-122.4080))
                 + sin(radians(37.7874)) * sin(radians(lat)))) AS dist_km
    FROM   businesses
    WHERE  geohash6 IN ('9q8yyw','9q8yyr','9q8yyx','9q8yyz',
                        '9q8yyq','9q8yyy','9q8yym','9q8yyt','9q8yyv')
      AND  category = 'coffee'
      AND  is_active = TRUE
) AS candidates
WHERE  dist_km <= 1.6
ORDER  BY rating DESC
LIMIT  10;
-- 9 B-tree index scans → ~280 candidate rows → ~80 pass the distance filter
-- Total latency: ~25ms

Step 4 — Cache the result for subsequent requests:

SET search:results:9q8yyw:coffee
    '[{"id":"biz_001","name":"Blue Bottle Coffee","dist_km":0.12,"rating":4.6}, ...]'
    EX 600
-- Next 10 minutes: all identical queries return from Redis in ~2ms

Latency summary:

  • Cache hit path: ~2ms Redis + ~5ms serialization/network = ~7ms total ✅
  • Cache miss path: ~25ms SQL + ~5ms network + ~2ms Redis write = ~32ms total ✅

Both paths comfortably meet the 100ms design goal.

πŸ› οΈ Redis Geo Commands: How They Power Proximity Search at Yelp Scale

Redis ships a native geospatial data structure built on geohash encoding, stored internally as a sorted set with geo-encoded scores. For the hot path of a proximity service, Redis handles spatial queries without any additional infrastructure. (Redis 6.2 introduced GEOSEARCH as the successor to GEORADIUS; the classic GEORADIUS form used below still works the same way.)

Indexing a business (on create or address update):

GEOADD geo:businesses -122.4080 37.7874 "biz_001"
-- Format: GEOADD <key> <longitude> <latitude> <member>
-- Redis encodes (lon, lat) as a 52-bit geohash and stores it as the sorted set score
-- Multiple businesses can be added in one call:
GEOADD geo:businesses
    -122.4080 37.7874 "biz_001"
    -122.4123 37.7901 "biz_002"
    -122.4055 37.7856 "biz_003"

Searching within 1 mile of a user:

GEORADIUS geo:businesses -122.4080 37.7874 1 mi
          ASC COUNT 50 WITHCOORD WITHDIST
-- Returns list of [member, distance, [lon, lat]] tuples:
-- [
--   ["biz_001", "0.12", [-122.40800, 37.78739]],
--   ["biz_002", "0.31", [-122.41230, 37.79010]],
--   ...
-- ]
-- ASC = sorted nearest-first; COUNT 50 = candidate cap; application re-ranks by rating

Looking up the stored position of a specific business:

GEOPOS geo:businesses "biz_001"
-- Returns: [[-122.40800142288208, 37.78739983518099]]
-- Slight precision loss (well under a meter) from the 52-bit geohash encoding is acceptable

Expanding the search radius when results are sparse (rural areas):

-- First try: 1 mile
GEORADIUS geo:businesses :lon :lat 1 mi ASC COUNT 10
-- If fewer than 10 results, expand to 5 miles:
GEORADIUS geo:businesses :lon :lat 5 mi ASC COUNT 10

When to use Redis Geo vs. the geohash column in PostgreSQL:

| Dimension | Redis GEORADIUS | PostgreSQL geohash column |
|---|---|---|
| Latency | ~1–2ms (fully in-memory) | ~10–30ms (disk-backed) |
| Dataset size limit | ~50M points per Redis node | Unlimited (sharded across nodes) |
| Durability | AOF/RDB as secondary backup | Primary source of truth |
| Complex filters | Limited (distance + count only) | Full SQL (JOIN, category, is_active, rating) |
| Role in this system | Hot-path cache layer | Source of truth + cold-miss search path |

The production pattern: PostgreSQL is the source of truth for all 200M businesses; Redis geo:businesses holds the warm geo index for the hottest 5–10M businesses in major metros. A Redis miss falls back to PostgreSQL. GEOADD is called synchronously on every business create and address change to keep the Redis index current.

For a full deep-dive on Redis data structures, persistence modes, and cluster configuration, see System Design: Caching and Asynchronism.

📚 Lessons Learned from Building Location-Aware Systems

  • Always query 9 geohash cells, never just the center. A single-cell query produces systematic missing-data bugs that are invisible in unit tests (test data rarely lands exactly on a boundary) but reproduced continuously by real users near cell edges. Make the 9-cell expansion unconditional.

  • Cache at the cell level, not the business level. Business-level caches require one invalidation per business per change. Cell-level caches require one key deletion per affected cell — dramatically simpler invalidation logic with a much higher hit rate.

  • Choose geohash precision based on search radius, not intuition. Precision 6 for 1-mile, precision 5 for 5-mile, precision 4 for 25-mile. Too coarse causes hot-cell overload; too fine requires querying too many cells to cover the radius.

  • Geohash is approximate; Haversine is exact. The 9-cell query intentionally over-fetches candidates. Always apply the Haversine formula in the application layer to filter to the true radius before ranking. Skipping this step returns businesses at the corners of the bounding grid that are outside the actual radius.

  • The hard part is not the spatial algorithm — it is the invalidation strategy. Encoding a geohash is trivial (one library call). Knowing when to expire a cached cell after a business closes, moves, or changes category requires deliberate event-driven architecture.

  • Separate the hot write path from the cold index rebuild. Address changes and new business creation must update the spatial index synchronously (otherwise users immediately see missing results). Bulk re-indexing jobs, metadata enrichment, and rating aggregation can be async β€” decoupling them prevents import jobs from degrading search latency.

📌 TLDR — Summary & Key Takeaways: Proximity Service in Five Decisions

  • Geohash encodes (lat, lon) into a sortable string that fits natively in a SQL B-tree index. A prefix query across 9 cells is the entire search operation — no spatial extension required.
  • Always query 9 cells (center + 8 neighbors) — single-cell queries produce correctness bugs at cell boundaries. This is non-negotiable.
  • Redis GEORADIUS serves the hot path with ~2ms latency for major metro cells; PostgreSQL is the cold fallback and the sole source of truth. Never treat Redis as the only copy of the location data.
  • Shard by geohash prefix (LEFT(geohash6, 3)) to distribute geographic write and read load evenly — 32K possible shard keys map naturally to global geography without manual region boundaries.
  • The Manhattan hot-cell problem (one cell, thousands of businesses) is solved by increasing geohash precision from 6 to 7 in density-detected zones and caching cell-level search results aggressively in Redis.

πŸ“ Practice Quiz

  1. Why does a standard B-tree index on separate lat and lon columns fail to support efficient proximity queries at 200M businesses?

    • A) B-trees do not support floating-point columns
    • B) A B-tree is a one-dimensional structure and cannot simultaneously index two spatial dimensions; the planner must apply one filter in memory
    • C) Databases do not allow composite indexes across geographic coordinate columns
    • D) Lat/lon columns require special geospatial encoding before they can be indexed

    Correct Answer: B

  2. You search for "coffee within 1 mile" using only the user's center geohash cell (precision 6). A highly-rated coffee shop 40 meters from the user is not returned. What is the most likely explanation?

    • A) The geohash precision is too fine, causing the coffee shop's cell to be excluded
    • B) The coffee shop's rating is below the minimum threshold
    • C) The coffee shop sits in an adjacent cell, and the search did not include the 8 neighboring cells
    • D) Redis GEORADIUS does not support sub-mile radius searches

    Correct Answer: C

  3. Yelp receives 40,000 search queries per second for "restaurants in downtown Manhattan." Each query hits PostgreSQL in 30ms. Which single change delivers the greatest latency reduction?

    • A) Add a second B-tree index on the rating column alongside the geohash index
    • B) Cache search results keyed by geohash6 + category in Redis with a 10-minute TTL
    • C) Increase the PostgreSQL connection pool size to 2,000 connections
    • D) Migrate all business data from PostgreSQL to a document database

    Correct Answer: B

  4. A food-delivery startup must index both restaurant locations (mostly static, 5M records) and active delivery couriers (GPS position updated every 3 seconds, 50K active at peak). Which combination of spatial indexes is most appropriate?

    • A) Geohash B-tree column for both β€” consistent tooling simplifies operations
    • B) PostGIS R-tree for restaurants; in-memory quadtree for couriers
    • C) Geohash B-tree column for restaurants; in-memory quadtree or Redis GEOADD for couriers
    • D) Redis GEOADD for both β€” low latency is the priority for all location lookups

    Correct Answer: C

  5. Open-ended challenge: A sold-out stadium concert begins and 60,000 attendees simultaneously search for "food near [stadium address]" — all landing on the same geohash cell. Describe two distinct architectural mechanisms that prevent this event-driven spike from overwhelming the database, identify which one you would deploy first, and explain the reasoning behind that prioritization.
