
System Design HLD Example: Proximity Service (Yelp)

A practical interview-ready HLD for finding nearby restaurants, hotels, and points of interest.

Abstract Algorithms · 26 min read

TLDR: A proximity service (Yelp/Google Places) encodes every business location into a geohash string, indexes it in a B-tree column, and queries the center cell plus its 8 neighbors to guarantee no boundary misses. Redis GEORADIUS caches hot-cell results for <2ms reads. Geographic sharding by geohash prefix keeps write load balanced globally. The hard problems are not the algorithm — they are the cell boundary edge case, the Manhattan hot-cell overload, and cache invalidation on address changes.

You open Yelp and search "coffee near me." Within 100ms, Yelp returns the 10 highest-rated coffee shops within 1 mile of your GPS coordinates, ranked by rating. Behind that response is a spatial index serving 100 million queries per day across 200 million indexed businesses. The naïve approach — a SQL range scan on latitude and longitude columns — fails at this scale: at 200 million businesses a bounding-box query returns tens of thousands of candidates before a single distance calculation, and latency climbs to seconds.

Understanding how to design this system teaches you geospatial indexing, location-aware caching, and geographic sharding — three patterns that appear verbatim in Uber driver matching, Google Maps nearby search, DoorDash delivery zone lookup, and every food delivery app you have used. By the end of this walkthrough you will know why geohash beats raw lat/lon queries, when to use a quadtree instead, and how Redis GEORADIUS eliminates database load for the hottest search cells.

📖 Why a SQL Range Query Can't Find Your Coffee Shop

Actors and Use Cases

| Actor | Role |
|---|---|
| End user | Searches "restaurants near me," views business details, submits reviews |
| Business owner | Creates and updates a business profile (name, address, hours, photos) |
| Platform service | Indexes location data; enforces rate limits; aggregates ratings |

Core user journeys in scope:

  • Nearby search — User sends current lat/lon + radius + category filter; system returns the top-N ranked businesses sorted by rating
  • Business detail — User selects a result; system returns the full profile including rating, hours, and photos served via CDN
  • Add/update business — Owner submits a business; system validates, geocodes the address, persists the record, and updates the spatial index
  • Write review — User submits a 1–5 star rating and text; system updates the running average rating on the business record
  • View photos — User browses business photos delivered from a CDN edge node
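The "write review" journey keeps a running average on the business record rather than recomputing over all reviews. A minimal sketch of that update (the function name and two-decimal rounding are illustrative, not from the source):

```python
def apply_review(rating: float, review_count: int, new_star: int):
    """Fold one 1-5 star review into the stored aggregate rating."""
    total = rating * review_count + new_star   # undo the average, add the new star
    review_count += 1
    return round(total / review_count, 2), review_count

avg, n = apply_review(4.00, 2, 5)  # two reviews averaging 4.0, then a 5-star review
print(avg, n)  # 4.33 3
```

This is the arithmetic behind the `UPDATE businesses SET rating = avg, review_count = count + 1` write shown later in the architecture diagram.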

Out of Scope (v1 boundary)

| Excluded feature | Why it is out of scope |
|---|---|
| Personalized ML ranking (collaborative filtering) | Depends on proximity as a prerequisite; a separate ranking service layer |
| Fraud detection for fake reviews | Separate async pipeline; does not touch the spatial index |
| Real-time traffic or live opening hours | Third-party data enrichment; not part of the location index design |
| Global active-active multi-region writes | Phase 2 evolution; single-region design is sufficient for the interview |

The Core Problem: Two-Dimensional Distance Is Not B-Tree Friendly

A database B-tree index is a one-dimensional sorted structure. A latitude index efficiently finds all rows where lat BETWEEN 37.78 AND 37.80, and a longitude index finds rows where lon BETWEEN -122.42 AND -122.40 — but no single B-tree covers both simultaneously. The query planner picks one index and scans millions of rows in memory to apply the second filter. At 200 million businesses a 1-mile bounding-box query returns roughly 50,000 candidates before any distance calculation, and end-to-end latency climbs to seconds.

The solution is a spatial index — a data structure that maps two-dimensional coordinates into a one-dimensional key while preserving spatial locality. Geohashing is the most widely adopted approach at database scale because it maps cleanly to SQL B-tree indexes and enables geographic sharding without additional infrastructure.

πŸ” Geohashing: How a Location Becomes a Sortable String

Geohash is a geocoding system that recursively divides the Earth into a grid of rectangular cells and encodes each cell as a base32 string. A longer string means a smaller cell and higher coordinate precision.

How encoding works in four steps:

  1. Divide the longitude range [-180, 180] in half. Encode 1 if the target longitude is in the upper half, 0 if in the lower half.
  2. Divide the latitude range [-90, 90] the same way.
  3. Interleave the bits, starting with a longitude bit (longitude and latitude bits alternate).
  4. Convert every 5 bits to a single base32 character using the alphabet 0123456789bcdefghjkmnpqrstuvwxyz.
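The four steps can be sketched from scratch in Python. This is an illustration of the algorithm, not production code — a real service would call a geohash library:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base32 alphabet

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits (longitude first),
    then pack every 5 bits into one base32 character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # even bit positions carry longitude, odd carry latitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid   # upper half -> 1
            else:
                bits.append(0); lon_hi = mid   # lower half -> 0
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))  # downtown San Francisco → 9q8yyk
```

Each extra character narrows the cell by 5 bits, which is exactly why the precision table below halves so quickly.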

Geohash precision reference table:

| Length (chars) | Cell width | Cell height | Practical use |
|---|---|---|---|
| 1 | ±2,500 km | ±2,500 km | Continent-level grouping |
| 4 | ±39 km | ±20 km | City-level grouping |
| 6 | ±1.2 km | ±0.6 km | Neighborhood-level — ideal for a 1-mile radius search |
| 8 | ±38 m | ±19 m | Street-level precision |
| 10 | ±1.2 m | ±0.6 m | Address-level precision |

For a 1-mile radius search, precision 6 is the right choice: each cell is roughly 1.2 km × 0.6 km. Union Square, San Francisco at (37.7874°N, 122.4080°W) encodes to geohash 9q8yyw. Blue Bottle Coffee a few blocks away can land in the adjacent cell 9q8yyq — a different cell with the shared prefix 9q8yy, reflecting their physical proximity.

The critical insight: Two nearby locations share a geohash prefix. A SQL query WHERE geohash6 LIKE '9q8yy%' retrieves all businesses in that neighborhood in a single index scan — turning the two-dimensional problem back into a one-dimensional prefix lookup.

The boundary problem and its fix: A business sitting precisely on a cell boundary has a geohash that only matches one side of the boundary. Querying only the center cell will silently miss it even if it is within the search radius. The fix is always to query the center cell plus its 8 surrounding neighbors — 9 cells total — which guarantees complete coverage for any radius smaller than a single cell width.

Geohash neighborhood grid (center = user location "9q8yyw"):

┌──────────┬──────────┬──────────┐
│ 9q8yyr   │ 9q8yyx   │ 9q8yyz   │
├──────────┼──────────┼──────────┤
│ 9q8yyq   │ 9q8yyw   │ 9q8yyy   │  ← center
├──────────┼──────────┼──────────┤
│ 9q8yym   │ 9q8yyt   │ 9q8yyv   │
└──────────┴──────────┴──────────┘

Querying all 9 cells guarantees that no business just across a cell boundary is missed.

Standard geohash libraries expose a neighbors-style helper that returns the 8 adjacent cells in O(1) time (e.g., python-geohash's neighbors() in Python, or com.github.davidmoten:geo on the JVM).
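The neighbor computation itself can be sketched by decoding a cell into integer latitude/longitude indexes, shifting each index by one, and re-encoding. This is a from-scratch illustration under the same longitude-first bit convention as geohash; the modulo wraparound is a simplification that glosses over pole handling:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def _decode_ints(gh):
    """Return (lat_index, lon_index, lat_bits, lon_bits) for a geohash."""
    lat = lon = lat_bits = lon_bits = 0
    even = True  # geohash interleaving starts with a longitude bit
    for ch in gh:
        idx = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            bit = (idx >> shift) & 1
            if even:
                lon = (lon << 1) | bit; lon_bits += 1
            else:
                lat = (lat << 1) | bit; lat_bits += 1
            even = not even
    return lat, lon, lat_bits, lon_bits

def _encode_ints(lat, lon, lat_bits, lon_bits):
    """Re-interleave integer cell indexes back into a geohash string."""
    bits, li, oi = [], lat_bits, lon_bits
    for i in range(lat_bits + lon_bits):
        if i % 2 == 0:
            oi -= 1; bits.append((lon >> oi) & 1)
        else:
            li -= 1; bits.append((lat >> li) & 1)
    out = []
    for i in range(0, len(bits), 5):
        v = 0
        for b in bits[i:i + 5]:
            v = (v << 1) | b
        out.append(BASE32[v])
    return "".join(out)

def neighbors(gh):
    """The 8 cells surrounding gh (naive wraparound via modulo)."""
    lat, lon, lb, ob = _decode_ints(gh)
    cells = []
    for dlat in (1, 0, -1):
        for dlon in (-1, 0, 1):
            if dlat == dlon == 0:
                continue
            cells.append(_encode_ints((lat + dlat) % (1 << lb),
                                      (lon + dlon) % (1 << ob), lb, ob))
    return cells

print(sorted(neighbors("9q8yyw")))  # the 8 cells around the center cell
```

Library implementations use precomputed lookup tables for the same result instead of a full decode/re-encode round trip.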

βš™οΈ The Three-Step Search: Encode, Expand, Rank

Every proximity search — regardless of whether it runs against PostgreSQL, Elasticsearch, or Redis — resolves in three deterministic steps:

Step 1 — Encode. Convert the user's GPS coordinates to a geohash at the appropriate precision. Precision 6 for a 1-mile search; precision 5 (cell ≈ 5 km × 5 km) for a 5-mile search; precision 4 for a 25-mile city-wide search.

Step 2 — Expand. Compute the 8 neighboring geohash cells using the neighbors() library call. Now you have 9 cells covering the full search area with guaranteed boundary coverage.

Step 3 — Query and rank. Execute a prefix scan against the business index for all 9 cells. Pull back the candidate set (typically a few hundred businesses per cell in a dense city), apply the Haversine formula for exact distance filtering, and sort by rating to produce the final ranked list.
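The distance-filter-and-rank step can be sketched in Python. The candidate rows and the `rank_nearby` name are illustrative; in the real system the candidates come from the 9-cell index scan:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def rank_nearby(user_lat, user_lon, candidates, radius_km=1.6, top_n=10):
    """Exact-distance filter over the over-fetched candidate set, then rating sort."""
    with_dist = [
        {**c, "dist_km": round(haversine_km(user_lat, user_lon, c["lat"], c["lon"]), 2)}
        for c in candidates
    ]
    in_radius = [c for c in with_dist if c["dist_km"] <= radius_km]
    return sorted(in_radius, key=lambda c: -c["rating"])[:top_n]

candidates = [
    {"name": "cafe_a", "lat": 37.7880, "lon": -122.4090, "rating": 4.6},
    {"name": "cafe_b", "lat": 37.7700, "lon": -122.4300, "rating": 4.9},  # ~2.7 km away
]
print([c["name"] for c in rank_nearby(37.7874, -122.4080, candidates)])  # ['cafe_a']
```

Note that cafe_b is dropped despite its higher rating: the exact-distance filter runs before the rating sort, which is why the geohash grid is allowed to over-fetch.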

-- PostgreSQL: 9-cell geohash IN query (runs against a B-tree index on geohash6)
SELECT *
FROM (
    SELECT
        id, name, lat, lon, rating, review_count,
        ( 6371 * acos(
              cos(radians(:userLat)) * cos(radians(lat))
            * cos(radians(lon) - radians(:userLon))
            + sin(radians(:userLat)) * sin(radians(lat))
        )) AS dist_km
    FROM businesses
    WHERE geohash6 IN (
        '9q8yyw', '9q8yyr', '9q8yyx', '9q8yyz',
        '9q8yyq', '9q8yyy', '9q8yym', '9q8yyt', '9q8yyv'
    )
      AND category = :category
      AND is_active = TRUE
) AS candidates
WHERE dist_km <= 1.6    -- 1 mile ≈ 1.6 km
ORDER BY rating DESC
LIMIT 10;

The geohash6 column carries a standard B-tree index. An IN (9 values) clause performs 9 index seeks, each touching only the matching rows for that cell — typically a few hundred rows in a dense urban cell, a handful in rural cells. The query above computes the Haversine distance in SQL for illustration; in production the service typically fetches the candidate set and applies the distance filter in the application layer. Either way the principle is deliberate: SQL Haversine without a spatial index forces a full-table compute, so geohash must restrict the candidate set to a manageable size first.

Expanding the search radius: If the initial 9-cell search returns fewer than 10 results (rural area), expand to a wider geohash prefix (precision 5) and repeat. This "zoom-out" loop runs at most 2–3 times in practice.
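The zoom-out loop can be sketched as a small retry wrapper. Here `search_cells` is a hypothetical callable standing in for the real 9-cell index lookup, and the toy index below is fabricated test data:

```python
def search_with_fallback(search_cells, lat, lon, want=10, precisions=(6, 5, 4)):
    """Retry the 9-cell search at coarser precisions until enough results come back."""
    results = []
    for p in precisions:
        results = search_cells(lat, lon, precision=p)
        if len(results) >= want:
            return results[:want]
    return results  # best effort after the coarsest pass

# Toy stand-in: a rural area returns 1 hit at precision 6, 12 hits at precision 5.
fake_index = {6: ["biz_a"], 5: ["biz_%d" % i for i in range(12)]}
hits = search_with_fallback(
    lambda lat, lon, precision: fake_index.get(precision, []),
    44.0, -72.5, want=10)
print(len(hits))  # 10
```

Because each precision step roughly quintuples the cell edge, the loop terminates after 2–3 passes even in the sparsest regions.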

🧠 Deep Dive: Capacity, Design Goals, and the Data Structures Behind the Index

Capacity Estimation

| Dimension | Assumption | Derivation |
|---|---|---|
| DAU | 100M users | Industry benchmark for a Yelp-scale service |
| Businesses indexed | 200M | Global active businesses including restaurants, hotels, shops |
| Search QPS (steady) | ~11,500 reads/sec | 100M DAU × 10 searches/day ÷ 86,400 s |
| Search QPS (peak) | ~50,000 reads/sec | 5× steady-state burst (lunch, Friday evening) |
| Write QPS | ~100 writes/sec | New businesses + edits; 100:1 read/write ratio |
| Business record size | ~1 KB | Name, address, lat/lon, geohash6, category, rating, review count |
| Total business storage | ~200 GB | 200M × 1 KB |
| Review storage | ~50 TB | Estimated 100B cumulative reviews × ~500 B each |
| Hot search cache | ~10 GB Redis | Top 5% of cells handle 90% of queries (Zipf distribution) |
| Photos storage | ~500 TB | Served from CDN; out of scope for the index design |

Key insight: This is an overwhelmingly read-heavy workload. The search:write ratio is 100:1, which means the right strategy is to optimize the read path relentlessly — caching, read replicas, and a cache-friendly index structure — rather than optimizing write throughput.

Design Goals

| Goal | Measurable target | Architecture decision it drives |
|---|---|---|
| Search latency | p95 < 100ms end-to-end | Geohash B-tree index + Redis cache-first search path |
| Cache hit rate | ≥ 90% for major metro cells | GEORADIUS for hot cells; PostgreSQL replica for cold cells |
| Spatial correctness | Zero boundary misses | Always query 9 cells (center + 8 neighbors) |
| Write latency | p95 < 500ms for new business creation | Async spatial index update decoupled from the primary DB write |
| Availability | 99.9% on the search path | Read replicas; Redis Cluster failover; graceful degradation to DB |
| Geographic scale | Global; all 200M businesses indexed | Shard by LEFT(geohash6, 3) — 32³ possible shard keys |

The Internals: Geohash vs. Quadtree Data Structures

Geohash (B-tree on a CHAR(6) column):

Every business row stores a geohash6 CHAR(6) column. A standard B-tree index on this column makes WHERE geohash6 IN (9 values) an efficient 9-seek operation with O(log N) per seek. This approach slots into any SQL database without extensions and supports geographic sharding by taking the first 3 characters as a shard key — producing 32³ = 32,768 possible shard ranges that naturally distribute global geography without hotspots.

Quadtree (in-memory spatial partition tree):

A quadtree recursively partitions a bounding box into 4 equal quadrants. When a quadrant's business count exceeds a threshold (e.g., 100), it splits further. A proximity search traverses the tree to find all quadrants overlapping the search radius. The key advantage is density-adaptive partitioning: a Manhattan quadrant is a much smaller geographic area than a Montana quadrant because they both hold at most 100 businesses. The critical disadvantage is that quadtrees live in memory — they are hard to shard across machines and expensive to rebuild from persistent storage on restart.

Rule of thumb: Geohash for persistent business data backed by a SQL database; quadtree for ephemeral, high-frequency in-memory data (Uber driver positions update every 4 seconds and the full active-driver dataset fits in one machine's RAM).
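The split-on-overflow behavior described above can be shown with a minimal point quadtree. The capacity, coordinates, and payloads below are illustrative toy data, not a production structure:

```python
class QuadTree:
    """Minimal point quadtree: a leaf splits into 4 quadrants when it
    holds more than CAPACITY points (density-adaptive partitioning)."""
    CAPACITY = 4

    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.points = []       # (x, y, payload) tuples while a leaf
        self.children = None   # four sub-quadrants after a split

    def insert(self, x, y, payload=None):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # point outside this quadrant
        if self.children is None:
            if len(self.points) < self.CAPACITY:
                self.points.append((x, y, payload))
                return True
            self._split()
        return any(c.insert(x, y, payload) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my),
                         QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1)]
        for p in self.points:                      # push points down
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect payloads of all points inside the query rectangle."""
        out = [] if out is None else out
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return out  # query box does not overlap this quadrant
        for (x, y, payload) in self.points:
            if qx0 <= x <= qx1 and qy0 <= y <= qy1:
                out.append(payload)
        for c in (self.children or []):
            c.query(qx0, qy0, qx1, qy1, out)
        return out

tree = QuadTree(-180, -90, 180, 90)  # x = lon, y = lat
for p in [(-122.41, 37.78, "d1"), (-122.40, 37.79, "d2"), (-122.42, 37.77, "d3"),
          (-122.39, 37.80, "d4"), (-122.405, 37.785, "d5"), (0.0, 0.0, "far")]:
    tree.insert(*p)
print(sorted(tree.query(-123, 37, -122, 38)))  # the five SF "drivers"
```

Note how the fifth San Francisco insert forces a split while the lone point near (0, 0) leaves its quadrant untouched — exactly the density adaptivity the text describes.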

| Index type | Storage | Update cost | Shard-friendly | Best for |
|---|---|---|---|---|
| Geohash + B-tree | SQL column | O(1) UPDATE | ✅ Yes — prefix sharding | Business directories (static) |
| Quadtree | In-memory tree | O(log N) re-insert | ❌ Hard | Driver / courier positions (dynamic) |
| PostGIS R-tree | SQL extension | O(log N) | ⚪ With effort | Complex polygon intersections |
| Redis GEORADIUS | In-memory sorted set | O(log N) GEOADD | ⚪ Per-node limit | Hot-path cache (<50M points/node) |

Performance Analysis: Where the Search Gets Slow

| Pressure point | Symptom | Root cause | Mitigation |
|---|---|---|---|
| Dense cell overload (the Manhattan problem) | 9-cell query returns 100K rows | Precision-6 cell is too coarse for dense urban areas | Increase precision to 7 (cell ≈ 150m × 150m) in high-density zones |
| Boundary miss | Known businesses absent from results near cell edges | Searching only the center cell | Always query 9 cells — non-negotiable correctness requirement |
| Cache miss storm | Latency spikes during concerts or sporting events | Thousands of cold searches hit the same DB cells simultaneously | Pre-warm Redis for upcoming events; increase TTL for major metro cells |
| Write amplification during bulk import | Index update latency spikes | Every INSERT updates DB + spatial index synchronously | Batch spatial index updates via async queue; synchronous only for address changes |
| Stale cached results after business closes | Closed businesses appear in search | Cache TTL (10 min) has not expired after business deactivation | Publish a cache invalidation event on every status change; invalidate search:results:<geohash6>:* |

The largest performance win at this scale is not the choice of spatial algorithm — it is the Redis cache layer in front of the database. A single metropolitan geohash cell queried 10,000 times per second can be served from a single Redis key rather than triggering 10,000 separate PostgreSQL index scans.

📊 High-Level Architecture: From Mobile Client to Ranked Results

flowchart TD
    A[Mobile / Web Client] -->|"GET /search?lat&lon&radius&category"| B[API Gateway]
    B --> C[Location Search Service]
    C -->|"GEORADIUS geo:businesses 1mi ASC"| D[Redis Geo Cache]
    D -->|"cache hit ~90%"| E[Ranked Results Response]
    D -->|cache miss| F[Business DB Read Replica]
    F -->|"geohash6 IN 9 cells + category filter"| G[Businesses Table + geohash6 B-tree index]
    G --> C
    C -->|"haversine filter + rating sort + SET cache EX 600"| E
    E -->|JSON response| A

    H[Business Owner] -->|"POST /businesses"| I[Business Service]
    I -->|"validate + geocode to lat/lon"| J[Business DB Primary]
    I -->|"GEOADD geo:businesses lon lat biz_id"| D
    J -->|async domain event| K[Index Updater Service]
    K -->|"UPDATE geohash6 + invalidate cache cell"| J

    L[Reviewer] -->|"POST /reviews/:id"| M[Review Service]
    M -->|"UPDATE businesses SET rating=avg, review_count=count+1"| J
    M -->|"SCAN + DEL search:results keys for affected cells"| D

This diagram shows the read path (top) and write path (bottom) of the proximity service. On the read path, the Location Search Service checks Redis first with GEORADIUS: roughly 90% of searches for popular metropolitan cells return from Redis in under 2ms without touching PostgreSQL. On a cache miss, the service falls back to the Business DB read replica, executes the 9-cell geohash IN query, applies the Haversine filter in the application layer, sorts by rating, caches the result in Redis for 10 minutes, and returns the ranked list. On the write path, new businesses and address updates flow to the PostgreSQL primary through the Business Service, which also issues a GEOADD to keep Redis warm. Review submissions update the aggregate rating on the business record and invalidate the affected cache cells — so rating changes appear in search results within one cache TTL window.

Data Model

-- Businesses table (sharded by LEFT(geohash6, 3) across read replicas)
CREATE TABLE businesses (
    id           UUID             PRIMARY KEY DEFAULT gen_random_uuid(),
    name         VARCHAR(255)     NOT NULL,
    address      TEXT             NOT NULL,
    lat          DOUBLE PRECISION NOT NULL,
    lon          DOUBLE PRECISION NOT NULL,
    geohash6     CHAR(6)          NOT NULL,   -- precision 6 ≈ 1.2km × 0.6km cell
    category     VARCHAR(64)      NOT NULL,
    rating       NUMERIC(3,2)     NOT NULL DEFAULT 0.00,
    review_count INT              NOT NULL DEFAULT 0,
    is_active    BOOLEAN          NOT NULL DEFAULT TRUE,
    created_at   TIMESTAMPTZ      NOT NULL DEFAULT NOW(),
    updated_at   TIMESTAMPTZ      NOT NULL DEFAULT NOW()
);

-- Primary spatial index β€” supports the 9-cell IN query
CREATE INDEX idx_businesses_geohash6          ON businesses (geohash6);
-- Composite index β€” avoids a filter step when category is always provided
CREATE INDEX idx_businesses_geohash6_category ON businesses (geohash6, category);

-- Reviews table
CREATE TABLE reviews (
    id          UUID      PRIMARY KEY DEFAULT gen_random_uuid(),
    business_id UUID      NOT NULL REFERENCES businesses(id),
    user_id     UUID      NOT NULL,
    rating      SMALLINT  NOT NULL CHECK (rating BETWEEN 1 AND 5),
    body        TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_reviews_business ON reviews (business_id, created_at DESC);

Redis key layout:

| Key pattern | Data structure | Value | TTL |
|---|---|---|---|
| geo:businesses | Sorted Set (GEOADD) | Member = business_id; score = geo-encoded lon/lat | No TTL — persistent warm index |
| search:results:<geohash6>:<category> | String (JSON) | Ranked array of business summaries | 10 minutes |
| business:detail:<id> | Hash | Full business profile fields | 30 minutes |

Write Path: Adding a New Business

Business Owner → POST /businesses
  → Business Service
      → Validate required fields; geocode address → (lat, lon)
      → Compute geohash6 = geohash.encode(lat, lon, precision=6)
      → INSERT INTO businesses (id, name, lat, lon, geohash6, category, ...)  [DB Primary]
      → GEOADD geo:businesses lon lat biz_id                                   [Redis]
      → SCAN + DEL search:results:<geohash6>:*                                 [Redis invalidation]
  ← 201 Created { businessId, name }

Read Path: Searching Nearby Restaurants

User → GET /search?lat=37.787&lon=-122.408&radius=1mi&category=coffee
  → Location Search Service
      → userGeohash = geohash.encode(37.787, -122.408, 6)  →  "9q8yyw"
      → Check: GET search:results:9q8yyw:coffee
            Cache HIT  → deserialize + return top-10 in ~2ms total
            Cache MISS →
                → cells = [userGeohash] + neighbors(userGeohash)  (9 cells)
                → SELECT businesses WHERE geohash6 IN (cells) AND category='coffee'
                → Filter by haversine distance ≤ 1.6 km in application layer
                → Sort by rating DESC → take top 10
                → SET search:results:9q8yyw:coffee <json> EX 600
  ← 200 OK [{ id, name, dist_km, rating, review_count }, ...]
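The cache-aside logic of this read path can be sketched with an in-process dict standing in for Redis. `CellCache` and `expensive_query` are illustrative names; a real deployment would use `SET ... EX 600` and SCAN-based invalidation:

```python
import time

class CellCache:
    """Cache-aside store keyed by cell + category, with TTL expiry."""
    def __init__(self, ttl_seconds=600.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1], "hit"
        value = compute()                       # the 9-cell SQL query on a miss
        self.store[key] = (now + self.ttl, value)
        return value, "miss"

    def invalidate_cell(self, geohash6):
        """Drop every cached result for one cell, across all categories."""
        prefix = f"search:results:{geohash6}:"
        for key in [k for k in self.store if k.startswith(prefix)]:
            del self.store[key]

cache = CellCache()
calls = []
def expensive_query():
    calls.append(1)                 # count how often the "database" is hit
    return ["biz_001", "biz_002"]

r1, s1 = cache.get_or_compute("search:results:9q8yyw:coffee", expensive_query)
r2, s2 = cache.get_or_compute("search:results:9q8yyw:coffee", expensive_query)
print(s1, s2, len(calls))  # miss hit 1
```

The second lookup never reaches the query function — the exact effect the 10-minute Redis TTL has on the hot Manhattan cells.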

🌍 Real-World Application: How Yelp, Uber, and Google Maps Solve Proximity

Real-world systems make different spatial index choices based on their specific update patterns and query shapes:

| Company | Index approach | Why this choice |
|---|---|---|
| Yelp | Geohash column + Elasticsearch geo_point | Business data is mostly static; a B-tree on geohash6 handles bulk search; Elasticsearch layers full-text ranking on top |
| Uber (driver matching) | Custom in-memory quadtree | Driver GPS updates every 4 seconds — writing every update to a DB is too expensive; the full active-driver dataset fits in one machine's RAM |
| Google Maps / Places | S2 Geometry (Hilbert curve cells) | S2 cells are near-uniform spherical polygons with no distortion at the poles; Google needs global-scale analytics and precise spherical distance calculations |
| DoorDash / Instacart | PostGIS with R-tree index | Delivery zones are irregular polygons (not circles); PostGIS handles native polygon intersection without manual cell math |
| Airbnb | Elasticsearch geo_distance filter | Accommodation search combines text ranking with geo filtering; Elasticsearch natively combines both in one query |

The common thread: static data at scale → geohash in SQL. Dynamic, high-frequency data → in-memory spatial structure. Complex polygon geometry → PostGIS or S2. Choose the tool that matches the actual query shape and update rate of your data, not the tool with the most impressive name.

βš–οΈ Trade-offs & Failure Modes: Boundary Crossings, Hot Cells, and Stale Indexes

Failure Mode 1: Cell Boundary Miss (The Most Common Bug)

A business located 10 meters across a geohash cell boundary will not appear in a single-cell WHERE geohash6 = 'X' query, even if it is well within the user's search radius. In unit tests this is invisible β€” test data rarely falls exactly on a boundary. In production, real users consistently miss businesses near their location.

Fix: Query 9 cells unconditionally. There is no configuration toggle for this — single-cell proximity queries are incorrect by definition.

Failure Mode 2: The Manhattan Hot-Cell Problem

At precision 6, a single geohash cell in Manhattan or central Tokyo contains thousands of active businesses. A 9-cell query returning 50,000 candidates before the Haversine filter is a latency bomb: the candidate fetch alone can take 200ms, violating the 100ms SLO.

Two mitigations applied together:

  1. Increase geohash precision dynamically for dense areas. Use precision 7 (cell ≈ 150m × 150m) for cells above a business-count threshold (e.g., >500 businesses per cell). The 9-cell query still runs, but each precision-7 cell covers roughly 1/32 of a precision-6 cell's area.
  2. Cache cell results aggressively. In a dense city, the same cell+category combination is queried thousands of times per minute. A 10-minute Redis TTL means the expensive DB scan runs once every 600 seconds, not 600,000 times.
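The first mitigation reduces to a tiny policy function. The 500-business threshold echoes the illustrative cutoff above; in practice the per-cell counts would come from index statistics:

```python
def pick_precision(cell_business_count: int, default: int = 6,
                   dense_threshold: int = 500) -> int:
    """Use precision 7 (smaller cells) when the precision-6 cell is overloaded."""
    return 7 if cell_business_count > dense_threshold else default

print(pick_precision(2400))  # Manhattan-style cell -> 7
print(pick_precision(80))    # typical suburban cell -> 6
```

The search path then encodes the user's location at the returned precision before expanding to the 9 neighbor cells.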

Failure Mode 3: Stale Spatial Index After Business Closes

A business that permanently closes retains its old geohash in the index until invalidated. Users see a closed business as an active search result.

Fix: The Business Service sends a cache invalidation event on every is_active transition, which deletes search:results:<geohash6>:* from Redis. The next query fetches from DB, which now has is_active = FALSE on the row, and the business is excluded from results. Cache freshness within 10 minutes (one TTL window) is the accepted trade-off.

Consistency and Trade-off Summary

| Dimension | Design choice | Rationale |
|---|---|---|
| Search result freshness | Eventual (10-minute cache TTL) | New businesses appear within one TTL window; near-real-time not required |
| Rating consistency | Strong on write (synchronous UPDATE businesses SET rating=…) | Rating is user-visible; stale ratings directly hurt trust |
| Read vs. write scaling | Scale read replicas independently of the primary | 100:1 read/write ratio; replica fan-out absorbs search load with no write path changes |
| Redis vs. PostgreSQL | Redis for hot path; PostgreSQL as source of truth | Redis is ephemeral; PostgreSQL is durable — never treat Redis as the only copy |

🧭 Decision Guide: Geohash, Quadtree, or PostGIS?

| Situation | Recommendation |
|---|---|
| Static business directory with >10M records | Geohash column + B-tree index; shard DB by geohash prefix |
| Frequently moving objects (drivers, couriers, scooters) | In-memory quadtree or S2; avoid writing every GPS ping to a relational DB |
| Delivery zones, service areas, or polygon-shaped regions | PostGIS with R-tree; native polygon intersection without manual cell math |
| Redis as primary hot-path index | GEOADD/GEORADIUS — ideal for ≤50M points per Redis node |
| Starting fresh with <1M businesses | PostGIS — simpler operational story, rich feature set, no custom encoding logic |
| Dense urban cells causing latency overload | Increase geohash precision from 6 → 7 for affected cells; combine with Redis cell caching |
| Need to combine proximity with full-text search | Elasticsearch geo_point + full-text query in a single request |
| Avoid | Raw lat/lon range scans without a spatial index at >1M rows |

🧪 Practical Example: Searching "Coffee Near Union Square, SF"

A user at Union Square, San Francisco (37.7874°N, 122.4080°W) opens Yelp and taps "Coffee, 1 mile." Here is the full resolution trace:

Step 1 — Encode user location:

import pygeohash as geohash

user_geohash = geohash.encode(37.7874, -122.4080, precision=6)
# user_geohash = "9q8yyw"

Step 2 — Expand to 9 cells:

neighbors = geohash.neighbors("9q8yyw")   # helper name/return shape varies by library
all_cells = ["9q8yyw"] + list(neighbors)
# all_cells = ["9q8yyw", "9q8yyr", "9q8yyx", "9q8yyz",
#              "9q8yyq", "9q8yyy", "9q8yym", "9q8yyt", "9q8yyv"]
# These 9 cells cover roughly 6.5 km² around Union Square

Step 3a — Redis fast path (cache hit, ~90% of requests):

GEORADIUS geo:businesses -122.4080 37.7874 1 mi
          ASC COUNT 50 WITHCOORD WITHDIST
-- Returns 48 business IDs sorted by distance
-- Application re-ranks top 50 by rating → returns top 10
-- Total latency: ~1.8ms

Step 3b — Database fallback (cache miss, ~10% of requests):

SELECT * FROM (
    SELECT id, name, lat, lon, rating,
           (6371 * acos(cos(radians(37.7874)) * cos(radians(lat))
                 * cos(radians(lon) - radians(-122.4080))
                 + sin(radians(37.7874)) * sin(radians(lat)))) AS dist_km
    FROM   businesses
    WHERE  geohash6 IN ('9q8yyw','9q8yyr','9q8yyx','9q8yyz',
                        '9q8yyq','9q8yyy','9q8yym','9q8yyt','9q8yyv')
      AND  category = 'coffee'
      AND  is_active = TRUE
) AS candidates
WHERE  dist_km <= 1.6
ORDER  BY rating DESC
LIMIT  10;
-- 9 B-tree index scans → ~280 candidate rows → ~80 pass the distance filter
-- Total latency: ~25ms

Step 4 — Cache the result for subsequent requests:

SET search:results:9q8yyw:coffee
    '[{"id":"biz_001","name":"Blue Bottle Coffee","dist_km":0.12,"rating":4.6}, ...]'
    EX 600
-- Next 10 minutes: all identical queries return from Redis in ~2ms

Latency summary:

  • Cache hit path: ~2ms Redis + ~5ms serialization/network = ~7ms total ✅
  • Cache miss path: ~25ms SQL + ~5ms network + ~2ms Redis write = ~32ms total ✅

Both paths comfortably meet the 100ms design goal.

πŸ› οΈ Redis Geo Commands: How They Power Proximity Search at Yelp Scale

Redis ships a native geospatial data structure built on geohash encoding, stored internally as a sorted set with geo-encoded scores. For the hot path of a proximity service, Redis handles spatial queries without any additional infrastructure. (Redis 6.2 introduced GEOSEARCH as the successor to GEORADIUS; the classic GEORADIUS form used below still works the same way.)

Indexing a business (on create or address update):

GEOADD geo:businesses -122.4080 37.7874 "biz_001"
-- Format: GEOADD <key> <longitude> <latitude> <member>
-- Redis encodes (lon, lat) as a 52-bit geohash and stores it as the sorted set score
-- Multiple businesses can be added in one call:
GEOADD geo:businesses
    -122.4080 37.7874 "biz_001"
    -122.4123 37.7901 "biz_002"
    -122.4055 37.7856 "biz_003"

Searching within 1 mile of a user:

GEORADIUS geo:businesses -122.4080 37.7874 1 mi
          ASC COUNT 50 WITHCOORD WITHDIST
-- Returns list of [member, distance, [lon, lat]] tuples:
-- [
--   ["biz_001", "0.12", [-122.40800, 37.78739]],
--   ["biz_002", "0.31", [-122.41230, 37.79010]],
--   ...
-- ]
-- ASC = sorted nearest-first; COUNT 50 = candidate cap; application re-ranks by rating

Looking up the stored position of a specific business:

GEOPOS geo:businesses "biz_001"
-- Returns: [[-122.40800142288208, 37.78739983518099]]
-- Slight precision loss (well under a meter) from the 52-bit geohash encoding is acceptable

Expanding the search radius when results are sparse (rural areas):

-- First try: 1 mile
GEORADIUS geo:businesses :lon :lat 1 mi ASC COUNT 10
-- If fewer than 10 results, expand to 5 miles:
GEORADIUS geo:businesses :lon :lat 5 mi ASC COUNT 10

When to use Redis Geo vs. the geohash column in PostgreSQL:

| Dimension | Redis GEORADIUS | PostgreSQL geohash column |
|---|---|---|
| Latency | ~1–2ms (fully in-memory) | ~10–30ms (disk-backed) |
| Dataset size limit | ~50M points per Redis node | Unlimited (sharded across nodes) |
| Durability | AOF/RDB as secondary backup | Primary source of truth |
| Complex filters | Limited (distance + count only) | Full SQL (JOIN, category, is_active, rating) |
| Role in this system | Hot-path cache layer | Source of truth + cold-miss search path |

The production pattern: PostgreSQL is the source of truth for all 200M businesses; Redis geo:businesses holds the warm geo index for the hottest 5–10M businesses in major metros. A Redis miss falls back to PostgreSQL. GEOADD is called synchronously on every business create and address change to keep the Redis index current.

For a full deep-dive on Redis data structures, persistence modes, and cluster configuration, see System Design: Caching and Asynchronism.

📚 Lessons Learned from Building Location-Aware Systems

  • Always query 9 geohash cells, never just the center. A single-cell query produces systematic missing-data bugs that are invisible in unit tests (test data rarely lands exactly on a boundary) but reproduced continuously by real users near cell edges. Make the 9-cell expansion unconditional.

  • Cache at the cell level, not the business level. Business-level caches require one invalidation per business per change. Cell-level caches require one key deletion per affected cell — dramatically simpler invalidation logic with a much higher hit rate.

  • Choose geohash precision based on search radius, not intuition. Precision 6 for 1-mile, precision 5 for 5-mile, precision 4 for 25-mile. Too coarse causes hot-cell overload; too fine requires querying too many cells to cover the radius.

  • Geohash is approximate; Haversine is exact. The 9-cell query intentionally over-fetches candidates. Always apply the Haversine formula in the application layer to filter to the true radius before ranking. Skipping this step returns businesses at the corners of the bounding grid that are outside the actual radius.

  • The hard part is not the spatial algorithm — it is the invalidation strategy. Encoding a geohash is trivial (one library call). Knowing when to expire a cached cell after a business closes, moves, or changes category requires deliberate event-driven architecture.

  • Separate the hot write path from the cold index rebuild. Address changes and new business creation must update the spatial index synchronously (otherwise users immediately see missing results). Bulk re-indexing jobs, metadata enrichment, and rating aggregation can be async β€” decoupling them prevents import jobs from degrading search latency.

📌 TLDR — Summary & Key Takeaways: Proximity Service in Five Decisions

  • Geohash encodes (lat, lon) into a sortable string that fits natively in a SQL B-tree index. A prefix query across 9 cells is the entire search operation — no spatial extension required.
  • Always query 9 cells (center + 8 neighbors) — single-cell queries produce correctness bugs at cell boundaries. This is non-negotiable.
  • Redis GEORADIUS serves the hot path with ~2ms latency for major metro cells; PostgreSQL is the cold fallback and the sole source of truth. Never treat Redis as the only copy of the location data.
  • Shard by geohash prefix (LEFT(geohash6, 3)) to distribute geographic write and read load evenly — 32K possible shard keys map naturally to global geography without manual region boundaries.
  • The Manhattan hot-cell problem (one cell, thousands of businesses) is solved by increasing geohash precision from 6 to 7 in density-detected zones and caching cell-level search results aggressively in Redis.

πŸ“ Practice Quiz

  1. Why does a standard B-tree index on separate lat and lon columns fail to support efficient proximity queries at 200M businesses?

    • A) B-trees do not support floating-point columns
    • B) A B-tree is a one-dimensional structure and cannot simultaneously index two spatial dimensions; the planner must apply one filter in memory
    • C) Databases do not allow composite indexes across geographic coordinate columns
    • D) Lat/lon columns require special geospatial encoding before they can be indexed

    Correct Answer: B

  2. You search for "coffee within 1 mile" using only the user's center geohash cell (precision 6). A highly-rated coffee shop 40 meters from the user is not returned. What is the most likely explanation?

    • A) The geohash precision is too fine, causing the coffee shop's cell to be excluded
    • B) The coffee shop's rating is below the minimum threshold
    • C) The coffee shop sits in an adjacent cell, and the search did not include the 8 neighboring cells
    • D) Redis GEORADIUS does not support sub-mile radius searches

    Correct Answer: C

  3. Yelp receives 40,000 search queries per second for "restaurants in downtown Manhattan." Each query hits PostgreSQL in 30ms. Which single change delivers the greatest latency reduction?

    • A) Add a second B-tree index on the rating column alongside the geohash index
    • B) Cache search results keyed by geohash6 + category in Redis with a 10-minute TTL
    • C) Increase the PostgreSQL connection pool size to 2,000 connections
    • D) Migrate all business data from PostgreSQL to a document database

    Correct Answer: B

  4. A food-delivery startup must index both restaurant locations (mostly static, 5M records) and active delivery couriers (GPS position updated every 3 seconds, 50K active at peak). Which combination of spatial indexes is most appropriate?

    • A) Geohash B-tree column for both β€” consistent tooling simplifies operations
    • B) PostGIS R-tree for restaurants; in-memory quadtree for couriers
    • C) Geohash B-tree column for restaurants; in-memory quadtree or Redis GEOADD for couriers
    • D) Redis GEOADD for both β€” low latency is the priority for all location lookups

    Correct Answer: C

  5. Open-ended challenge: A sold-out stadium concert begins and 60,000 attendees simultaneously search for "food near [stadium address]" — all landing on the same geohash cell. Describe two distinct architectural mechanisms that prevent this event-driven spike from overwhelming the database, identify which one you would deploy first, and explain the reasoning behind that prioritization.
