Partitioning Approaches in SQL and NoSQL: Horizontal, Vertical, Range, Hash, and List Partitioning
How to split large tables for query performance and manageability — and how partitioning differs from sharding
TLDR: Partitioning splits one logical table into smaller physical pieces called partitions. The database planner skips irrelevant partitions entirely — turning a 30-second full-table scan into a 200ms single-partition read. Range partitioning is best for time-series data. Hash partitioning prevents write hotspots. List partitioning supports geo-compliance. The wrong partition key eliminates all these gains. Partitioning is not sharding: if your app connects to one database endpoint, you are partitioning.
📖 The 2-Billion-Row Problem — Why Indexes Stop Being Enough
Your orders table has 2 billion rows. Every query scans gigabytes of data even with indexes. Partitioning can cut query time from 30 seconds to 200 milliseconds — by making the database ignore 95% of the data before it even looks at an index.
An index on a 2-billion-row table is still a 2-billion-entry B-Tree. Navigating that tree takes time, and the leaf pages are scattered across the entire 400 GB tablespace. Partitioning changes the problem fundamentally: instead of one 400 GB table, you have 48 separate 8 GB tables — one per month over four years. A query for last month's orders reads only 8 GB. The other 392 GB are never opened.
This is the core promise of partitioning. It is why every major database — PostgreSQL, MySQL, Cassandra, DynamoDB — builds it into the data model.
Partitioning vs. Sharding — The Distinction That Matters in Every System Design Interview
These two terms are often used interchangeably by engineers and interviewers. They are not the same concept, and conflating them signals an incomplete understanding.
| Dimension | Partitioning | Sharding |
| :--- | :--- | :--- |
| Data location | Same database node, or transparent coordinator-managed distribution | Independent database nodes, each a fully separate database |
| Application connection | Single endpoint; application sees one table | Shard-aware routing layer; app knows about multiple databases |
| Cross-partition joins | Supported — planner merges results | Expensive scatter-gather or not supported |
| Transaction guarantees | Full ACID within the node | Cross-shard: 2-Phase Commit or eventual consistency only |
| Primary scaling target | Query performance, manageability, archival | Write throughput, storage capacity beyond a single node |
| Classic examples | PostgreSQL PARTITION BY, MySQL partitions | Cassandra, DynamoDB, Vitess, Citus |
Rule of thumb: If your application connects to one connection string, you are partitioning. If it connects to multiple independent databases, you are sharding.
The blur zone: Cassandra and DynamoDB use the word "partition" but their partitions are physically distributed across nodes. They are partitioning and sharding simultaneously. In an interview, acknowledge the overlap: "Cassandra calls it a partition key, but the mechanism is consistent-hashing-based sharding because each partition lives on a different node."
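To make the overlap concrete, here is a toy sketch of token-range routing, the mechanism behind Cassandra-style partition placement. Everything here is illustrative: real Cassandra uses Murmur3 tokens and virtual nodes, and the ring boundaries and node names below are invented.

```python
import hashlib
from bisect import bisect_right

# Invented 4-node ring: each node owns tokens up to its boundary (illustrative only).
RING = [(2**30, "node-a"), (2**31, "node-b"), (3 * 2**30, "node-c"), (2**32, "node-d")]

def token(partition_key: str) -> int:
    # Stand-in hash; real Cassandra computes a Murmur3 token.
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % 2**32

def node_for(partition_key: str) -> str:
    # Every query that supplies the partition key routes to exactly one node:
    # a "partition" at the data-model level, a shard at the physical level.
    uppers = [upper for upper, _ in RING]
    return RING[bisect_right(uppers, token(partition_key)) % len(RING)][1]
```

The same function that names the partition also picks the node, which is exactly why Cassandra's partitioning is simultaneously sharding.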
🔍 A Map of Every Partitioning Strategy — What Each One Splits and When to Use It
Before going deep, here is the orientation table:
| Partition Type | Splits By | Best For | Supported In |
| :--- | :--- | :--- | :--- |
| Horizontal (row-based) | Row subsets by a partition key value | Time-series, large operational tables | PostgreSQL, MySQL, Cassandra, DynamoDB |
| Vertical (column-based) | Column subsets into separate tables | Wide tables with hot/cold access patterns | Manual table design; not a native DB feature |
| Range | Ordered value ranges (dates, IDs) | Time-series archival, rolling-window queries | PostgreSQL, MySQL, CockroachDB, Spanner |
| List | Explicit enumerated values | Geographic regions, categorical columns | PostgreSQL, MySQL |
| Hash | hash(key) % N | Even I/O distribution, preventing write hotspots | PostgreSQL, MySQL |
| Composite (sub) | Two levels: range then hash or list | Very large tables needing both pruning and balance | PostgreSQL, MySQL |
Horizontal vs. vertical — the conceptual split: Horizontal partitioning divides rows. Vertical partitioning divides columns. The diagram below shows both at once: the orders table is split into monthly row-subsets on the left; the users table is split into hot-path and cold-path column subsets on the right. These two strategies solve completely different problems and are often combined in the same data model.
graph TD
subgraph HP[Horizontal Partitioning - Row Subsets]
T1[Orders Table - 2B rows] --> P1[Jan 2024 - 200M rows]
T1 --> P2[Feb 2024 - 195M rows]
T1 --> P3[Mar 2024 - 210M rows]
T1 --> P4[Apr 2024 - 205M rows]
end
subgraph VP[Vertical Partitioning - Column Subsets]
T2[Users Table - 50 columns] --> C1[user_core - id, email, name, created_at]
T2 --> C2[user_profile - bio, avatar, preferences, settings]
end
In the horizontal case, every partition holds identical columns but a different month of data. The January 2024 partition is the only one opened when you query WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01'. In the vertical case, user_core is read on every authentication request; user_profile is loaded only on profile pages. Separating them means the hot authentication path never reads the cold profile columns, reducing page I/O and improving buffer pool efficiency.
⚙️ How Range, List, Hash, and Composite Partitioning Actually Work
Range Partitioning: Month-by-Month Tables for Time-Series Data
Range partitioning assigns each row to a partition based on whether the partition key value falls within a defined range. The classic use case is a time-series table partitioned by month.
PostgreSQL DDL — range-partitioned orders table:
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
total_cents BIGINT NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2024_01
PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02
PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Local index: each partition gets its own index
CREATE INDEX ON orders_2024_01 (customer_id);
CREATE INDEX ON orders_2024_02 (customer_id);
MySQL DDL — same table with MySQL syntax:
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT,
customer_id BIGINT NOT NULL,
total_cents BIGINT NOT NULL,
status VARCHAR(50) NOT NULL,
created_at DATE NOT NULL,
PRIMARY KEY (id, created_at) -- MySQL requires partition key in all unique keys
) PARTITION BY RANGE COLUMNS (created_at) (
PARTITION p202401 VALUES LESS THAN ('2024-02-01'),
PARTITION p202402 VALUES LESS THAN ('2024-03-01'),
PARTITION p202403 VALUES LESS THAN ('2024-04-01'),
PARTITION p_future VALUES LESS THAN (MAXVALUE)
);
Pros: Partition pruning on date predicates. Free archival — DROP TABLE orders_2021 removes a year of data in milliseconds instead of running a multi-hour DELETE. Natural fit for compliance windows.
Cons: Write hotspot — all current inserts go to the latest partition. Must pre-create partitions proactively (no matching partition → error). Skewed partition sizes if insert volume is uneven across time.
List Partitioning: Categorical Splits for Geo-Compliance
List partitioning assigns rows based on membership in an explicit set of values. The partition boundary becomes a compliance boundary when the values are geographic identifiers.
-- PostgreSQL list partitioning by region
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
region TEXT NOT NULL,
total_cents BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY LIST (region);
CREATE TABLE orders_us
PARTITION OF orders FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE orders_eu
PARTITION OF orders FOR VALUES IN ('DE', 'FR', 'GB', 'NL', 'ES', 'IT');
CREATE TABLE orders_apac
PARTITION OF orders FOR VALUES IN ('JP', 'SG', 'AU', 'IN', 'KR');
-- Catch unknown regions
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
Pros: Clean pruning on equality and membership predicates. Natural data locality — EU data physically lives in EU partition, enabling storage-tier pinning for GDPR compliance.
Cons: All partition values must be known at design time. Adding a new region requires DDL (CREATE TABLE ... PARTITION OF in PostgreSQL, ALTER TABLE ... ADD PARTITION in MySQL). Rows with unrecognized values land in the DEFAULT partition; without one, inserts for unlisted values fail with an error.
Hash Partitioning: Even Write Distribution Without Range Pruning
Hash partitioning applies hash(partition_key) % N to distribute rows uniformly across N partitions. It eliminates write hotspots because inserts land on different partitions regardless of insertion time.
-- PostgreSQL hash partitioning by customer_id (4 partitions)
CREATE TABLE events (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY HASH (customer_id);
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Trade-off: Point lookups on customer_id prune to exactly one partition. Range queries (WHERE created_at > '2024-01-01') cannot prune — they must scan all four partitions. Use hash partitioning when point-lookup access patterns dominate and write throughput matters more than range scan efficiency.
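The point-lookup-versus-range asymmetry reduces to a few lines. This is a minimal sketch using Python's built-in hash as a stand-in for the database's internal hash function (PostgreSQL's actual hashing differs):

```python
N_PARTITIONS = 4  # matches the events_p0..events_p3 layout above

def partitions_for_point_lookup(customer_id: int) -> list[int]:
    # Equality on the partition key resolves to exactly one partition.
    return [hash(customer_id) % N_PARTITIONS]

def partitions_for_range_on_other_column() -> list[int]:
    # A predicate on created_at says nothing about hash(customer_id),
    # so no partition can be ruled out: all must be scanned.
    return list(range(N_PARTITIONS))
```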
Composite Partitioning: Year × Region for Tables That Outgrow Single-Level Strategies
Composite (sub) partitioning applies two levels. A common pattern: partition by year (range) first, then sub-partition each year by region (list) or by customer hash.
-- PostgreSQL composite: range by month, then hash by customer_id within each month
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
total_cents BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2024_01
PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
PARTITION BY HASH (customer_id);
CREATE TABLE orders_2024_01_p0
PARTITION OF orders_2024_01 FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_2024_01_p1
PARTITION OF orders_2024_01 FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... repeat remainder 2, 3 and each subsequent month
With this structure, a query for WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01' AND customer_id = 42 prunes to the single correct sub-partition — one of the four January 2024 hash sub-partitions. At 100 billion rows, 48 months × 4 hash buckets = 192 partitions of roughly 500 million rows each — manageable for maintenance without the overhead of thousands of partitions.
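The two-level routing can be sketched as below, assuming the 48 monthly partitions run January 2021 through December 2024 and using a simple modulo in place of the database's real hash function:

```python
from datetime import date

BUCKETS = 4  # hash sub-partitions per month, as in the DDL above

def route(created_at: date, customer_id: int) -> tuple[int, int]:
    # Level 1: monthly range partition index, counting from January 2021.
    month_idx = (created_at.year - 2021) * 12 + (created_at.month - 1)
    # Level 2: hash bucket within that month (illustrative, not PostgreSQL's hash).
    return month_idx, customer_id % BUCKETS
```

When both predicates are present, the planner's pruning reaches the same single target out of all 192 sub-partitions.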
Vertical Partitioning: Splitting Hot Columns From Cold Ones
Vertical partitioning is not a database DDL feature — it is a table design decision. You manually split a wide table into two tables linked by primary key:
-- Before: one 50-column table (hits on every login)
-- After: hot-path table + cold-path table
CREATE TABLE user_core (
id BIGINT PRIMARY KEY,
email TEXT NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL
);
CREATE TABLE user_profile (
user_id BIGINT PRIMARY KEY REFERENCES user_core(id),
bio TEXT,
avatar_url TEXT,
theme_preference JSONB,
notification_settings JSONB
);
The authentication path reads only user_core — 4 narrow columns. PostgreSQL stores rows in 8 KB pages. Narrower rows mean more rows per page, higher buffer pool efficiency, fewer I/O operations per query. The user_profile table is loaded only on profile page views, keeping it out of the hot buffer pool entirely.
Real-world application: E-commerce product catalogs often have 150–200 columns. Most search queries need 10 of them (title, price, category, thumbnail URL). Vertically partitioning the remaining 140 columns into a product_details table reduces hot-path I/O by 80%.
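The buffer-pool argument is simple arithmetic. The row widths below are invented for illustration; only the 8 KB page size is a real PostgreSQL default:

```python
PAGE_BYTES = 8192   # PostgreSQL default page size
ROW_OVERHEAD = 24   # approximate per-tuple header overhead (assumption)

def rows_per_page(payload_bytes: int) -> int:
    # How many rows fit in one page at a given payload width.
    return PAGE_BYTES // (payload_bytes + ROW_OVERHEAD)

wide_rows = rows_per_page(1500)   # hypothetical 150-column catalog row
narrow_rows = rows_per_page(80)   # hypothetical 10-column hot-path row
```

More rows per page means each buffer-pool page read serves more hot-path lookups, which is where the I/O reduction comes from.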
🧠 Deep Dive: How Partition Pruning Works Inside the Query Optimizer
Partition pruning is the mechanism that makes all the DDL above valuable. Without it, partitioning is extra metadata overhead with zero performance gain.
Internals of the Partition Pruning Optimizer
The PostgreSQL query optimizer stores partition bounds in the system catalog (pg_class, pg_inherits, and pg_partitioned_table). When a query arrives, the planner applies constraint exclusion (the mechanism used through v10) or native partition pruning (v11+). Native partition pruning reads partition bound metadata directly rather than re-evaluating CHECK constraints — it is faster and works with bind parameters.
Static pruning happens at parse time when the partition key predicate contains a literal:
-- Static pruning: planner knows at parse time that only Jan 2024 partition qualifies
SELECT * FROM orders WHERE created_at = '2024-01-15';
Dynamic pruning happens at execution time when the predicate value comes from a runtime parameter:
-- Dynamic pruning: planner cannot determine the partition until the parameter is bound
SELECT * FROM orders WHERE created_at = $1;
PostgreSQL v11+ performs dynamic (execution-time) partition pruning for parameterized queries and nested loop joins. MySQL supports static pruning for RANGE and LIST partitions but has limited dynamic pruning support.
The pruning decision inside the planner: For each partition, the planner evaluates whether the partition's bounds can possibly overlap with the query's predicate. For range partitions, this is a simple interval intersection check. For list partitions, it is a set membership check. For hash partitions, the planner computes hash(literal_value) % N to identify the exact target partition.
To observe pruning in action, run EXPLAIN (ANALYZE, VERBOSE) in PostgreSQL. Partitions pruned at plan time simply do not appear in the plan; partitions pruned at execution time are reported as Subplans Removed: N. In a 48-partition table with a query scoped to one month, only that month's partition should appear as a scan node.
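The three bound checks can be modeled in a few lines. This is a simplified sketch of the pruning decision, not PostgreSQL's implementation: real planners evaluate catalog metadata, and the hash check uses the database's own hash function rather than Python's.

```python
def prune_range(bounds: dict, lo, hi) -> list:
    # Keep a partition only if its [start, end) interval can overlap [lo, hi).
    return [p for p, (start, end) in bounds.items() if start < hi and lo < end]

def prune_list(value_sets: dict, value) -> list:
    # Keep a partition only if the literal is a member of its value set.
    return [p for p, values in value_sets.items() if value in values]

def prune_hash(n_partitions: int, key_value) -> list:
    # Equality on the key resolves to exactly one candidate partition.
    return [hash(key_value) % n_partitions]
```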
Performance Analysis: When Partitioning Helps and When It Hurts
The 10× rule: Partition pruning delivers dramatic speedups when a query touches ≤10% of partitions. A query scoped to one month on a 4-year table touches 1/48 of the data — a 48× reduction in I/O before even consulting an index. Combine this with a local index on the partition and a query that was a 30-second full-scan becomes a 200ms index seek.
The pruning killer — the most expensive partitioning mistake in production:
A query like SELECT * FROM orders WHERE customer_id = 42 on a range-partitioned table (partitioned by created_at) gets zero pruning benefit. The planner opens every partition because the partition key (created_at) is not in the predicate. This is slower than an equivalent unpartitioned table — you now have 48 table opens, 48 index scans, and a merge step instead of one.
The fix: include the partition key in critical queries, or redesign the partition strategy. If 90% of queries filter by customer_id, the table should be hash-partitioned by customer_id, not range-partitioned by created_at.
Planner overhead for large partition sets: PostgreSQL adds per-partition overhead at planning time. With 50–100 partitions, this overhead is negligible. With 5,000 partitions, planning time can increase by hundreds of milliseconds, degrading OLTP query latency even when pruning eliminates almost all partitions. The practical ceiling: keep total partition count below 500 for PostgreSQL. Above that, use composite partitioning to reduce the top-level count.
Write throughput characteristics by strategy:

| Strategy | Write distribution | Write hotspot risk |
| :--- | :--- | :--- |
| Range by date | All current inserts → latest partition | High for recent-data tables |
| List by region | Inserts → matching region partition | Medium — depends on region traffic distribution |
| Hash by ID | Inserts spread evenly across N partitions | Very low |
| Composite (range + hash) | Latest range → spread across hash buckets | Low — hash eliminates hotspot within each range |
📊 Visualizing How the Database Skips 95% of Rows Before Looking at an Index
The diagram below traces the full decision path of a query optimizer on a range-partitioned table with data spanning 2021–2024. The query asks for January 2024 data. Follow the path from left to right to see exactly where the pruning decision happens and which branches are eliminated.
flowchart TD
Q[Query: WHERE created_at between Jan 2024 and Feb 2024] --> OPT[Query Optimizer - partition pruning phase]
OPT --> CHECK{Partition key in WHERE clause?}
CHECK -->|yes - pruning active| BOUNDS[Evaluate each partition bound]
CHECK -->|no - full scan| FULL[Scan all partitions - avoid this]
BOUNDS --> S1[Skip Partition 2021 - bounds outside predicate range]
BOUNDS --> S2[Skip Partition 2022 - bounds outside predicate range]
BOUNDS --> S3[Skip Partition 2023 - bounds outside predicate range]
BOUNDS --> SCAN[Scan Partition Jan 2024 - bounds match predicate]
S1 --> RES[Return query results]
S2 --> RES
S3 --> RES
SCAN --> RES
FULL --> RES
The critical fork is the diamond node: is the partition key present in the WHERE clause? When it is, the planner eliminates three partitions before opening a single data file. When it is not, every partition is opened — making a partitioned table perform worse than a single-table scan because of the extra overhead of 48 table opens and result merges. The diagram captures the single most important rule in partition design: the partition key must appear in your most frequent query predicates, or you gain nothing.
🌍 Real-World Applications: How Stripe, Discord, and Airbnb Use Partitioning
Stripe — time-series ledger: The payments platform partitions its transaction ledger by settlement date. Every report for "last month's transactions" touches exactly one partition. Dropping partitions older than the regulatory retention window is a single DDL statement executed in milliseconds — no multi-hour DELETE with vacuuming required.
Discord — Cassandra message history: Discord's message storage uses Cassandra with channel_id as the partition key and a time-based message_id as the clustering key. Every "load last 50 messages in this channel" request hits exactly one Cassandra partition on one node — one disk read sequence, sub-millisecond response. At billions of messages across millions of channels, this works because the data model and query pattern are perfectly aligned: every query supplies the partition key.
Airbnb — geographic search: The search listings table is partitioned by geographic region. A search for "Paris apartments" queries only the EU partition. With 7 million listings worldwide, this pruning reduces the scan from 7 million rows to roughly 800,000 — a 9× reduction before any index is consulted.
Multi-tenant SaaS — query isolation: Partitioning by tenant_id using hash partitioning prevents one tenant's bulk export from causing a full-table scan that degrades query performance for every other tenant sharing the table. Each tenant's data occupies a separate physical partition. The tenant's bulk query scans only that partition.
PostgreSQL + CockroachDB — data residency: CockroachDB's zone configs pin partitions to specific geographic regions. A LIST partition for EU country codes can be assigned a zone constraint requiring data to reside on EU-region nodes. This enforces GDPR compliance at the database layer rather than the application layer:
-- CockroachDB: pin EU partition to EU nodes
ALTER TABLE orders PARTITION BY LIST (country) (
PARTITION eu VALUES IN ('DE', 'FR', 'GB', 'NL'),
PARTITION us VALUES IN ('US', 'CA')
);
ALTER PARTITION eu OF TABLE orders CONFIGURE ZONE USING
constraints = '[+region=eu-west]';
⚖️ Partitioning Trade-offs and the Anti-Patterns That Kill Performance
Partitioning has real costs that are undersold in conference talks. Here is the honest accounting:
| Trade-off | Details |
| :--- | :--- |
| Range write hotspot | All current inserts concentrate on the latest partition. Monitor that partition's I/O separately from the rest. |
| No range pruning for hash | Hash-partitioned tables require full scans for range predicates — the hash has no sense of order. |
| Planner overhead for large partition sets | >500 partitions in PostgreSQL adds measurable query planning latency even after pruning. |
| Cross-partition joins | Joins spanning all partitions may be slower than on an unpartitioned table due to multi-partition merge overhead. |
| Maintenance burden | Automated partition creation must be reliable. A missed partition creation causes insert errors at midnight. |
| MySQL foreign key restriction | MySQL does not support foreign keys on partitioned tables. Referential integrity must be enforced at the application layer. |
Five Anti-Patterns That Destroy Partitioning Performance
1. Not including the partition key in critical query predicates. If 90% of queries filter by customer_id but the table is partitioned by created_at, those queries never prune. You added complexity without benefit and likely made things slower.
2. Over-partitioning. 10,000 partitions on a PostgreSQL table causes the planner to spend hundreds of milliseconds evaluating partition bounds even for queries that prune to one. Keep the count below 500. Use composite partitioning to manage very large data sets at lower partition counts.
3. Low-cardinality partition key. A boolean column has two values — two partitions. This adds DDL overhead, halves future flexibility, and provides nearly zero pruning benefit. Partition keys must have enough distinct values to justify the number of partitions you intend to create.
4. Skewed range sizes. If 80% of your data falls within one range partition, that partition carries 80% of the I/O load. Range partitioning on time often causes this: the current month is constantly written to while historical months are static. The solution is monthly (not annual) ranges combined with hash sub-partitioning for the current period.
5. Forgetting the DEFAULT partition on list-partitioned tables. Adding a new country code that is not listed in any explicit partition causes the insert to fail with an error in PostgreSQL. A DEFAULT partition catches unlisted values gracefully and gives you time to add explicit partitions at the next deployment.
🧭 Choosing the Right Partition Strategy: Decision Guide
| Situation | Recommendation |
| :--- | :--- |
| Time-series data with rolling archival | Range partitioning by date (month or quarter). Use DETACH PARTITION for instant archival. |
| Even write distribution, high-cardinality key | Hash partitioning by the primary access key (user ID, order ID). |
| Geo-compliance or categorical access patterns | List partitioning by region or country code. Add a DEFAULT partition. |
| Table so large that monthly range partitions are still too big | Composite: range by month, hash by user ID within each month. |
| Queries mostly point lookups by ID | Hash partitioning. Range queries on other columns still scan all partitions — accept this trade-off. |
| Avoid when | Table has <50M rows and existing indexes are sufficient. No natural partition key. Primary bottleneck is write throughput beyond a single node (use sharding instead). |
| Alternative to partitioning | Index-only scans with covering indexes for selective queries. Table inheritance (PostgreSQL) for manual partitioning control. Sharding (Vitess, Citus) when data outgrows a single node. |
Partitioning vs. Sharding — The Final Comparison for System Design Decisions
| Dimension | Partitioning | Sharding |
| :--- | :--- | :--- |
| Data location | Same node or transparent coordinator | Independent database nodes |
| App connection | Single connection string | Shard-aware routing required |
| Cross-partition joins | Supported — planner merges results | Expensive scatter-gather or forbidden by design |
| Transactions | Full ACID within the node | Limited; 2PC or eventual consistency |
| Scaling target | Query performance, manageability | Write throughput and storage beyond one node |
| Rebalancing | DDL: attach/detach partitions, no data movement across nodes | Move chunks across nodes — major operational event |
| When to choose | Table is large but fits on one node; queries are slow | Dataset or write throughput exceeds one node's ceiling |
The progression: Start with one database. Add partitioning when queries get slow on large tables — it is a DDL change, not a topology change. Move to sharding when the dataset or write throughput genuinely exceeds a single node. Partitioning and sharding are sequential steps on the scaling ladder.
🧪 Worked Example: Building a Partition-Pruned Orders Table End-to-End
This worked example demonstrates the full lifecycle of a range-partitioned orders table in PostgreSQL: creation, querying with partition pruning, instant archival, and a pruning failure case that shows exactly why partition key alignment matters. This scenario is representative of what you would build at a payments company, e-commerce platform, or any SaaS product with time-bounded billing data.
Step 1 — Create the partitioned parent and two monthly child tables:
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
total_cents BIGINT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (id, created_at) -- partition key must be in PK
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE INDEX ON orders_2024_01 (customer_id, created_at);
CREATE INDEX ON orders_2024_02 (customer_id, created_at);
Step 2 — Query with partition pruning (good path):
-- This query supplies the partition key — pruning activates
EXPLAIN (ANALYZE, FORMAT TEXT)
SELECT id, total_cents, status
FROM orders
WHERE created_at BETWEEN '2024-01-10' AND '2024-01-20'
AND customer_id = 42;
-- Expected plan: only orders_2024_01 appears as a scan node;
-- orders_2024_02 is pruned at plan time and never shows up in the plan
Step 3 — The pruning failure case (what to avoid):
-- This query omits the partition key — no pruning, full scan of ALL partitions
EXPLAIN (ANALYZE, FORMAT TEXT)
SELECT id, total_cents FROM orders WHERE customer_id = 42;
-- Both partitions scanned. In a 48-partition table, all 48 are opened.
Step 4 — Instant archival by detaching an old partition:
-- Detach January 2024 from the live table (runs in milliseconds, no data copied)
ALTER TABLE orders DETACH PARTITION orders_2024_01;
-- orders_2024_01 is now a standalone table — archive it, compress it, or move it
-- When needed again: ALTER TABLE orders ATTACH PARTITION orders_2024_01
-- FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
This four-step pattern covers the complete partition lifecycle in production. The key insight from Step 3: before committing to a partitioning strategy, verify with EXPLAIN ANALYZE that pruning actually happens — pruned partitions are absent from the plan, and runtime pruning is reported as Subplans Removed. If every partition appears in the plan, your query pattern and partition key are misaligned.
🛠️ pg_partman: Automated Partition Lifecycle Management for PostgreSQL
Manual partition creation works for small setups, but production systems need automation. If the cron job that creates next month's partition fails on the 20th, inserts fail at midnight on the 1st. pg_partman (PostgreSQL Partition Manager) solves this.
pg_partman is an open-source PostgreSQL extension that automatically creates future partitions, enforces retention policies, and handles cleanup. It ships a background worker (pg_partman_bgw) that runs maintenance on a configurable schedule inside the database process itself.
Setting up monthly range partitioning with pg_partman:
-- 1. Install the extension (available in RDS, Cloud SQL, Azure Database for PostgreSQL)
CREATE EXTENSION pg_partman;
-- 2. Create the parent table
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
total_cents BIGINT NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
-- 3. Hand the parent table to pg_partman
-- p_premake=4 creates the next 4 months of partitions immediately
SELECT partman.create_parent(
p_parent_table => 'public.orders',
p_control => 'created_at',
p_type => 'range',
p_interval => 'monthly',
p_premake => 4
);
-- 4. Configure retention: keep 24 months, then drop old partitions
UPDATE partman.part_config
SET retention = '24 months',
retention_keep_table = FALSE,
infinite_time_partitions = TRUE
WHERE parent_table = 'public.orders';
-- 5. Schedule maintenance (run via pg_cron or external cron)
CALL partman.run_maintenance_proc();
With this configuration, pg_partman creates the next 4 months immediately so inserts never fail, drops partitions older than 24 months automatically, and handles the idempotent maintenance routine — safe to call frequently.
Key pg_partman configuration parameters:
| Parameter | Purpose |
| :--- | :--- |
| p_premake | Number of future partitions to pre-create (keep ≥ 2 to handle deploy delays) |
| p_interval | Partition interval: 'daily', 'weekly', 'monthly', 'yearly' |
| retention | How long to keep partitions before dropping: '24 months', '2 years' |
| retention_keep_table | TRUE to detach instead of drop (archive pattern); FALSE to drop outright |
| infinite_time_partitions | TRUE creates partitions beyond the premake window as data arrives |
For a full deep-dive on pg_partman including native triggers, sub-partitioning hierarchies, and pg_partman_bgw tuning, see the pg_partman GitHub repository.
📚 Lessons Learned: What Production Teaches You That Documentation Skips
The partition key is a permanent architectural decision. Re-partitioning a 500 GB live table requires building a new partitioned table, migrating data with minimal downtime (logical replication or bulk INSERT/ATTACH), and cutting over. Plan the partition key with the same weight you give to API contracts. Changing it later is a multi-week project.
Monitor partition skew from day one. Set up a weekly query that reports row counts and size per partition. A partition growing 10× larger than its siblings signals skewed distribution. Catch it at 20 GB — fixing it at 200 GB means a live migration under load.
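A minimal version of that weekly check, assuming per-partition row counts are already collected (for example by a scheduled COUNT(*) job); the 10× threshold mirrors the rule of thumb above:

```python
def skewed_partitions(row_counts: dict[str, int], factor: float = 10.0) -> list[str]:
    # Flag any partition more than `factor` times the median sibling size.
    median = sorted(row_counts.values())[len(row_counts) // 2]
    return [name for name, n in row_counts.items() if median and n > factor * median]
```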
Test partition pruning explicitly before going to production. Add EXPLAIN (ANALYZE, FORMAT JSON) assertions for critical queries in your CI pipeline. Assert that pruned partitions are absent from the plan (or that Subplans Removed is non-zero for parameterized queries). A routine refactor that wraps the partition key column in a function call — such as WHERE DATE(created_at) = '2024-01-15' instead of WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16' — silently disables pruning.
Cassandra partition sizing is a production SLA, not a guideline. A wide Cassandra partition doesn't just slow one query — it causes coordinator timeouts that degrade all queries routed through that node during GC pauses. Monitor partition sizes in Cassandra JMX metrics. Alert when any partition approaches 100 MB. Fix the data model before it becomes a production incident.
DynamoDB hot partitions require shard key redesign, not more capacity. Adding provisioned throughput to a table with a hot partition key distributes capacity across partitions — but if one partition key is responsible for 80% of traffic, that partition's physical limit is still bounded. The fix is write sharding: append a random suffix (user_id#0 through user_id#9) to spread writes, then use a scatter-gather query on read.
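The suffix pattern can be sketched as follows; the shard count of 10 and the '#' delimiter follow the user_id#0 through user_id#9 example above and are application choices, not DynamoDB features:

```python
import random

N_SHARDS = 10  # number of suffixes per hot logical key (application choice)

def write_key(user_id: str) -> str:
    # Writes pick a random suffix so one hot user spreads over 10 physical partitions.
    return f"{user_id}#{random.randrange(N_SHARDS)}"

def read_keys(user_id: str) -> list[str]:
    # Reads scatter-gather: query every suffix and merge the results.
    return [f"{user_id}#{i}" for i in range(N_SHARDS)]
```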
Partition management automation is not optional. Every production system running range-partitioned tables in PostgreSQL or MySQL should use pg_partman, a custom cron job with alerting, or equivalent automation. The failure mode — no partition for tomorrow's date — is silent until midnight when inserts start failing in production.
📌 TLDR: The Partitioning Decision in Five Rules
Range partitioning for time-series data. You get partition pruning on date predicates, free instant archival (drop a partition rather than run DELETE), and a natural fit for rolling-window retention policies.
Hash partitioning for even write distribution. Range partitioning concentrates all writes on the current period. Hash partitioning spreads them evenly across N partitions at the cost of range query efficiency.
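The even-spread property is easy to demonstrate. CRC32 and the 8-partition layout here are illustrative stand-ins for the database's internal hash function, and the monotonically increasing ids model the exact workload that would hammer one partition under range partitioning:

```python
# Sketch: hash partitioning spreads sequential keys evenly, where
# range partitioning would send every insert to the current partition.
# CRC32 stands in for the database's internal hash function.
from collections import Counter
from zlib import crc32

N_PARTITIONS = 8

def partition_for(order_id: int) -> int:
    return crc32(str(order_id).encode()) % N_PARTITIONS

# 10,000 monotonically increasing ids -- the range-partitioning hotspot case
counts = Counter(partition_for(i) for i in range(10_000))
print(sorted(counts.values()))  # each partition holds roughly 10000/8 = 1250 rows
```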
List partitioning for geo-compliance and categorical access. When EU data must live on EU infrastructure, a LIST partition with EU country codes is the partition boundary AND the compliance boundary.
The partition key must be in your most frequent WHERE clauses. The entire performance benefit disappears if critical queries don't include the partition key. Validate this with EXPLAIN ANALYZE before shipping.
Partitioning makes one big table fast. Sharding makes one big system survive beyond a single node. They are sequential steps on the scaling ladder, not competing alternatives.
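Rule 1's payoff, partition pruning, can be simulated directly: given a date predicate, only partitions whose ranges overlap it are scanned. The orders table name and 48-month layout follow the article's running example:

```python
# Sketch of the planner's pruning step for a range-partitioned table.
# Table name and 48-month layout follow the article's running example.
from datetime import date

def monthly_partitions(start_year: int, months: int):
    """Yield (name, first_day, first_day_of_next_month) per partition."""
    for i in range(months):
        y, m = start_year + i // 12, i % 12 + 1
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield f"orders_{y}_{m:02d}", date(y, m, 1), date(ny, nm, 1)

def prune(lo: date, hi: date, partitions) -> list:
    """Keep only partitions whose [start, end) range overlaps [lo, hi)."""
    return [name for name, start, end in partitions if start < hi and end > lo]

parts = list(monthly_partitions(2021, 48))
# A one-month predicate touches 1 of 48 partitions -- the other 47 are never opened.
print(prune(date(2024, 1, 1), date(2024, 2, 1), parts))  # ['orders_2024_01']
```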
📝 Practice Quiz: Partitioning and Pruning
Your PostgreSQL orders table is range-partitioned by created_at (monthly, 48 partitions). A query runs SELECT * FROM orders WHERE customer_id = 42. How many partitions will PostgreSQL scan?
A) 1 — only the current month
B) 2 — current and previous month
C) All 48 partitions
D) A random subset determined by hash(42) % 48
Correct Answer: C. The partition key is created_at, which is absent from the WHERE clause. The planner cannot prune any partition and must scan all 48. This is the "pruning killer" anti-pattern — the query is likely slower than it would be on an unpartitioned table.
You need to delete 3 years of historical orders from a range-partitioned PostgreSQL table in under 5 minutes. What is the fastest approach?
A) DELETE FROM orders WHERE created_at < '2021-01-01'
B) TRUNCATE orders_2018, orders_2019, orders_2020
C) ALTER TABLE orders DETACH PARTITION orders_2020; DROP TABLE orders_2020;
D) VACUUM FULL orders
Correct Answer: C. DETACH PARTITION followed by DROP TABLE is a metadata operation — it removes catalog entries and the partition file reference without touching any row data, and completes in milliseconds regardless of partition size. DELETE writes a WAL record for every deleted row and takes hours. TRUNCATE is faster but still requires the partitions to be explicitly named; DETACH + DROP is cleaner and supports an archival step before DROP.
A Cassandra table has PRIMARY KEY ((user_id), event_time). A query runs SELECT * FROM user_events WHERE event_time > '2024-01-01' without specifying user_id. What is the result?
A) Cassandra prunes to partitions whose clustering key range overlaps January 2024
B) Cassandra performs a full cluster scan across all nodes
C) The query succeeds but is slower than a query with user_id
D) The query is rejected with a partition key missing error
Correct Answer: B. Without the partition key (user_id), Cassandra cannot route the query to any specific node and must fan out to every partition on every node. Cassandra will warn you and requires ALLOW FILTERING to execute — a signal that you are running a full cluster scan. The data model mismatch is the root cause: the access pattern (filter by time globally) requires a separate table or materialized view with event_time as the partition key.
Your e-commerce product catalog table has 180 columns. Most search queries need only 12 columns (title, price, category, thumbnail, rating, review_count, in_stock, brand, weight, dimensions, sku, created_at). Describe how you would apply both vertical partitioning and horizontal partitioning together to optimize this table, and what query access patterns each strategy addresses. This is an open-ended challenge — there is no single correct answer, but your response should justify each choice.
Sample response: Create product_core with the 12 hot-path columns as a vertically partitioned hot table. Create product_details with the remaining 168 columns, linked by a product_id foreign key. Apply range partitioning on product_core by created_at (year or quarter) to enable fast archival of discontinued products. Index product_core by (category, price) for the most common search predicate combination. Queries on the search results page read only product_core; the product detail page joins both tables by primary key. This combination reduces the hot-path row width by ~87%, improves buffer pool hit rate, and enables instant archival of discontinued product cohorts by dropping the relevant range partition.
MySQL rejects your DDL for a partitioned orders table with PRIMARY KEY (id) partitioned by created_date. What is the cause, and what is the minimal change to fix it?
PRIMARY KEY (id)partitioned bycreated_date. What is the cause, and what is the minimal change to fix it?A) MySQL does not support date-based partitioning
B) AUTO_INCREMENT columns cannot be partition keys
C) MySQL requires the partition column to be part of every unique key including the primary key
D) The table needs at least 4 partitions defined before creation succeeds
Correct Answer: C. MySQL enforces that the partition column is included in all unique keys and the primary key. With PRIMARY KEY (id) and partitioning on created_date, MySQL cannot enforce uniqueness of id across partitions. The fix is PRIMARY KEY (id, created_date). This is a common DDL error when retrofitting partitioning onto an existing MySQL schema.