Two Heaps Pattern: Find the Median of a Data Stream Without Sorting
Keep a max-heap for the lower half and a min-heap for the upper half. The median lives at the boundary.
TLDR: Two Heaps partitions a stream into two sorted halves. A max-heap holds everything below the median; a min-heap holds everything above it. Keep the heaps size-balanced and you can read the median from either top in O(1). No sorting needed, ever.
The Real-Time Median Problem: Why Sorting the Whole Stream Fails
Spotify processes millions of salary records and wants to display the median compensation in real time as each new record arrives. The naive approach (collect all records, sort them, pick the middle) has two fatal flaws: sorting is O(N log N), and you would have to re-sort after every single insertion.
The Two Heaps pattern eliminates this waste by keeping the data permanently organized around a single invariant: the entire lower half of the data set lives in one heap, and the entire upper half lives in another. The boundary between those two halves is the median, so you never need to sort at all.
The pattern applies beyond salaries. Any domain that tracks a "middle value" over a growing or sliding window (latency percentiles, stock prices, temperature readings, test scores) benefits from the same idea. You gain O(1) median reads at the cost of O(log N) insertions, which is the best possible trade-off for a median data structure.
This post covers the core pattern, three canonical problems with full Java solutions, and the edge cases that trip up candidates in interviews.
Max-Heap and Min-Heap: Partitioning a Stream into Two Halves
A max-heap always exposes its largest element at the top. A min-heap always exposes its smallest element at the top. These two properties are exactly what you need to see the boundary between "smaller than median" and "larger than median" without scanning the collection.
Here is the core insight spelled out plainly:
- Put every number less than or equal to the current median in the max-heap. Its top is the largest element in the lower half.
- Put every number greater than the current median in the min-heap. Its top is the smallest element in the upper half.
Together, those two tops are the only two values you ever need to compute the median. For an even count, the median is their average. For an odd count, the median is whichever heap holds the extra element.
In Java, PriorityQueue is a min-heap by default. To create a max-heap, pass Collections.reverseOrder() as the comparator:
import java.util.Collections;
import java.util.PriorityQueue;
// Max-heap: lower half of the stream
PriorityQueue<Integer> lowerHalf = new PriorityQueue<>(Collections.reverseOrder());
// Min-heap: upper half of the stream
PriorityQueue<Integer> upperHalf = new PriorityQueue<>();
The reverseOrder() comparator flips every comparison, so the heap's internal min-property now corresponds to the maximum of the actual values. This is the only non-obvious Java setup step in the entire pattern.
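A quick sanity check of that setup (a standalone snippet with a hypothetical class name, not part of the final solution): after a few inserts, peek() on the reversed queue returns the maximum, not the minimum.

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MaxHeapDemo {
    public static int topAfterInserts(int... values) {
        // reverseOrder() flips comparisons, so peek()/poll() yield the maximum
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) maxHeap.offer(v);
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        System.out.println(topAfterInserts(1, 7, 3)); // prints 7: the max, not the min
    }
}
```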
The Balancing Rule: Keeping the Two Heaps in Sync
After every insertion, the heap sizes must satisfy this invariant:
lowerHalf.size() - upperHalf.size() is always 0 or 1. Never negative. Never greater than 1.
When both heaps have equal size, the median is the average of their tops. When the lower half has one extra element, the median is lowerHalf.peek().
Insertion in two moves plus a conditional rebalance:
1. Push the new number into lowerHalf (max-heap).
2. Pop lowerHalf's top and push it into upperHalf. This ensures the boundary value crosses to the right side.
3. If upperHalf.size() > lowerHalf.size(), pop upperHalf's top and push it back into lowerHalf. This restores the "lower half has one extra" invariant.
Step 2 is the key move. Even if the new number belonged in the lower half, routing it through the max-heap top guarantees the ordering contract between the two heaps is maintained at all times.
State table, inserting [5, 10, 1, 7]:

| Step | Inserted | lowerHalf (max-heap) | upperHalf (min-heap) | Median |
| --- | --- | --- | --- | --- |
| 1 | 5 | [5] | [] | 5.0 |
| 2 | 10 | [5] | [10] | (5+10)/2 = 7.5 |
| 3 | 1 | [5, 1] | [10] | 5.0 |
| 4 | 7 | [5, 1] | [7, 10] | (5+7)/2 = 6.0 |
At every step, lowerHalf.peek() ≤ upperHalf.peek(). That ordering property is the invariant the rebalancing step enforces.
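The table can be reproduced with a short standalone trace (TwoHeapsTrace and addAndReport are illustrative names, not part of the final MedianFinder class):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class TwoHeapsTrace {
    private final PriorityQueue<Integer> lowerHalf =
            new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> upperHalf = new PriorityQueue<>();

    public double addAndReport(int num) {
        lowerHalf.offer(num);                      // move 1: push to lower
        upperHalf.offer(lowerHalf.poll());         // move 2: boundary crosses right
        if (upperHalf.size() > lowerHalf.size()) { // conditional rebalance
            lowerHalf.offer(upperHalf.poll());
        }
        return lowerHalf.size() > upperHalf.size()
                ? lowerHalf.peek()
                : lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0;
    }

    public static void main(String[] args) {
        TwoHeapsTrace t = new TwoHeapsTrace();
        for (int n : new int[]{5, 10, 1, 7}) {
            // Medians after each insert: 5.0, 7.5, 5.0, 6.0 (matches the table)
            System.out.println("insert " + n + " -> median " + t.addAndReport(n));
        }
    }
}
```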
Deep Dive: How Java's PriorityQueue Implements the Two Halves
Internals: PriorityQueue Memory Layout and the Max-Heap Trick
Java's PriorityQueue is a binary heap stored in a flat array. For a min-heap of size N, the element at index i is always less than or equal to both children at indices 2i+1 and 2i+2. The root (index 0) is always the minimum β peek() returns it in O(1) by simply reading queue[0].
offer(x) appends the new element at the end of the array and then sifts up: it swaps the element with its parent repeatedly until the heap property is satisfied. This takes O(log N) comparisons in the worst case.
poll() removes the root, moves the last array element to position 0, and sifts down: it swaps the element with its smaller child repeatedly. This also takes O(log N).
The Collections.reverseOrder() comparator is applied during both sift-up and sift-down comparisons. Because all comparisons are negated, the "minimum" according to the comparator is the actual maximum of your values. The array layout is identical β only the comparison direction is flipped.
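The sift-up index arithmetic can be sketched over a plain int array. This is a simplified model of the idea, not PriorityQueue's actual source:

```java
import java.util.Arrays;

public class MiniHeap {
    // Simplified min-heap sift-up: the new element sits at index 'pos';
    // swap it with its parent at (pos - 1) / 2 until the heap property holds.
    public static void siftUp(int[] heap, int pos) {
        int i = pos;
        while (i > 0) {
            int parent = (i - 1) / 2;           // parent index of i
            if (heap[parent] <= heap[i]) break; // heap property satisfied
            int tmp = heap[parent]; heap[parent] = heap[i]; heap[i] = tmp;
            i = parent;
        }
    }

    public static void main(String[] args) {
        int[] heap = {3, 7, 5, 0, 0, 0, 0};
        heap[3] = 1;      // offer(1): place at the end of the occupied region...
        siftUp(heap, 3);  // ...then sift up past 7 and 3
        System.out.println(Arrays.toString(heap)); // root (index 0) is now 1
    }
}
```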
Critical caveat: PriorityQueue.remove(Object) runs in O(N) time. It must scan the entire array to find the element before removing it. This matters for the Sliding Window Median problem where outgoing elements must be deleted. For that use case, consider TreeMap instead (see the OSS section).
Performance Analysis: O(log N) Insert and O(1) Median
| Operation | Time Complexity | Explanation |
| --- | --- | --- |
| addNum(x) | O(log N) | A constant number of heap operations (offer, poll, offer, plus an optional rebalance), each O(log N) |
| findMedian() | O(1) | Reads peek() from one or both heap roots |
| remove(x) (sliding window) | O(N) | Linear scan in Java's PriorityQueue |
| Space | O(N) | Each element belongs to exactly one heap |
O(log N) per insertion is optimal for a data structure that maintains the exact median after every update. Any median-finding structure must spend at least O(log N) amortized per insertion; otherwise you could use it to sort in sub-O(N log N) time, which is impossible by comparison-based sorting lower bounds.
The O(1) median query is the direct payoff of maintaining the partition invariant. You never touch the bulk of the data during a query β only the two roots.
Watching Seven Numbers Flow through Two Heaps
After inserting the stream [5, 10, 1, 7, 3, 8, 4], the two heaps stabilize to this configuration. The lower half max-heap holds [5, 4, 3, 1] and the upper half min-heap holds [7, 8, 10]:
graph TD
    subgraph lower["Lower Half: Max-Heap (top = 5)"]
L1["5"] --> L2["4"]
L1 --> L3["3"]
L2 --> L4["1"]
end
    subgraph upper["Upper Half: Min-Heap (top = 7)"]
U1["7"] --> U2["8"]
U1 --> U3["10"]
end
    L1 -. "Median = lowerHalf.peek() = 5.0 (7 elements, odd count)" .-> U1
The insertion flowchart, applied once for each arriving element:
flowchart TD
A([New number arrives]) --> B[Push to lowerHalf max-heap]
    B --> C[Pop lowerHalf top, push to upperHalf]
C --> D{upperHalf.size > lowerHalf.size?}
    D -- Yes --> E[Pop upperHalf top, push to lowerHalf]
D -- No --> F{Total count is odd?}
E --> F
F -- Yes --> G["findMedian() = lowerHalf.peek()"]
F -- No --> H["findMedian() = (lowerHalf.peek() + upperHalf.peek()) / 2.0"]
The diagram shows why the rebalance step in D is the only conditional in the entire algorithm. The two-move insertion (B to C) always runs; the rebalance (E) only fires when the upper half has grown too large.
Real-World Uses: Salary Analytics, Sliding Windows, and Capital Allocation
The Two Heaps pattern solves a surprisingly wide class of problems because "median of a growing or sliding data set" appears in many systems:
Streaming analytics platforms: Compensation analysis tools at Spotify, LinkedIn, and similar companies compute P50 salary in real time as offers and adjustments are recorded. Re-sorting the entire list after each update is impractical at millions of records. Two Heaps reduces each update to O(log N) regardless of total size.
Network latency monitoring: Observability stacks like Prometheus track P50 (median) request latency over rolling time windows. As old measurements expire and new ones arrive, the sliding window variant of Two Heaps (with lazy deletion) maintains the median without storing a fully sorted list.
Greedy capital allocation: The IPO problem (LeetCode 502) asks: given K rounds of investment, which projects maximize final wealth? At each round you can only pick projects whose required capital you already have. Two Heaps solves this elegantly: a min-heap on required capital surfaces newly affordable projects; a max-heap on profit among affordable projects picks the best one. This is not a median problem but still uses the same two-heap structure.
Trade-offs: When Two Heaps Shine and When They Hurt
Two Heaps gives you the best possible median query time (O(1)) at the cost of O(log N) inserts and O(N) deletions. The table below shows where competing approaches win:
| Operation | Two Heaps | Sorted Array | TreeMap |
| --- | --- | --- | --- |
| Insert | O(log N) | O(N) (shift) | O(log N) |
| Median query | O(1) | O(1) | O(log N) |
| Delete by value | O(N) | O(N) | O(log N) |
| Arbitrary percentile | No | Yes (index math) | Yes |
| Memory | O(N) | O(N) | O(N) + node overhead |
Use Two Heaps when you only need the median and insertions dominate over deletions. The O(1) query and O(log N) insert are optimal for pure streaming.
Use TreeMap when you also need O(log N) deletion (mandatory for the sliding window at scale) or you need arbitrary percentiles. TreeMap<Integer, Integer> (value → count) supports O(log N) insert, delete, and median lookup, at the cost of more complex bookkeeping.
Avoid Two Heaps when you need the full sorted order, the full distribution, or very frequent deletions with N in the millions. The O(N) deletion cost becomes a bottleneck for large sliding windows.
Decision Guide: Choosing Between Two Heaps and Its Alternatives
| Situation | Recommendation |
| --- | --- |
| Use when | Median of a continuously growing data stream, or a fixed-K window where deletions are infrequent. |
| Avoid when | You need O(log N) deletion (large sliding windows) or arbitrary percentiles beyond the 50th. |
| Alternative | TreeMap<Integer, Integer> for O(log N) delete; a sorted segment tree for full percentile support. |
| Edge cases | Guard against empty heaps before peek(). Use double arithmetic for the average to avoid integer overflow: lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0. |
Three Canonical Problems: Data Stream, Sliding Window, and IPO
Example 1: Find Median from Data Stream
Design a class that supports addNum(int) and findMedian() on a live stream of integers.
import java.util.Collections;
import java.util.PriorityQueue;
public class MedianFinder {
// Max-heap: holds the lower half of all numbers
private final PriorityQueue<Integer> lowerHalf =
new PriorityQueue<>(Collections.reverseOrder());
// Min-heap: holds the upper half of all numbers
private final PriorityQueue<Integer> upperHalf = new PriorityQueue<>();
public void addNum(int num) {
lowerHalf.offer(num); // step 1: push to lower
upperHalf.offer(lowerHalf.poll()); // step 2: move boundary to upper
// step 3: rebalance if upper grew too large
if (upperHalf.size() > lowerHalf.size()) {
lowerHalf.offer(upperHalf.poll());
}
}
public double findMedian() {
if (lowerHalf.size() > upperHalf.size()) {
return lowerHalf.peek(); // odd count: median in lower
}
// even count: average the two boundary values
return lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0;
}
}
The / 2.0 split in findMedian() prevents integer overflow when both tops are near Integer.MAX_VALUE. Always divide before adding when the intermediate sum could overflow.
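The overflow difference is easy to demonstrate in isolation (a standalone snippet, not part of MedianFinder):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int a = Integer.MAX_VALUE, b = Integer.MAX_VALUE - 1;
        // Naive form: a + b wraps around in int arithmetic BEFORE the division
        double wrong = (a + b) / 2.0;
        // Split form: each value is widened to double before the addition
        double right = a / 2.0 + b / 2.0;
        System.out.println(wrong); // negative: the int sum overflowed
        System.out.println(right); // 2.1474836465E9, the true average
    }
}
```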
Example 2: Sliding Window Median
Return the median of each window of size K as it slides through the array.
import java.util.Collections;
import java.util.PriorityQueue;
public class SlidingWindowMedian {
public double[] medianSlidingWindow(int[] nums, int k) {
double[] result = new double[nums.length - k + 1];
PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
PriorityQueue<Integer> upper = new PriorityQueue<>();
for (int i = 0; i < nums.length; i++) {
// Insert new element using the standard two-step add
lower.offer(nums[i]);
upper.offer(lower.poll());
if (upper.size() > lower.size()) lower.offer(upper.poll());
// Once the window is full, record the median and evict the oldest element
if (i >= k - 1) {
result[i - k + 1] = (lower.size() > upper.size())
? lower.peek()
: lower.peek() / 2.0 + upper.peek() / 2.0;
// Remove the outgoing element (O(N) in PriorityQueue)
int outgoing = nums[i - k + 1];
if (!lower.remove(outgoing)) upper.remove(outgoing);
// Rebalance after removal
if (lower.size() > upper.size() + 1) upper.offer(lower.poll());
else if (upper.size() > lower.size()) lower.offer(upper.poll());
}
}
return result;
}
}
The remove(Object) call is O(N). For very large windows (K in the millions), replace both heaps with TreeMap<Integer, Integer> to achieve O(log N) deletion.
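As a sanity check, here is a condensed, self-contained version of the same algorithm run on the classic example nums = [1, 3, -1, -3, 5, 3, 6, 7], k = 3 (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

public class WindowMedianDemo {
    public static double[] medians(int[] nums, int k) {
        double[] result = new double[nums.length - k + 1];
        PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
        PriorityQueue<Integer> upper = new PriorityQueue<>();
        for (int i = 0; i < nums.length; i++) {
            lower.offer(nums[i]);                  // standard two-step add
            upper.offer(lower.poll());
            if (upper.size() > lower.size()) lower.offer(upper.poll());
            if (i >= k - 1) {                      // window full: record, then evict
                result[i - k + 1] = lower.size() > upper.size()
                        ? lower.peek()
                        : lower.peek() / 2.0 + upper.peek() / 2.0;
                int outgoing = nums[i - k + 1];
                if (!lower.remove(outgoing)) upper.remove(outgoing); // O(K) scan
                if (lower.size() > upper.size() + 1) upper.offer(lower.poll());
                else if (upper.size() > lower.size()) lower.offer(upper.poll());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                medians(new int[]{1, 3, -1, -3, 5, 3, 6, 7}, 3)));
        // [1.0, -1.0, -1.0, 3.0, 5.0, 6.0]
    }
}
```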
Example 3: Maximize Capital (IPO)
Given K investment rounds, initial capital W, and lists of project profits and required capitals, choose at most K projects to maximize total capital.
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;
public class MaximizeCapital {
public int findMaximizedCapital(int k, int w, int[] profits, int[] capital) {
int n = profits.length;
// Sort projects by required capital so we can unlock them in order
int[][] projects = new int[n][2];
for (int i = 0; i < n; i++) {
projects[i][0] = capital[i];
projects[i][1] = profits[i];
}
Arrays.sort(projects, (a, b) -> Integer.compare(a[0], b[0])); // avoids subtraction overflow
// Max-heap: among affordable projects, always pick the highest profit
PriorityQueue<Integer> availableProfits =
new PriorityQueue<>(Collections.reverseOrder());
int idx = 0;
for (int round = 0; round < k; round++) {
// Unlock all projects we can now afford
while (idx < n && projects[idx][0] <= w) {
availableProfits.offer(projects[idx][1]);
idx++;
}
if (availableProfits.isEmpty()) break; // no affordable projects remain
w += availableProfits.poll(); // invest in best available project
}
return w;
}
}
This problem uses one max-heap (not two), but the heap-direction insight is identical: among all candidates, always surface the maximum quickly. The min-heap on capital is implicit in the sorted array; once capital crosses a threshold, projects are moved into the max-heap.
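A quick run on a small example confirms the greedy choice (the logic is condensed into an illustrative maximize helper): with k = 2, w = 0, profits [1, 2, 3], and capital [0, 1, 1], only the capital-0 project is affordable at first; taking its profit unlocks the rest.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

public class IpoDemo {
    public static int maximize(int k, int w, int[] profits, int[] capital) {
        int n = profits.length;
        int[][] projects = new int[n][2];
        for (int i = 0; i < n; i++) projects[i] = new int[]{capital[i], profits[i]};
        Arrays.sort(projects, (a, b) -> Integer.compare(a[0], b[0])); // cheapest first
        PriorityQueue<Integer> best = new PriorityQueue<>(Collections.reverseOrder());
        int idx = 0;
        for (int round = 0; round < k; round++) {
            // Unlock every project whose required capital we now have
            while (idx < n && projects[idx][0] <= w) best.offer(projects[idx++][1]);
            if (best.isEmpty()) break;   // nothing affordable: stop early
            w += best.poll();            // take the most profitable affordable project
        }
        return w;
    }

    public static void main(String[] args) {
        // Round 1: take profit 1 (capital 0) -> w = 1; round 2: take profit 3 -> w = 4
        System.out.println(maximize(2, 0, new int[]{1, 2, 3}, new int[]{0, 1, 1}));
    }
}
```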
Java PriorityQueue vs. TreeMap: Alternative Implementations
Java's PriorityQueue is the standard choice and handles most Two Heaps problems in O(log N) per operation. Its weakness is O(N) deletion, which matters for sliding window problems with large K.
TreeMap-based Two Heaps replaces each heap with a TreeMap<Integer, Integer> (value → frequency count). firstKey() and lastKey() give the min/max in O(log N), and removing one occurrence means decrementing the count or deleting the entry, both O(log N):
import java.util.TreeMap;
// Minimal TreeMap median helper
TreeMap<Integer, Integer> lower = new TreeMap<>(); // lower half
TreeMap<Integer, Integer> upper = new TreeMap<>(); // upper half
int lowerSize = 0, upperSize = 0;
// lower.lastKey() = max of lower half (O(log N))
// upper.firstKey() = min of upper half (O(log N))
// add/remove: update count, decrement lowerSize/upperSize
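A minimal runnable sketch of one half as a TreeMap-backed multiset (TreeMultiset is an illustrative name; the median wiring on top of two of these follows the same balance rule as the heap version):

```java
import java.util.TreeMap;

public class TreeMultiset {
    // TreeMap<value, count> used as a sorted multiset: one half of the design
    private final TreeMap<Integer, Integer> map = new TreeMap<>();
    private int size = 0;

    public void add(int x) {
        map.merge(x, 1, Integer::sum); // O(log N)
        size++;
    }

    public boolean remove(int x) {     // O(log N), unlike PriorityQueue.remove
        Integer count = map.get(x);
        if (count == null) return false;
        if (count == 1) map.remove(x); else map.put(x, count - 1);
        size--;
        return true;
    }

    public int max() { return map.lastKey(); }   // top of a lower half
    public int min() { return map.firstKey(); }  // top of an upper half
    public int size() { return size; }

    public static void main(String[] args) {
        TreeMultiset lower = new TreeMultiset();
        lower.add(5); lower.add(1); lower.add(5);
        lower.remove(5);                 // removes one copy in O(log N)
        System.out.println(lower.max()); // prints 5: the second copy remains
    }
}
```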
Guava's MinMaxPriorityQueue is another option: it supports O(log N) removal from either end, which helps when evictions happen at the min or max. Note, though, that removing an arbitrary element from it is still a linear scan, so TreeMap remains the safer choice for large sliding windows.
The choice is straightforward: use PriorityQueue for addNum / findMedian style problems; switch to TreeMap when you need remove(value) to be fast.
Lessons Learned: Common Two Heaps Pitfalls in Interviews
1. Forgetting the two-step insert. The most common mistake is pushing directly to the "right" heap without routing through the max-heap first. This breaks the ordering invariant and produces wrong medians silently. Always push to lowerHalf first, then move the top to upperHalf.
2. Integer overflow in findMedian. (lowerHalf.peek() + upperHalf.peek()) / 2.0 can overflow if both values are near Integer.MAX_VALUE. Use the split form: lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0.
3. Using remove(Object) for large windows. It is O(N) in PriorityQueue, not O(log N). On a sliding window with N = 100,000 and K = 50,000, this becomes 5 billion operations. Switch to TreeMap if K is large.
4. Off-by-one on the rebalance direction. The invariant is that lowerHalf has the same size or one more element than upperHalf. After a deletion, you must check both directions. Checking only one direction leaves the heaps unbalanced after an upper-half removal.
5. Not handling the empty stream. peek() on an empty PriorityQueue returns null. Guard with if (lowerHalf.isEmpty()) return 0.0; or throw explicitly.
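The null behavior is easy to reproduce (standalone illustration; safeMedianTop is a hypothetical guard helper):

```java
import java.util.PriorityQueue;

public class EmptyPeekDemo {
    public static double safeMedianTop(PriorityQueue<Integer> lowerHalf) {
        if (lowerHalf.isEmpty()) {
            throw new IllegalStateException("no numbers added yet");
        }
        return lowerHalf.peek(); // safe: unboxing a real Integer
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> empty = new PriorityQueue<>();
        System.out.println(empty.peek()); // prints null: no exception here...
        // ...but unboxing it would throw a NullPointerException at runtime:
        // double median = empty.peek();  // NPE
    }
}
```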
Summary: The Two Heaps Pattern at a Glance
- Core idea: Partition the stream into a max-heap (lower half) and a min-heap (upper half), keeping them size-balanced. The median is always at one or both tops.
- Insertion cost: O(log N). Each addNum call performs a constant number of heap operations.
- Query cost: O(1). findMedian() reads only the heap roots, never scanning the collection.
- Deletion cost: O(N) with PriorityQueue; O(log N) with TreeMap. Choose accordingly.
- Three problems, one pattern: Data Stream Median, Sliding Window Median, and Maximize Capital all reduce to the same two-heap structure with minor adaptations.
- Java setup reminder: Max-heap requires new PriorityQueue<>(Collections.reverseOrder()). Without reverseOrder(), both heaps are min-heaps and the pattern silently breaks.
Practice Quiz
1. After inserting [3, 1, 5] into a MedianFinder, what does findMedian() return?

- A) 1.0
- B) 3.0
- C) 4.0

Correct Answer: B. After the three inserts, lowerHalf = [3, 1] and upperHalf = [5]. lowerHalf.size() > upperHalf.size(), so the median is lowerHalf.peek() = 3.0.

2. You need the median of a sliding window of size K = 50,000 over an array of 1,000,000 integers. What is the bottleneck with a naive PriorityQueue implementation?

- A) addNum is O(N²) per call
- B) remove(Object) is O(K) per step, making the total O(N × K)
- C) findMedian is O(log N) instead of O(1)

Correct Answer: B. Java's PriorityQueue.remove(Object) scans the heap linearly (O(K)) to find the outgoing element, making each window step O(K) instead of O(log K).

3. In the IPO / Maximize Capital problem, why is the available-projects structure a max-heap on profit rather than a min-heap?

- A) A max-heap on profit lets you always pick the highest-yielding project in O(1)
- B) A min-heap would require sorting the projects first
- C) Profit values are always negative, so a max-heap is required

Correct Answer: A. At each investment round, you want the maximum profit among all affordable projects. A max-heap surfaces that maximum at the root in O(1), giving O(log N) per round overall.
Open-ended challenge: The standard Two Heaps implementation tracks the median (50th percentile). How would you extend it to track an arbitrary percentile P (e.g., P90)? What invariant would replace the 50/50 size-balance rule, and what trade-offs does your design introduce? There is no single correct answer β consider the impact on insertion time, query time, and correctness after deletions.
Written by Abstract Algorithms (@abstractalgorithms)