Two Heaps Pattern: Find the Median of a Data Stream Without Sorting
Keep a max-heap for the lower half and a min-heap for the upper half. The median lives at the boundary.
TLDR: Two Heaps partitions a stream into two sorted halves. A max-heap holds everything below the median; a min-heap holds everything above it. Keep the heaps size-balanced and you can read the median from either top in O(1). No sorting needed, ever.
The Real-Time Median Problem: Why Sorting the Whole Stream Fails
Spotify processes millions of salary records and wants to display the median compensation in real time as each new record arrives. The naive approach (collect all records, sort them, pick the middle) has two fatal flaws: sorting is O(N log N), and you would have to re-sort after every single insertion.
The Two Heaps pattern eliminates this waste by keeping the data permanently organized around a single invariant: the entire lower half of the data set lives in one heap, and the entire upper half lives in another. The boundary between those two halves is the median, so you never need to sort at all.
The pattern applies beyond salaries. Any domain that tracks a "middle value" over a growing or sliding window (latency percentiles, stock prices, temperature readings, test scores) benefits from the same idea. You gain O(1) median reads at the cost of O(log N) insertions, which is the best possible trade-off for a median data structure.
This post covers the core pattern, three canonical problems with full Java solutions, and the edge cases that trip up candidates in interviews.
Max-Heap and Min-Heap: Partitioning a Stream into Two Halves
A max-heap always exposes its largest element at the top. A min-heap always exposes its smallest element at the top. These two properties are exactly what you need to see the boundary between "smaller than median" and "larger than median" without scanning the collection.
Here is the core insight spelled out plainly:
- Put every number less than or equal to the current median in the max-heap. Its top is the largest element in the lower half.
- Put every number greater than the current median in the min-heap. Its top is the smallest element in the upper half.
Together, those two tops are the only two values you ever need to compute the median. For an even count, the median is their average. For an odd count, the median is whichever heap holds the extra element.
In Java, PriorityQueue is a min-heap by default. To create a max-heap, pass Collections.reverseOrder() as the comparator:
import java.util.Collections;
import java.util.PriorityQueue;
// Max-heap: lower half of the stream
PriorityQueue<Integer> lowerHalf = new PriorityQueue<>(Collections.reverseOrder());
// Min-heap: upper half of the stream
PriorityQueue<Integer> upperHalf = new PriorityQueue<>();
The reverseOrder() comparator flips every comparison, so the heap's internal min-property now corresponds to the maximum of the actual values. This is the only non-obvious Java setup step in the entire pattern.
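A quick sanity check of that setup (a standalone snippet with a hypothetical class name, not part of the final solution): after a few inserts, peek() on the reversed queue returns the maximum, not the minimum.

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MaxHeapDemo {
    public static int topAfterInserts(int... values) {
        // reverseOrder() flips comparisons, so peek()/poll() yield the maximum
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) maxHeap.offer(v);
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        System.out.println(topAfterInserts(1, 7, 3)); // prints 7: the max, not the min
    }
}
```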
The Balancing Rule: Keeping the Two Heaps in Sync
After every insertion, the heap sizes must satisfy this invariant:
lowerHalf.size() - upperHalf.size() is always 0 or 1. Never negative. Never greater than 1.
When both heaps have equal size, the median is the average of their tops. When the lower half has one extra element, the median is lowerHalf.peek().
Insertion in two moves plus a conditional rebalance:
1. Push the new number into lowerHalf (max-heap).
2. Pop lowerHalf's top and push it into upperHalf. This ensures the boundary value crosses to the right side.
3. If upperHalf.size() > lowerHalf.size(), pop upperHalf's top and push it back into lowerHalf. This restores the "lower half has one extra" invariant.
Step 2 is the key move. Even if the new number belonged in the lower half, routing it through the max-heap top guarantees the ordering contract between the two heaps is maintained at all times.
State table, inserting [5, 10, 1, 7]:

| Step | Inserted | lowerHalf (max-heap) | upperHalf (min-heap) | Median |
| --- | --- | --- | --- | --- |
| 1 | 5 | [5] | [] | 5.0 |
| 2 | 10 | [5] | [10] | (5+10)/2 = 7.5 |
| 3 | 1 | [5, 1] | [10] | 5.0 |
| 4 | 7 | [5, 1] | [7, 10] | (5+7)/2 = 6.0 |
At every step, lowerHalf.peek() ≤ upperHalf.peek(). That ordering property is the invariant the rebalancing step enforces.
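The table can be reproduced with a short standalone trace (TwoHeapsTrace and addAndReport are illustrative names, not part of the final MedianFinder class):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class TwoHeapsTrace {
    private final PriorityQueue<Integer> lowerHalf =
            new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> upperHalf = new PriorityQueue<>();

    public double addAndReport(int num) {
        lowerHalf.offer(num);                      // move 1: push to lower
        upperHalf.offer(lowerHalf.poll());         // move 2: boundary crosses right
        if (upperHalf.size() > lowerHalf.size()) { // conditional rebalance
            lowerHalf.offer(upperHalf.poll());
        }
        return lowerHalf.size() > upperHalf.size()
                ? lowerHalf.peek()
                : lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0;
    }

    public static void main(String[] args) {
        TwoHeapsTrace t = new TwoHeapsTrace();
        for (int n : new int[]{5, 10, 1, 7}) {
            // Medians after each insert: 5.0, 7.5, 5.0, 6.0 (matches the table)
            System.out.println("insert " + n + " -> median " + t.addAndReport(n));
        }
    }
}
```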
Deep Dive: How Java's PriorityQueue Implements the Two Halves
Internals: PriorityQueue Memory Layout and the Max-Heap Trick
Java's PriorityQueue is a binary heap stored in a flat array. For a min-heap of size N, the element at index i is always less than or equal to both children at indices 2i+1 and 2i+2. The root (index 0) is always the minimum β peek() returns it in O(1) by simply reading queue[0].
offer(x) appends the new element at the end of the array and then sifts up: it swaps the element with its parent repeatedly until the heap property is satisfied. This takes O(log N) comparisons in the worst case.
poll() removes the root, moves the last array element to position 0, and sifts down: it swaps the element with its smaller child repeatedly. This also takes O(log N).
The Collections.reverseOrder() comparator is applied during both sift-up and sift-down comparisons. Because all comparisons are negated, the "minimum" according to the comparator is the actual maximum of your values. The array layout is identical β only the comparison direction is flipped.
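The sift-up index arithmetic can be sketched over a plain int array. This is a simplified model of the idea, not PriorityQueue's actual source:

```java
import java.util.Arrays;

public class MiniHeap {
    // Simplified min-heap sift-up: the new element sits at index 'pos';
    // swap it with its parent at (pos - 1) / 2 until the heap property holds.
    public static void siftUp(int[] heap, int pos) {
        int i = pos;
        while (i > 0) {
            int parent = (i - 1) / 2;           // parent index of i
            if (heap[parent] <= heap[i]) break; // heap property satisfied
            int tmp = heap[parent]; heap[parent] = heap[i]; heap[i] = tmp;
            i = parent;
        }
    }

    public static void main(String[] args) {
        int[] heap = {3, 7, 5, 0, 0, 0, 0};
        heap[3] = 1;      // offer(1): place at the end of the occupied region...
        siftUp(heap, 3);  // ...then sift up past 7 and 3
        System.out.println(Arrays.toString(heap)); // root (index 0) is now 1
    }
}
```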
Critical caveat: PriorityQueue.remove(Object) runs in O(N) time. It must scan the entire array to find the element before removing it. This matters for the Sliding Window Median problem where outgoing elements must be deleted. For that use case, consider TreeMap instead (see the OSS section).
Performance Analysis: O(log N) Insert and O(1) Median
| Operation | Time Complexity | Explanation |
| --- | --- | --- |
| addNum(x) | O(log N) | A constant number of heap operations (offer, poll, offer, plus an optional rebalance), each O(log N) |
| findMedian() | O(1) | Reads peek() from one or both heap roots |
| remove(x) (sliding window) | O(N) | Linear scan in Java's PriorityQueue |
| Space | O(N) | Each element belongs to exactly one heap |
O(log N) per insertion is optimal for a data structure that maintains the exact median after every update. Any median-finding structure must spend at least O(log N) amortized per insertion; otherwise you could use it to sort in sub-O(N log N) time, which is impossible by comparison-based sorting lower bounds.
The O(1) median query is the direct payoff of maintaining the partition invariant. You never touch the bulk of the data during a query β only the two roots.
Watching Seven Numbers Flow through Two Heaps
After inserting the stream [5, 10, 1, 7, 3, 8, 4], the two heaps stabilize to this configuration. The lower half max-heap holds [5, 4, 3, 1] and the upper half min-heap holds [7, 8, 10]:
graph TD
    subgraph lower["Lower Half: Max-Heap (top = 5)"]
L1["5"] --> L2["4"]
L1 --> L3["3"]
L2 --> L4["1"]
end
    subgraph upper["Upper Half: Min-Heap (top = 7)"]
U1["7"] --> U2["8"]
U1 --> U3["10"]
end
    L1 -. "Median = lowerHalf.peek() = 5.0 (7 elements, odd count)" .-> U1
The insertion flowchart, applied once for each arriving element:
flowchart TD
A([New number arrives]) --> B[Push to lowerHalf max-heap]
    B --> C[Pop lowerHalf top, push to upperHalf]
C --> D{upperHalf.size > lowerHalf.size?}
    D -- Yes --> E[Pop upperHalf top, push to lowerHalf]
D -- No --> F{Total count is odd?}
E --> F
F -- Yes --> G["findMedian() = lowerHalf.peek()"]
F -- No --> H["findMedian() = (lowerHalf.peek() + upperHalf.peek()) / 2.0"]
The diagram shows why the rebalance step in D is the only conditional in the entire algorithm. The two-move insertion (B to C) always runs; the rebalance (E) only fires when the upper half has grown too large.
Real-World Uses: Salary Analytics, Sliding Windows, and Capital Allocation
The Two Heaps pattern solves a surprisingly wide class of problems because "median of a growing or sliding data set" appears in many systems:
Streaming analytics platforms: Compensation analysis tools at Spotify, LinkedIn, and similar companies compute P50 salary in real time as offers and adjustments are recorded. Re-sorting the entire list after each update is impractical at millions of records. Two Heaps reduces each update to O(log N) regardless of total size.
Network latency monitoring: Observability stacks like Prometheus track P50 (median) request latency over rolling time windows. As old measurements expire and new ones arrive, the sliding window variant of Two Heaps (with lazy deletion) maintains the median without storing a fully sorted list.
Greedy capital allocation: The IPO problem (LeetCode 502) asks: given K rounds of investment, which projects maximize final wealth? At each round you can only pick projects whose required capital you already have. Two Heaps solves this elegantly: a min-heap on required capital surfaces newly affordable projects; a max-heap on profit among affordable projects picks the best one. This is not a median problem but still uses the same two-heap structure.
Trade-offs: When Two Heaps Shine and When They Hurt
Two Heaps gives you the best possible median query time (O(1)) at the cost of O(log N) inserts and O(N) deletions. The table below shows where competing approaches win:
| Operation | Two Heaps | Sorted Array | TreeMap |
| --- | --- | --- | --- |
| Insert | O(log N) | O(N) (shift) | O(log N) |
| Median query | O(1) | O(1) | O(log N) |
| Delete by value | O(N) | O(N) | O(log N) |
| Arbitrary percentile | No | Yes (index math) | Yes |
| Memory | O(N) | O(N) | O(N) + node overhead |
Use Two Heaps when you only need the median and insertions dominate over deletions. The O(1) query and O(log N) insert are optimal for pure streaming.
Use TreeMap when you also need O(log N) deletion (mandatory for the sliding window at scale) or you need arbitrary percentiles. TreeMap<Integer, Integer> (value → count) supports O(log N) insert, delete, and median lookup, at the cost of more complex bookkeeping.
Avoid Two Heaps when you need the full sorted order, the full distribution, or very frequent deletions with N in the millions. The O(N) deletion cost becomes a bottleneck for large sliding windows.
Decision Guide: Choosing Between Two Heaps and Its Alternatives
| Situation | Recommendation |
| --- | --- |
| Use when | Median of a continuously growing data stream, or a fixed-K window where deletions are infrequent. |
| Avoid when | You need O(log N) deletion (large sliding windows) or arbitrary percentiles beyond the 50th. |
| Alternative | TreeMap<Integer, Integer> for O(log N) delete; a sorted segment tree for full percentile support. |
| Edge cases | Guard against empty heaps before peek(). Use double arithmetic for the average to avoid integer overflow: lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0. |
Three Canonical Problems: Data Stream, Sliding Window, and IPO
Example 1: Find Median from Data Stream
Design a class that supports addNum(int) and findMedian() on a live stream of integers.
import java.util.Collections;
import java.util.PriorityQueue;
public class MedianFinder {
// Max-heap: holds the lower half of all numbers
private final PriorityQueue<Integer> lowerHalf =
new PriorityQueue<>(Collections.reverseOrder());
// Min-heap: holds the upper half of all numbers
private final PriorityQueue<Integer> upperHalf = new PriorityQueue<>();
public void addNum(int num) {
lowerHalf.offer(num); // step 1: push to lower
upperHalf.offer(lowerHalf.poll()); // step 2: move boundary to upper
// step 3: rebalance if upper grew too large
if (upperHalf.size() > lowerHalf.size()) {
lowerHalf.offer(upperHalf.poll());
}
}
public double findMedian() {
if (lowerHalf.size() > upperHalf.size()) {
return lowerHalf.peek(); // odd count: median in lower
}
// even count: average the two boundary values
return lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0;
}
}
The / 2.0 split in findMedian() prevents integer overflow when both tops are near Integer.MAX_VALUE. Always divide before adding when the intermediate sum could overflow.
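The overflow difference is easy to demonstrate in isolation (a standalone snippet, not part of MedianFinder):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int a = Integer.MAX_VALUE, b = Integer.MAX_VALUE - 1;
        // Naive form: a + b wraps around in int arithmetic BEFORE the division
        double wrong = (a + b) / 2.0;
        // Split form: each value is widened to double before the addition
        double right = a / 2.0 + b / 2.0;
        System.out.println(wrong); // negative: the int sum overflowed
        System.out.println(right); // 2.1474836465E9, the true average
    }
}
```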
Example 2: Sliding Window Median
Return the median of each window of size K as it slides through the array.
import java.util.Collections;
import java.util.PriorityQueue;
public class SlidingWindowMedian {
public double[] medianSlidingWindow(int[] nums, int k) {
double[] result = new double[nums.length - k + 1];
PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
PriorityQueue<Integer> upper = new PriorityQueue<>();
for (int i = 0; i < nums.length; i++) {
// Insert new element using the standard two-step add
lower.offer(nums[i]);
upper.offer(lower.poll());
if (upper.size() > lower.size()) lower.offer(upper.poll());
// Once the window is full, record the median and evict the oldest element
if (i >= k - 1) {
result[i - k + 1] = (lower.size() > upper.size())
? lower.peek()
: lower.peek() / 2.0 + upper.peek() / 2.0;
// Remove the outgoing element (O(N) in PriorityQueue)
int outgoing = nums[i - k + 1];
if (!lower.remove(outgoing)) upper.remove(outgoing);
// Rebalance after removal
if (lower.size() > upper.size() + 1) upper.offer(lower.poll());
else if (upper.size() > lower.size()) lower.offer(upper.poll());
}
}
return result;
}
}
The remove(Object) call is O(N). For very large windows (K in the millions), replace both heaps with TreeMap<Integer, Integer> to achieve O(log N) deletion.
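As a sanity check, here is a condensed, self-contained version of the same algorithm run on the classic example nums = [1, 3, -1, -3, 5, 3, 6, 7], k = 3 (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

public class WindowMedianDemo {
    public static double[] medians(int[] nums, int k) {
        double[] result = new double[nums.length - k + 1];
        PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
        PriorityQueue<Integer> upper = new PriorityQueue<>();
        for (int i = 0; i < nums.length; i++) {
            lower.offer(nums[i]);                  // standard two-step add
            upper.offer(lower.poll());
            if (upper.size() > lower.size()) lower.offer(upper.poll());
            if (i >= k - 1) {                      // window full: record, then evict
                result[i - k + 1] = lower.size() > upper.size()
                        ? lower.peek()
                        : lower.peek() / 2.0 + upper.peek() / 2.0;
                int outgoing = nums[i - k + 1];
                if (!lower.remove(outgoing)) upper.remove(outgoing); // O(K) scan
                if (lower.size() > upper.size() + 1) upper.offer(lower.poll());
                else if (upper.size() > lower.size()) lower.offer(upper.poll());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                medians(new int[]{1, 3, -1, -3, 5, 3, 6, 7}, 3)));
        // [1.0, -1.0, -1.0, 3.0, 5.0, 6.0]
    }
}
```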
Example 3: Maximize Capital (IPO)
Given K investment rounds, initial capital W, and lists of project profits and required capitals, choose at most K projects to maximize total capital.
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;
public class MaximizeCapital {
public int findMaximizedCapital(int k, int w, int[] profits, int[] capital) {
int n = profits.length;
// Sort projects by required capital so we can unlock them in order
int[][] projects = new int[n][2];
for (int i = 0; i < n; i++) {
projects[i][0] = capital[i];
projects[i][1] = profits[i];
}
Arrays.sort(projects, (a, b) -> Integer.compare(a[0], b[0])); // avoids subtraction overflow
// Max-heap: among affordable projects, always pick the highest profit
PriorityQueue<Integer> availableProfits =
new PriorityQueue<>(Collections.reverseOrder());
int idx = 0;
for (int round = 0; round < k; round++) {
// Unlock all projects we can now afford
while (idx < n && projects[idx][0] <= w) {
availableProfits.offer(projects[idx][1]);
idx++;
}
if (availableProfits.isEmpty()) break; // no affordable projects remain
w += availableProfits.poll(); // invest in best available project
}
return w;
}
}
This problem uses one max-heap (not two), but the heap-direction insight is identical: among all candidates, always surface the maximum quickly. The min-heap on capital is implicit in the sorted array; once capital crosses a threshold, projects are moved into the max-heap.
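A quick run on a small example confirms the greedy choice (the logic is condensed into an illustrative maximize helper): with k = 2, w = 0, profits [1, 2, 3], and capital [0, 1, 1], only the capital-0 project is affordable at first; taking its profit unlocks the rest.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

public class IpoDemo {
    public static int maximize(int k, int w, int[] profits, int[] capital) {
        int n = profits.length;
        int[][] projects = new int[n][2];
        for (int i = 0; i < n; i++) projects[i] = new int[]{capital[i], profits[i]};
        Arrays.sort(projects, (a, b) -> Integer.compare(a[0], b[0])); // cheapest first
        PriorityQueue<Integer> best = new PriorityQueue<>(Collections.reverseOrder());
        int idx = 0;
        for (int round = 0; round < k; round++) {
            // Unlock every project whose required capital we now have
            while (idx < n && projects[idx][0] <= w) best.offer(projects[idx++][1]);
            if (best.isEmpty()) break;   // nothing affordable: stop early
            w += best.poll();            // take the most profitable affordable project
        }
        return w;
    }

    public static void main(String[] args) {
        // Round 1: take profit 1 (capital 0) -> w = 1; round 2: take profit 3 -> w = 4
        System.out.println(maximize(2, 0, new int[]{1, 2, 3}, new int[]{0, 1, 1}));
    }
}
```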
Java PriorityQueue vs. TreeMap: Alternative Implementations
Java's PriorityQueue is the standard choice and handles most Two Heaps problems in O(log N) per operation. Its weakness is O(N) deletion, which matters for sliding window problems with large K.
TreeMap-based Two Heaps replaces each heap with a TreeMap<Integer, Integer> (value → frequency count). firstKey() and lastKey() give the min/max in O(log N), and removing one occurrence means decrementing the count or deleting the entry, both O(log N):
import java.util.TreeMap;
// Minimal TreeMap median helper
TreeMap<Integer, Integer> lower = new TreeMap<>(); // lower half
TreeMap<Integer, Integer> upper = new TreeMap<>(); // upper half
int lowerSize = 0, upperSize = 0;
// lower.lastKey() = max of lower half (O(log N))
// upper.firstKey() = min of upper half (O(log N))
// add/remove: update count, decrement lowerSize/upperSize
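A minimal runnable sketch of one half as a TreeMap-backed multiset (TreeMultiset is an illustrative name; the median wiring on top of two of these follows the same balance rule as the heap version):

```java
import java.util.TreeMap;

public class TreeMultiset {
    // TreeMap<value, count> used as a sorted multiset: one half of the design
    private final TreeMap<Integer, Integer> map = new TreeMap<>();
    private int size = 0;

    public void add(int x) {
        map.merge(x, 1, Integer::sum); // O(log N)
        size++;
    }

    public boolean remove(int x) {     // O(log N), unlike PriorityQueue.remove
        Integer count = map.get(x);
        if (count == null) return false;
        if (count == 1) map.remove(x); else map.put(x, count - 1);
        size--;
        return true;
    }

    public int max() { return map.lastKey(); }   // top of a lower half
    public int min() { return map.firstKey(); }  // top of an upper half
    public int size() { return size; }

    public static void main(String[] args) {
        TreeMultiset lower = new TreeMultiset();
        lower.add(5); lower.add(1); lower.add(5);
        lower.remove(5);                 // removes one copy in O(log N)
        System.out.println(lower.max()); // prints 5: the second copy remains
    }
}
```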
Guava's MinMaxPriorityQueue is another option: it supports O(log N) removal from either end, which helps when evictions happen at the min or max. Note, though, that removing an arbitrary element from it is still a linear scan, so TreeMap remains the safer choice for large sliding windows.
The choice is straightforward: use PriorityQueue for addNum / findMedian style problems; switch to TreeMap when you need remove(value) to be fast.
Lessons Learned: Common Two Heaps Pitfalls in Interviews
1. Forgetting the two-step insert. The most common mistake is pushing directly to the "right" heap without routing through the max-heap first. This breaks the ordering invariant and produces wrong medians silently. Always push to lowerHalf first, then move the top to upperHalf.
2. Integer overflow in findMedian. (lowerHalf.peek() + upperHalf.peek()) / 2.0 can overflow if both values are near Integer.MAX_VALUE. Use the split form: lowerHalf.peek() / 2.0 + upperHalf.peek() / 2.0.
3. Using remove(Object) for large windows. It is O(N) in PriorityQueue, not O(log N). On a sliding window with N = 100,000 and K = 50,000, this becomes 5 billion operations. Switch to TreeMap if K is large.
4. Off-by-one on the rebalance direction. The invariant is that lowerHalf has the same size or one more element than upperHalf. After a deletion, you must check both directions. Checking only one direction leaves the heaps unbalanced after an upper-half removal.
5. Not handling the empty stream. peek() on an empty PriorityQueue returns null. Guard with if (lowerHalf.isEmpty()) return 0.0; or throw explicitly.
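The null behavior is easy to reproduce (standalone illustration; safeMedianTop is a hypothetical guard helper):

```java
import java.util.PriorityQueue;

public class EmptyPeekDemo {
    public static double safeMedianTop(PriorityQueue<Integer> lowerHalf) {
        if (lowerHalf.isEmpty()) {
            throw new IllegalStateException("no numbers added yet");
        }
        return lowerHalf.peek(); // safe: unboxing a real Integer
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> empty = new PriorityQueue<>();
        System.out.println(empty.peek()); // prints null: no exception here...
        // ...but unboxing it would throw a NullPointerException at runtime:
        // double median = empty.peek();  // NPE
    }
}
```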
Summary: The Two Heaps Pattern at a Glance
- Core idea: Partition the stream into a max-heap (lower half) and a min-heap (upper half), keeping them size-balanced. The median is always at one or both tops.
- Insertion cost: O(log N). Each addNum call performs a constant number of heap operations.
- Query cost: O(1). findMedian() reads only the heap roots, never scanning the collection.
- Deletion cost: O(N) with PriorityQueue; O(log N) with TreeMap. Choose accordingly.
- Three problems, one pattern: Data Stream Median, Sliding Window Median, and Maximize Capital all reduce to the same two-heap structure with minor adaptations.
- Java setup reminder: Max-heap requires new PriorityQueue<>(Collections.reverseOrder()). Without reverseOrder(), both heaps are min-heaps and the pattern silently breaks.
Practice Quiz
1. After inserting [3, 1, 5] into a MedianFinder, what does findMedian() return?

- A) 1.0
- B) 3.0
- C) 4.0

Correct Answer: B. After the three inserts, lowerHalf = [3, 1] and upperHalf = [5]. lowerHalf.size() > upperHalf.size(), so the median is lowerHalf.peek() = 3.0.

2. You need the median of a sliding window of size K = 50,000 over an array of 1,000,000 integers. What is the bottleneck with a naive PriorityQueue implementation?

- A) addNum is O(N²) per call
- B) remove(Object) is O(K) per step, making the total O(N × K)
- C) findMedian is O(log N) instead of O(1)

Correct Answer: B. Java's PriorityQueue.remove(Object) scans the heap linearly (O(K)) to find the outgoing element, making each window step O(K) instead of O(log K).

3. In the IPO / Maximize Capital problem, why is the available-projects structure a max-heap on profit rather than a min-heap?

- A) A max-heap on profit lets you always pick the highest-yielding project in O(1)
- B) A min-heap would require sorting the projects first
- C) Profit values are always negative, so a max-heap is required

Correct Answer: A. At each investment round, you want the maximum profit among all affordable projects. A max-heap surfaces that maximum at the root in O(1), giving O(log N) per round overall.
Open-ended challenge: The standard Two Heaps implementation tracks the median (50th percentile). How would you extend it to track an arbitrary percentile P (e.g., P90)? What invariant would replace the 50/50 size-balance rule, and what trade-offs does your design introduce? There is no single correct answer β consider the impact on insertion time, query time, and correctness after deletions.
Written by Abstract Algorithms (@abstractalgorithms)