Top K Elements Pattern: Find the Best K Without Sorting Everything

Maintain a min-heap of size K. Anything smaller than the current minimum gets evicted immediately.

Data Structures and Algorithms

Abstract Algorithms

·Mar 29, 2026·14 min read

📚

Intermediate

For developers with some experience. Builds on fundamentals.

Estimated read time: 14 min

AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.

TLDR: To find the top K largest elements, maintain a min-heap of size K. For every new element, push it onto the heap. If the heap exceeds K, evict the minimum. After processing all N elements, the heap holds exactly the K largest. O(N log K) time — orders of magnitude faster than sorting when K is small.

📖 The 50-Million-Tweet Problem: Why You Cannot Sort Everything

Twitter needs to show the top 10 trending hashtags from 50 million tweets in real time. Sorting all 50 million records to find the top 10 would take O(N log N) — roughly 50M × 25 = 1.25 billion operations. For a system processing new tweets every millisecond, this is unacceptable.

The Top K Elements pattern reduces this to O(N log K): you process all 50 million records but maintain only a heap of size 10. Each insert and eviction costs O(log 10) — effectively constant — making the full pass O(N). The heap acts as a rolling leaderboard: keep the K best seen so far, throw out anything that cannot compete.

The same idea powers dozens of real-world features:

Recommendation engines — keep the K most relevant items without ranking all candidates.
Log monitoring — find the K slowest queries from a stream of millions.
Gaming leaderboards — track the K highest scores from continuous submissions.
Closest points — return the K nearest neighbors without computing and sorting all distances.

🔍 Min-Heap of Size K: The Eviction Strategy That Finds the Top K

The strategy sounds counterintuitive: to find the K largest elements, you use a min-heap. Here is why.

A min-heap of size K holds the K largest elements seen so far. Its root is the smallest of those K elements — the weakest member of your current top-K leaderboard. When a new element arrives:

If it is larger than the root, it can displace the weakest champion. Evict the root, push the new element. The heap still holds the K largest seen so far.
If it is smaller than or equal to the root, it cannot enter the top K. Ignore it.

At the end, the heap contains exactly the K largest elements in the full collection.

Why not a max-heap? A max-heap of size K would surface the current maximum but not efficiently tell you which element to evict when K is exceeded. You would have to scan all K elements to find the minimum, costing O(K) per step instead of O(log K).

The direction rule to memorize:

Top K largest → min-heap of size K (evict the minimum when overfull)
Top K smallest → max-heap of size K (evict the maximum when overfull)

⚙️ Min-Heap for Top K Largest, Max-Heap for Top K Smallest

The core algorithm in three lines of logic:

for each element x in the collection:
    heap.offer(x)
    if heap.size() > K:
        heap.poll()   // evict the "weakest" candidate

After the loop, heap holds the answer.

Worked example — top 3 largest from [3, 1, 5, 12, 2, 11, 7]:

Element	Action	Min-Heap state (sorted for clarity)	Heap size
3	push	[3]	1
1	push	[1, 3]	2
5	push	[1, 3, 5]	3
12	push → evict 1	[3, 5, 12]	3
2	2 < root(3), evict 2	[3, 5, 12]	3
11	push → evict 3	[5, 11, 12]	3
7	7 > root(5) → evict 5	[7, 11, 12]	3

Result: top 3 largest = {7, 11, 12}. The heap never grew beyond K = 3.

In Java:

PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // default: min-heap

No comparator needed for top-K-largest — the default min-heap is exactly what you want.

🧠 Deep Dive: How PriorityQueue Handles K-Bounded Heap Operations

Internals: Heap Sift-Up, Sift-Down, and the Eviction Trigger

Java's PriorityQueue is a binary heap in an array. The key invariant: every node is less than or equal to its children. The root (index 0) is always the minimum — accessible in O(1) via peek().

offer(x) places the new element at the array's tail (index size) and sifts up: it compares the element to its parent at (i-1)/2 and swaps if necessary, repeating until the heap property is satisfied. At most log₂(K) swaps occur because the heap has at most K+1 elements at this point.

poll() removes index 0, moves the last element to index 0, and sifts down: it swaps with the smaller of its two children, repeating until no swap is needed. Also O(log K).

The eviction step (if heap.size() > K: heap.poll()) fires immediately after each offer. This keeps the heap bounded at size K for the entire pass, so both operations cost O(log K) — not O(log N).

Custom comparators work the same way: you pass a Comparator to the constructor, and it governs all sift-up and sift-down comparisons. For K closest points, the comparator computes squared Euclidean distance; for top-K frequent, it compares frequency values from a map.

Performance Analysis: O(N log K) vs O(N log N) Sort

Approach	Time	Space	When it wins
Min-heap of size K	O(N log K)	O(K)	K ≪ N (K=10 from 50M → ~5× faster)
Full sort	O(N log N)	O(N)	Need fully sorted output or K ≈ N
QuickSelect	O(N) average	O(1) extra	Only need the Kth element, not all K

Memory advantage is dramatic: the heap uses O(K) space — for top-10 from 50 million records, that is 10 integers in memory versus 50 million. For systems where the input does not fit in memory (external sort scenarios), the heap approach is the only viable option.

When to choose QuickSelect instead: if you only need the exact Kth value and do not need the other K-1 elements in the top-K set, QuickSelect (partition-based) runs in O(N) average time and O(1) extra space. Its weakness is O(N²) worst case without a randomized pivot. For interview problems that ask "return the K elements," heap is safer; for "return only the Kth value," QuickSelect is faster.

📊 Visualizing Heap Eviction as Elements Arrive

The flowchart below shows the per-element decision loop, including the eviction trigger that keeps the heap bounded at K:

flowchart TD
    A([Next element x]) --> B[heap.offer x]
    B --> C{heap.size > K?}
    C -- Yes --> D[heap.poll  evict minimum]
    C -- No --> E{More elements?}
    D --> E
    E -- Yes --> A
    E -- No --> F[heap contains top K elements]
    F --> G{Need K elements in order?}
    G -- Yes --> H[Drain heap into array  O K log K]
    G -- No --> I[Return heap directly]

After the main loop, draining the heap gives the K elements in ascending order. If you need descending order (largest first), drain into an array and reverse, or use Collections.reverseOrder() on a second pass.

Social media trending — Twitter and TikTok compute top-K hashtags or topics from a sliding window of recent activity. A frequency map builds the counts; a min-heap of size K runs over the map entries. The entire pipeline fits in a single O(N log K) pass, meaning trending can refresh every few seconds even at billions of events per day.

Recommendation engines — Spotify's "Daily Mix" selects the K most relevant tracks from a catalog of 100 million songs. Relevance scores are computed for each candidate; a min-heap of size K processes the full catalog in O(N log K) with O(K) memory per user request.

Observability and alerting — SRE teams at Google and similar companies track the K slowest endpoints from millions of trace spans. A heap pass over each minute's trace data surfaces the slowest K in O(N log K) without storing or sorting the full minute's worth of spans.

Case study — Top K Frequent Elements in practice:

Given one hour of request logs (10 million entries), find the 20 most frequently called API endpoints.

Build frequency map: O(N) time, O(distinct endpoints) space.
Run min-heap of size 20 over the map entries: O(D log 20) where D is distinct endpoint count.
Total: O(N + D log K) ≈ O(N) for typical D ≪ N.

⚖️ Trade-offs: Heap vs Sort vs QuickSelect

Understanding when to use each approach is as important as knowing how to implement them:

Concern	Min-Heap of size K	Full Sort	QuickSelect
Time (find K elements)	O(N log K)	O(N log N)	O(N) average
Space	O(K)	O(N) or O(log N) in-place	O(1) extra
Worst-case time	O(N log K) — guaranteed	O(N log N)	O(N²) without randomized pivot
Streaming / external data	✅ Yes	❌ Requires all data in memory	❌ Requires random access
Returns sorted K results	❌ Unsorted (O(K log K) to sort)	✅ Fully sorted	❌ Only finds Kth value
Implementation complexity	Low	None (use `Arrays.sort`)	Medium

The heap is the right default choice for interviews and production systems where K ≪ N, data may arrive in a stream, or memory is constrained. Reach for QuickSelect only when the problem explicitly asks for the Kth value and you need to minimize average time.

🧭 Decision Guide: When to Use a Heap vs QuickSelect

Situation	Recommendation
Use when	Finding top/bottom K from a large or streaming collection where K ≪ N.
Avoid when	You need the K elements in sorted order and K is large (just sort instead).
Alternative	QuickSelect for "Kth value only" at O(N) average; `Arrays.sort` + slice when K ≈ N.
Edge cases	K > N (return all N elements); duplicate values (heap handles them correctly — no special case needed); K = 1 (a single-pass max/min scan is O(N) with O(1) space — no heap needed).

🧪 Four Problems: Kth Largest, Closest Points, Frequent Elements, Stream

Example 1 — Kth Largest Element in an Array

Return the Kth largest element (1-indexed) from an unsorted array.

import java.util.PriorityQueue;

public class KthLargestElement {

    public int findKthLargest(int[] nums, int k) {
        // Min-heap of size K: root is always the Kth largest seen so far
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();

        for (int num : nums) {
            minHeap.offer(num);
            if (minHeap.size() > k) {
                minHeap.poll(); // evict the smallest: it cannot be in top K
            }
        }
        return minHeap.peek(); // root = Kth largest
    }
}

Time: O(N log K). Space: O(K).

Example 2 — K Closest Points to the Origin

Return the K points from a 2D array that are closest to the origin (0, 0). Distance is Euclidean; you can compare squared distances to avoid the sqrt call.

import java.util.PriorityQueue;

public class KClosestPoints {

    public int[][] kClosest(int[][] points, int k) {
        // Max-heap by squared distance: root = farthest point in current top-K
        PriorityQueue<int[]> maxHeap = new PriorityQueue<>(
                (a, b) -> (b[0] * b[0] + b[1] * b[1]) - (a[0] * a[0] + a[1] * a[1])
        );

        for (int[] point : points) {
            maxHeap.offer(point);
            if (maxHeap.size() > k) {
                maxHeap.poll(); // evict the farthest point
            }
        }

        int[][] result = new int[k][2];
        int i = 0;
        while (!maxHeap.isEmpty()) result[i++] = maxHeap.poll();
        return result;
    }
}

This is the top K smallest variant: use a max-heap, evict the maximum (farthest point) when overfull.

Example 3 — Top K Frequent Elements

Return the K elements that appear most frequently in the input array.

import java.util.*;

public class TopKFrequent {

    public int[] topKFrequent(int[] nums, int k) {
        // Step 1: build frequency map
        Map<Integer, Integer> freq = new HashMap<>();
        for (int num : nums) freq.merge(num, 1, Integer::sum);

        // Step 2: min-heap by frequency — root = least-frequent element in current top-K
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(
                Comparator.comparingInt(freq::get)
        );

        for (int num : freq.keySet()) {
            minHeap.offer(num);
            if (minHeap.size() > k) minHeap.poll(); // evict least frequent
        }

        int[] result = new int[k];
        for (int i = k - 1; i >= 0; i--) result[i] = minHeap.poll();
        return result;
    }
}

The comparator Comparator.comparingInt(freq::get) orders elements by their frequency count, not by value. This is the key technique for any "top K by custom score" variant: build the score map first, then use it inside the comparator.

Example 4 — Kth Largest Element in a Stream

Design a class that returns the Kth largest element after each insertion into a live stream.

import java.util.PriorityQueue;

public class KthLargestStream {

    private final PriorityQueue<Integer> minHeap;
    private final int k;

    public KthLargestStream(int k, int[] initialNums) {
        this.k = k;
        this.minHeap = new PriorityQueue<>();
        for (int num : initialNums) add(num);
    }

    public int add(int val) {
        minHeap.offer(val);
        if (minHeap.size() > k) minHeap.poll(); // maintain exactly K elements
        return minHeap.peek();                   // root = Kth largest at all times
    }
}

The heap maintains the K largest elements seen so far. Its root is the smallest of those K — by definition the Kth largest. Each add() costs O(log K), making N additions cost O(N log K) total.

🛠️ Java PriorityQueue and Guava's MinMaxPriorityQueue in Practice

Java's PriorityQueue is the idiomatic choice and sufficient for all four problems above. Two limitations worth knowing:

PriorityQueue is not thread-safe. For concurrent stream processing, use java.util.concurrent.PriorityBlockingQueue or externally synchronize.
remove(Object) is O(N), not O(log N). For workloads that require element removal (not just top/poll), consider alternatives.

Guava's MinMaxPriorityQueue supports O(log N) access to both the minimum and maximum ends. For sliding window "top K" problems where you need efficient eviction from both ends, it is a drop-in replacement:

// Guava dependency: com.google.guava:guava
import com.google.common.collect.MinMaxPriorityQueue;

MinMaxPriorityQueue<Integer> deque = MinMaxPriorityQueue
        .maximumSize(k)   // automatically evicts the minimum when size exceeds K
        .create();

deque.add(5);
deque.add(1);
deque.add(10);
// deque.peekLast() = minimum; deque.peekFirst() = maximum

The maximumSize(k) call is equivalent to the manual if (heap.size() > k) heap.poll() guard, making the code more expressive. For pure algorithm interviews, stick to the standard PriorityQueue; for production streaming pipelines, MinMaxPriorityQueue reduces boilerplate.

📚 Lessons Learned: Mistakes When Choosing Heap Direction

1. Using the wrong heap direction. This is the #1 interview mistake. For top-K largest, use a min-heap (not max-heap). The min-heap root gives you the weakest element to evict. Using a max-heap of size K would mean you cannot efficiently evict the smallest — you would need to scan all K elements.

2. Forgetting the custom comparator for non-integer types. When comparing points, strings, or custom objects, you must provide a comparator. A common error is comparing int[] arrays with (a, b) -> a[0] - b[0], which only compares the first element. Always write the full multi-field comparator your problem needs.

3. Conflating Kth largest with Kth smallest. "Kth largest in a stream" means the K-th largest overall — the root of a min-heap of size K. "Kth smallest" means the root of a max-heap of size K. Mix them up and you will return the wrong element without any error.

4. Not handling K > array length. When K is larger than the input, return all elements. Guard with k = Math.min(k, nums.length) at the start.

5. Ignoring integer overflow in comparators. (a, b) -> a - b as a comparator can overflow when a is Integer.MIN_VALUE and b is positive. Use Integer.compare(a, b) or Comparator.comparingInt(...) instead.

📌 Summary: Top K in Five Bullets

Core idea: A min-heap of size K acts as a rolling leaderboard — every new element challenges the weakest current champion. If it wins, the champion is evicted.
Time: O(N log K) — optimal when K ≪ N. For K = 10 from N = 50,000,000, this is roughly 50M × 3.3 operations = ~170M vs 50M × 25 = 1.25B for sort.
Space: O(K) — the heap holds exactly K elements at all times. Ideal for streaming and memory-constrained environments.
Direction rule: Top K largest → min-heap of size K. Top K smallest → max-heap of size K.
When to switch: Use QuickSelect for O(N) average when only the Kth value is needed. Use a full sort when K ≈ N or fully ordered output is required.

Test Your Knowledge

🧠

Ready to test what you just learned?

AI will generate 4 questions based on this article's content.

HyperLogLog Explained: Counting Billions of Unique Items with 12 KB

TLDR: HyperLogLog estimates the number of distinct elements in a dataset using ~12 KB of memory regardless of cardinality — with ±0.81% error. The insight: if you hash every element to a random bit string, the maximum length of leading zeros you obse...

May 3, 2026•17 min read

Count-Min Sketch Explained: Frequency Estimation at Streaming Scale

TLDR: Count-Min Sketch (CMS) is a fixed-size d × w counter matrix that estimates how often any element has appeared in a stream. Insert: hash the element with each of the d hash functions to get one column per row, increment those d counters. Query: ...

May 3, 2026•21 min read

Bloom Filters Explained: Membership Testing with Zero False Negatives

TLDR: A Bloom filter is a bit array of m bits + k independent hash functions that sets k bits on insert and checks those same k bits on lookup. If any checked bit is 0, the element is definitely not in the set — false negatives are mathematically imp...

May 3, 2026•18 min read

Java 21 to 25: Virtual Threads, Pattern Matching, and Structured Concurrency

TLDR: Java 21 LTS makes virtual threads a production-ready replacement for bounded thread pools — your newFixedThreadPool(200) can become newVirtualThreadPerTaskExecutor() and handle 10× the concurrency with no architectural changes. Pattern switch w...