Top K Elements Pattern: Find the Best K Without Sorting Everything
Maintain a min-heap of size K. Anything smaller than the current minimum gets evicted immediately.
Abstract AlgorithmsTLDR: To find the top K largest elements, maintain a min-heap of size K. For every new element, push it onto the heap. If the heap exceeds K, evict the minimum. After processing all N elements, the heap holds exactly the K largest. O(N log K) time β orders of magnitude faster than sorting when K is small.
π The 50-Million-Tweet Problem: Why You Cannot Sort Everything
Twitter needs to show the top 10 trending hashtags from 50 million tweets in real time. Sorting all 50 million records to find the top 10 would take O(N log N) β roughly 50M Γ 25 = 1.25 billion operations. For a system processing new tweets every millisecond, this is unacceptable.
The Top K Elements pattern reduces this to O(N log K): you process all 50 million records but maintain only a heap of size 10. Each insert and eviction costs O(log 10) β effectively constant β making the full pass O(N). The heap acts as a rolling leaderboard: keep the K best seen so far, throw out anything that cannot compete.
The same idea powers dozens of real-world features:
- Recommendation engines β keep the K most relevant items without ranking all candidates.
- Log monitoring β find the K slowest queries from a stream of millions.
- Gaming leaderboards β track the K highest scores from continuous submissions.
- Closest points β return the K nearest neighbors without computing and sorting all distances.
π Min-Heap of Size K: The Eviction Strategy That Finds the Top K
The strategy sounds counterintuitive: to find the K largest elements, you use a min-heap. Here is why.
A min-heap of size K holds the K largest elements seen so far. Its root is the smallest of those K elements β the weakest member of your current top-K leaderboard. When a new element arrives:
- If it is larger than the root, it can displace the weakest champion. Evict the root, push the new element. The heap still holds the K largest seen so far.
- If it is smaller than or equal to the root, it cannot enter the top K. Ignore it.
At the end, the heap contains exactly the K largest elements in the full collection.
Why not a max-heap? A max-heap of size K would surface the current maximum but not efficiently tell you which element to evict when K is exceeded. You would have to scan all K elements to find the minimum, costing O(K) per step instead of O(log K).
The direction rule to memorize:
- Top K largest β min-heap of size K (evict the minimum when overfull)
- Top K smallest β max-heap of size K (evict the maximum when overfull)
βοΈ Min-Heap for Top K Largest, Max-Heap for Top K Smallest
The core algorithm in three lines of logic:
for each element x in the collection:
heap.offer(x)
if heap.size() > K:
heap.poll() // evict the "weakest" candidate
After the loop, heap holds the answer.
Worked example β top 3 largest from [3, 1, 5, 12, 2, 11, 7]:
| Element | Action | Min-Heap state (sorted for clarity) | Heap size |
| 3 | push | [3] | 1 |
| 1 | push | [1, 3] | 2 |
| 5 | push | [1, 3, 5] | 3 |
| 12 | push β evict 1 | [3, 5, 12] | 3 |
| 2 | 2 < root(3), evict 2 | [3, 5, 12] | 3 |
| 11 | push β evict 3 | [5, 11, 12] | 3 |
| 7 | 7 > root(5) β evict 5 | [7, 11, 12] | 3 |
Result: top 3 largest = {7, 11, 12}. The heap never grew beyond K = 3.
In Java:
PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // default: min-heap
No comparator needed for top-K-largest β the default min-heap is exactly what you want.
π§ Deep Dive: How PriorityQueue Handles K-Bounded Heap Operations
Internals: Heap Sift-Up, Sift-Down, and the Eviction Trigger
Java's PriorityQueue is a binary heap in an array. The key invariant: every node is less than or equal to its children. The root (index 0) is always the minimum β accessible in O(1) via peek().
offer(x) places the new element at the array's tail (index size) and sifts up: it compares the element to its parent at (i-1)/2 and swaps if necessary, repeating until the heap property is satisfied. At most logβ(K) swaps occur because the heap has at most K+1 elements at this point.
poll() removes index 0, moves the last element to index 0, and sifts down: it swaps with the smaller of its two children, repeating until no swap is needed. Also O(log K).
The eviction step (if heap.size() > K: heap.poll()) fires immediately after each offer. This keeps the heap bounded at size K for the entire pass, so both operations cost O(log K) β not O(log N).
Custom comparators work the same way: you pass a Comparator to the constructor, and it governs all sift-up and sift-down comparisons. For K closest points, the comparator computes squared Euclidean distance; for top-K frequent, it compares frequency values from a map.
Performance Analysis: O(N log K) vs O(N log N) Sort
| Approach | Time | Space | When it wins |
| Min-heap of size K | O(N log K) | O(K) | K βͺ N (K=10 from 50M β ~5Γ faster) |
| Full sort | O(N log N) | O(N) | Need fully sorted output or K β N |
| QuickSelect | O(N) average | O(1) extra | Only need the Kth element, not all K |
Memory advantage is dramatic: the heap uses O(K) space β for top-10 from 50 million records, that is 10 integers in memory versus 50 million. For systems where the input does not fit in memory (external sort scenarios), the heap approach is the only viable option.
When to choose QuickSelect instead: if you only need the exact Kth value and do not need the other K-1 elements in the top-K set, QuickSelect (partition-based) runs in O(N) average time and O(1) extra space. Its weakness is O(NΒ²) worst case without a randomized pivot. For interview problems that ask "return the K elements," heap is safer; for "return only the Kth value," QuickSelect is faster.
π Visualizing Heap Eviction as Elements Arrive
The flowchart below shows the per-element decision loop, including the eviction trigger that keeps the heap bounded at K:
flowchart TD
A([Next element x]) --> B[heap.offer x]
B --> C{heap.size > K?}
C -- Yes --> D[heap.poll β evict minimum]
C -- No --> E{More elements?}
D --> E
E -- Yes --> A
E -- No --> F[heap contains top K elements]
F --> G{Need K elements in order?}
G -- Yes --> H[Drain heap into array β O K log K]
G -- No --> I[Return heap directly]
After the main loop, draining the heap gives the K elements in ascending order. If you need descending order (largest first), drain into an array and reverse, or use Collections.reverseOrder() on a second pass.
π Real-World Applications: Trending Hashtags, Recommendations, and Monitoring
Social media trending β Twitter and TikTok compute top-K hashtags or topics from a sliding window of recent activity. A frequency map builds the counts; a min-heap of size K runs over the map entries. The entire pipeline fits in a single O(N log K) pass, meaning trending can refresh every few seconds even at billions of events per day.
Recommendation engines β Spotify's "Daily Mix" selects the K most relevant tracks from a catalog of 100 million songs. Relevance scores are computed for each candidate; a min-heap of size K processes the full catalog in O(N log K) with O(K) memory per user request.
Observability and alerting β SRE teams at Google and similar companies track the K slowest endpoints from millions of trace spans. A heap pass over each minute's trace data surfaces the slowest K in O(N log K) without storing or sorting the full minute's worth of spans.
Case study β Top K Frequent Elements in practice:
Given one hour of request logs (10 million entries), find the 20 most frequently called API endpoints.
- Build frequency map: O(N) time, O(distinct endpoints) space.
- Run min-heap of size 20 over the map entries: O(D log 20) where D is distinct endpoint count.
- Total: O(N + D log K) β O(N) for typical D βͺ N.
βοΈ Trade-offs: Heap vs Sort vs QuickSelect
Understanding when to use each approach is as important as knowing how to implement them:
| Concern | Min-Heap of size K | Full Sort | QuickSelect |
| Time (find K elements) | O(N log K) | O(N log N) | O(N) average |
| Space | O(K) | O(N) or O(log N) in-place | O(1) extra |
| Worst-case time | O(N log K) β guaranteed | O(N log N) | O(NΒ²) without randomized pivot |
| Streaming / external data | β Yes | β Requires all data in memory | β Requires random access |
| Returns sorted K results | β Unsorted (O(K log K) to sort) | β Fully sorted | β Only finds Kth value |
| Implementation complexity | Low | None (use Arrays.sort) | Medium |
The heap is the right default choice for interviews and production systems where K βͺ N, data may arrive in a stream, or memory is constrained. Reach for QuickSelect only when the problem explicitly asks for the Kth value and you need to minimize average time.
π§ Decision Guide: When to Use a Heap vs QuickSelect
| Situation | Recommendation |
| Use when | Finding top/bottom K from a large or streaming collection where K βͺ N. |
| Avoid when | You need the K elements in sorted order and K is large (just sort instead). |
| Alternative | QuickSelect for "Kth value only" at O(N) average; Arrays.sort + slice when K β N. |
| Edge cases | K > N (return all N elements); duplicate values (heap handles them correctly β no special case needed); K = 1 (a single-pass max/min scan is O(N) with O(1) space β no heap needed). |
π§ͺ Four Problems: Kth Largest, Closest Points, Frequent Elements, Stream
Example 1 β Kth Largest Element in an Array
Return the Kth largest element (1-indexed) from an unsorted array.
import java.util.PriorityQueue;
public class KthLargestElement {
public int findKthLargest(int[] nums, int k) {
// Min-heap of size K: root is always the Kth largest seen so far
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
for (int num : nums) {
minHeap.offer(num);
if (minHeap.size() > k) {
minHeap.poll(); // evict the smallest: it cannot be in top K
}
}
return minHeap.peek(); // root = Kth largest
}
}
Time: O(N log K). Space: O(K).
Example 2 β K Closest Points to the Origin
Return the K points from a 2D array that are closest to the origin (0, 0). Distance is Euclidean; you can compare squared distances to avoid the sqrt call.
import java.util.PriorityQueue;
public class KClosestPoints {
public int[][] kClosest(int[][] points, int k) {
// Max-heap by squared distance: root = farthest point in current top-K
PriorityQueue<int[]> maxHeap = new PriorityQueue<>(
(a, b) -> (b[0] * b[0] + b[1] * b[1]) - (a[0] * a[0] + a[1] * a[1])
);
for (int[] point : points) {
maxHeap.offer(point);
if (maxHeap.size() > k) {
maxHeap.poll(); // evict the farthest point
}
}
int[][] result = new int[k][2];
int i = 0;
while (!maxHeap.isEmpty()) result[i++] = maxHeap.poll();
return result;
}
}
This is the top K smallest variant: use a max-heap, evict the maximum (farthest point) when overfull.
Example 3 β Top K Frequent Elements
Return the K elements that appear most frequently in the input array.
import java.util.*;
public class TopKFrequent {
public int[] topKFrequent(int[] nums, int k) {
// Step 1: build frequency map
Map<Integer, Integer> freq = new HashMap<>();
for (int num : nums) freq.merge(num, 1, Integer::sum);
// Step 2: min-heap by frequency β root = least-frequent element in current top-K
PriorityQueue<Integer> minHeap = new PriorityQueue<>(
Comparator.comparingInt(freq::get)
);
for (int num : freq.keySet()) {
minHeap.offer(num);
if (minHeap.size() > k) minHeap.poll(); // evict least frequent
}
int[] result = new int[k];
for (int i = k - 1; i >= 0; i--) result[i] = minHeap.poll();
return result;
}
}
The comparator Comparator.comparingInt(freq::get) orders elements by their frequency count, not by value. This is the key technique for any "top K by custom score" variant: build the score map first, then use it inside the comparator.
Example 4 β Kth Largest Element in a Stream
Design a class that returns the Kth largest element after each insertion into a live stream.
import java.util.PriorityQueue;
public class KthLargestStream {
private final PriorityQueue<Integer> minHeap;
private final int k;
public KthLargestStream(int k, int[] initialNums) {
this.k = k;
this.minHeap = new PriorityQueue<>();
for (int num : initialNums) add(num);
}
public int add(int val) {
minHeap.offer(val);
if (minHeap.size() > k) minHeap.poll(); // maintain exactly K elements
return minHeap.peek(); // root = Kth largest at all times
}
}
The heap maintains the K largest elements seen so far. Its root is the smallest of those K β by definition the Kth largest. Each add() costs O(log K), making N additions cost O(N log K) total.
π οΈ Java PriorityQueue and Guava's MinMaxPriorityQueue in Practice
Java's PriorityQueue is the idiomatic choice and sufficient for all four problems above. Two limitations worth knowing:
PriorityQueueis not thread-safe. For concurrent stream processing, usejava.util.concurrent.PriorityBlockingQueueor externally synchronize.remove(Object)is O(N), not O(log N). For workloads that require element removal (not just top/poll), consider alternatives.
Guava's MinMaxPriorityQueue supports O(log N) access to both the minimum and maximum ends. For sliding window "top K" problems where you need efficient eviction from both ends, it is a drop-in replacement:
// Guava dependency: com.google.guava:guava
import com.google.common.collect.MinMaxPriorityQueue;
MinMaxPriorityQueue<Integer> deque = MinMaxPriorityQueue
.maximumSize(k) // automatically evicts the minimum when size exceeds K
.create();
deque.add(5);
deque.add(1);
deque.add(10);
// deque.peekLast() = minimum; deque.peekFirst() = maximum
The maximumSize(k) call is equivalent to the manual if (heap.size() > k) heap.poll() guard, making the code more expressive. For pure algorithm interviews, stick to the standard PriorityQueue; for production streaming pipelines, MinMaxPriorityQueue reduces boilerplate.
π Lessons Learned: Mistakes When Choosing Heap Direction
1. Using the wrong heap direction. This is the #1 interview mistake. For top-K largest, use a min-heap (not max-heap). The min-heap root gives you the weakest element to evict. Using a max-heap of size K would mean you cannot efficiently evict the smallest β you would need to scan all K elements.
2. Forgetting the custom comparator for non-integer types. When comparing points, strings, or custom objects, you must provide a comparator. A common error is comparing int[] arrays with (a, b) -> a[0] - b[0], which only compares the first element. Always write the full multi-field comparator your problem needs.
3. Conflating Kth largest with Kth smallest. "Kth largest in a stream" means the K-th largest overall β the root of a min-heap of size K. "Kth smallest" means the root of a max-heap of size K. Mix them up and you will return the wrong element without any error.
4. Not handling K > array length. When K is larger than the input, return all elements. Guard with k = Math.min(k, nums.length) at the start.
5. Ignoring integer overflow in comparators. (a, b) -> a - b as a comparator can overflow when a is Integer.MIN_VALUE and b is positive. Use Integer.compare(a, b) or Comparator.comparingInt(...) instead.
π Summary: Top K in Five Bullets
- Core idea: A min-heap of size K acts as a rolling leaderboard β every new element challenges the weakest current champion. If it wins, the champion is evicted.
- Time: O(N log K) β optimal when K βͺ N. For K = 10 from N = 50,000,000, this is roughly 50M Γ 3.3 operations = ~170M vs 50M Γ 25 = 1.25B for sort.
- Space: O(K) β the heap holds exactly K elements at all times. Ideal for streaming and memory-constrained environments.
- Direction rule: Top K largest β min-heap of size K. Top K smallest β max-heap of size K.
- When to switch: Use QuickSelect for O(N) average when only the Kth value is needed. Use a full sort when K β N or fully ordered output is required.
π Practice Quiz
Which heap type should you use to find the K smallest elements from a large array?
- A) Min-heap of size K
- B) Max-heap of size K
- C) Min-heap of size N Correct Answer: B β A max-heap of size K keeps the K smallest seen so far. Its root is the largest of those K, and it is evicted whenever a smaller element arrives. A min-heap of size K would evict small elements, keeping the large ones.
You call
new PriorityQueue<>()in Java without a comparator, then use it as the heap for "top K frequent elements." What bug does this introduce?- A) The heap orders elements by value instead of by frequency, so the wrong elements are evicted
- B) The heap becomes a max-heap instead of a min-heap
- C)
offer()throws a ClassCastException at runtime Correct Answer: A β Without a frequency comparator, the heap orders integers by natural value order. It will evict the numerically smallest integer rather than the least-frequent one, producing an incorrect top-K result.
An array has N = 1,000,000 elements and you need the top K = 500,000 elements. Which approach is fastest?
- A) Min-heap of size K: O(N log K) β O(N log N) since K β N/2
- B) Full sort: O(N log N), then slice the top half
- C) QuickSelect to find the median, then partition and return the upper half Correct Answer: C β When K β N/2, QuickSelect partitions in O(N) average to find the Kth element, then the upper-partition pass collects the K largest in O(N) total. Both heap and sort degrade to O(N log N) when K is not small.
Open-ended challenge: The
KthLargestStreamsolution uses a min-heap of fixed size K. If the stream is adversarial β an attacker can choose which numbers to submit to maximize your heap's memory usage β can they cause your solution to use more than O(K) space? How would you harden it? There is no single correct answer.
π Related Posts

Written by
Abstract Algorithms
@abstractalgorithms
More Posts
Two Pointer Technique: Solving Pair and Partition Problems in O(n)
TLDR: Place one pointer at the start and one at the end of a sorted array. Move them toward each other based on a comparison condition. Every classic pair/partition problem that naively runs in O(nΒ²) collapses to O(n) with this one idea β and masteri...
Tries (Prefix Trees): The Data Structure Behind Autocomplete
TLDR: A Trie stores strings character by character in a tree, so every string sharing a common prefix shares those nodes. Insert and search are O(L) where L is the word length. Tries beat HashMaps on prefix queries β startsWith in O(L) with zero coll...
Sliding Window Technique: From O(nΒ·k) Scans to O(n) in One Pass
TLDR: Instead of recomputing a subarray aggregate from scratch on every shift, maintain it incrementally β add the incoming element, remove the outgoing element. For a fixed window this costs O(1) per shift. For a variable window, expand the right bo...
Merge Intervals Pattern: Solve Scheduling Problems with Sort and Sweep
TLDR: Sort intervals by start time, then sweep left-to-right and merge any interval whose start β€ the current running end. O(n log n) time, O(n) space. One pattern β three interview problems solved. π When Two Meetings Overlap: The Scheduling Prob...
