# The Ultimate Data Structures Cheat Sheet
A practical cheat sheet for choosing the right data structure based on access pattern, scale, and performance goals.
Abstract Algorithms
TLDR: Data structures are tools. Picking the right one depends on what operation you do most: lookup, insert, delete, ordered traversal, top-k, prefix search, or graph navigation. Start from operation frequency, not from habit.
## Why Structure Choice Beats Algorithm Cleverness
A senior engineer traced a 400ms latency spike on Airbnb's search service to a single line of code: a membership check against a Python list. The service was calling `if item in results` inside a hot path, and with tens of thousands of items each check was O(n), scanning every element of the list. Replacing the list with a set dropped the check to O(1) average and cut latency by 400ms overnight. Same data. Different structure. Night-and-day performance.
```python
# Before: O(n) per lookup -- scans every element of the list
if user_id in seen_list:  # seen_list is a Python list
    continue

# After: O(1) average per lookup -- hash set resolves in constant time
if user_id in seen_set:  # seen_set is a Python set (hash-backed)
    continue
```
This is not an unusual story. It happens in every codebase that grows beyond a few hundred items and was built by habit rather than intent.
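You can reproduce the gap behind that story in a few lines. The micro-benchmark below is illustrative only; the collection size and repetition count are arbitrary choices, and absolute timings depend on your machine:

```python
import timeit

n = 50_000
seen_list = list(range(n))
seen_set = set(seen_list)
probe = n - 1  # worst case for the list: the last element

# 200 membership checks against each structure
t_list = timeit.timeit(lambda: probe in seen_list, number=200)
t_set = timeit.timeit(lambda: probe in seen_set, number=200)

print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
# The set lookup should be faster by several orders of magnitude.
```

The list scans up to n elements per check; the set computes one hash and probes a handful of slots regardless of n.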
A data structure is a way to organize data so that common operations are efficient and predictable. Most performance bugs come from picking the wrong structure, not from writing wrong syntax. The structure drives the algorithm, and the algorithm drives the cost.
Think of a workshop: a hammer for nails, a screwdriver for screws, a wrench for bolts. All are "tools," but each is optimal for one specific job. Data structures work exactly the same way.
| Structure | Great at | Weak at |
|---|---|---|
| Array / List | Indexed access, cache-friendly iteration | Middle inserts/deletes |
| Hash map | O(1) key lookup | Ordered iteration |
| Stack | Undo/backtracking | Random access |
| Queue | FIFO workflows | Middle operations |
| Heap | Min/max extraction, top-k | Full sorted output |
| Balanced BST | Ordered queries, range scans | Constant-factor overhead |
| Trie | Prefix search, autocomplete | Memory overhead per node |
| Graph (adj. list) | Sparse graph traversal | Dense matrix-style ops |
## Choosing the Right Data Structure by Operation Type
```mermaid
flowchart TD
    A[Choose Data Structure] --> B{Key Operation?}
    B -- "O(1) Lookup by Key" --> C[HashMap / HashSet]
    B -- "Sorted Order" --> D{Need Balance?}
    B -- LIFO --> E[Stack]
    B -- FIFO --> F[Queue]
    B -- "Fast Insert/Delete" --> G[Linked List]
    D -- Yes --> H[AVL / Red-Black BST]
    D -- No --> I[Binary Search Tree]
```
This decision flowchart helps you choose the right data structure by starting with one question: what is the key operation? The five branches lead to the structures optimised for each operation: HashMap/HashSet for O(1) lookup, Stack for LIFO, Queue for FIFO, Linked List for fast insert/delete, and a further split between a self-balancing BST and a plain BST for sorted-order needs. Use this diagram as a quick filter before consulting the Big-O table: identify your dominant operation, follow its branch, and you have a concrete shortlist to evaluate.
## Linear vs. Non-Linear: How Data Shapes Access Patterns
Every data structure belongs to one of two broad families, and recognising which family you are in is the first filter before consulting any Big-O table.
Linear structures hold elements in a sequence. Each item has a clear predecessor and successor, so traversal is straightforward, but sequential organisation means some operations are inherently O(n). Arrays, linked lists, stacks, and queues all live here.
Non-linear structures (trees, heaps, tries, graphs) allow branching or networked relationships. That branching is what enables logarithmic or even constant search depth, hierarchical queries, and connection modelling.
| Category | Examples | Access pattern | Best use case |
|---|---|---|---|
| Linear: indexed | Array, dynamic array | O(1) by index | Iteration, positional reads |
| Linear: linked | Linked list | O(n) sequential | Streaming, pointer-heavy inserts |
| Linear: restricted ends | Stack, Queue, Deque | O(1) at ends only | Undo stacks, FIFO task queues |
| Non-linear: hierarchical | BST, Heap, AVL tree | O(log n) typical | Sorted data, priority scheduling |
| Non-linear: keyed | Hash map | O(1) avg by key | Lookup tables, counters, caches |
| Non-linear: prefix | Trie | O(m) key length | Autocomplete, prefix matching |
| Non-linear: relational | Graph (adjacency list) | O(V+E) traversal | Networks, dependency graphs |
Rule of thumb: if your data is inherently relational or hierarchical, reach for a non-linear structure first. Linear structures shine when index position or arrival order is the primary concern.
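The "restricted ends" distinction is easy to verify in practice: front-inserting into a Python list shifts every existing element, while `collections.deque` supports O(1) operations at both ends. A rough, illustrative comparison (sizes are arbitrary):

```python
import timeit
from collections import deque

def list_front_inserts(n):
    xs = []
    for i in range(n):
        xs.insert(0, i)   # O(n): shifts every element on each insert
    return xs

def deque_front_inserts(n):
    xs = deque()
    for i in range(n):
        xs.appendleft(i)  # O(1) at either end
    return xs

n = 20_000
t_list = timeit.timeit(lambda: list_front_inserts(n), number=3)
t_deque = timeit.timeit(lambda: deque_front_inserts(n), number=3)
print(f"list.insert(0, x): {t_list:.3f}s  deque.appendleft: {t_deque:.3f}s")
```

The list version does quadratic total work; the deque version is linear.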
## Linear vs. Non-Linear: The Two Families of Data Structures
```mermaid
flowchart TD
    A[Data Structures] --> B[Linear]
    A --> C[Non-Linear]
    B --> D[Array]
    B --> E[Linked List]
    B --> F[Stack]
    B --> G[Queue]
    C --> H[Tree]
    C --> I[Graph]
    C --> J[Heap]
    C --> K[Hash Table]
```
This diagram organises all major data structures into two parent categories, Linear and Non-Linear, reflecting a fundamental difference in how they arrange and access data. Linear structures (Array, Linked List, Stack, Queue) impose a sequence order where each element has a clear predecessor and successor; Non-Linear structures (Tree, Graph, Heap, Hash Table) support branching, hierarchical, or key-based relationships. Use this as a first filter: if your problem involves position or arrival order, stay in the Linear branch; if it involves relationships, priorities, or direct key lookup, move to the Non-Linear branch.
## Big-O Reference: The Foundation of Every Choice
| Data structure | Search | Insert | Delete | Notes |
|---|---|---|---|---|
| Array (unsorted) | O(n) | O(1) append / O(n) middle | O(n) | Best cache locality |
| Sorted array | O(log n) binary search | O(n) | O(n) | Great reads, expensive writes |
| Linked list | O(n) | O(1) at known pointer | O(1) at known pointer | Poor cache; pointer overhead |
| Hash map | O(1) avg | O(1) avg | O(1) avg | No natural ordering |
| Balanced BST (AVL/RB) | O(log n) | O(log n) | O(log n) | Ordered keys, range queries |
| Binary heap | O(n) search | O(log n) | O(log n) root | O(1) peek at min/max |
| Trie | O(m) key length | O(m) | O(m) | Excellent prefix operations |
## Operation Cost Groupings: O(1), O(log N), and O(N)
```mermaid
flowchart LR
    subgraph O1["O(1)"]
        A[Array Access]
        B[HashMap Get]
        C[Stack Push/Pop]
    end
    subgraph OlogN["O(log N)"]
        D[BST Search]
        E[Binary Search]
        F[Heap Insert]
    end
    subgraph ON["O(N)"]
        G[Array Search]
        H[Linked List]
        I[Tree Traversal]
    end
```
This diagram groups common data structure operations into three cost tiers, O(1), O(log N), and O(N), making it easy to see which tier a given operation belongs to. The O(1) group holds array index access, HashMap get, and stack push/pop; the O(log N) group holds BST search, binary search, and heap insert; and the O(N) group holds linear array search, linked list traversal, and full tree traversal. When evaluating a structure for a hot-path operation, identify its tier here first: a single tier crossing (from O(log N) to O(N) under high call frequency) is typically the root cause of performance regressions.
## The Structure-Selection Workflow
Follow this process before picking any data structure:
```mermaid
flowchart TD
    A[Define Problem] --> B[List Critical Operations]
    B --> C[Estimate Operation Frequency]
    C --> D[Shortlist Candidate Structures]
    D --> E["Check Constraints: ordering, memory, concurrency"]
    E --> F[Prototype with sample workload]
    F --> G[Choose and document trade-offs]
```
The four questions to ask:
- What do I do most: read, write, search, sort, or traverse?
- Do I need ordering guarantees?
- Is memory tight?
- Is access by index, key, prefix, or relationship?
Start from operation frequency, not from familiarity.
## Deep Dive: How Hash Maps Handle Collisions
A hash map converts each key to a bucket index via a hash function. When two keys hash to the same bucket (a collision), most implementations use separate chaining (a linked list per bucket) or open addressing (probe to the next empty slot). Performance degrades as the load factor (entries ÷ buckets) rises; most implementations rehash at ~0.75, copying all entries into a larger array.
| Strategy | Collision resolution | Trade-off |
|---|---|---|
| Separate chaining | Linked list per bucket | Simple; pointer overhead per entry |
| Open addressing | Probe to next empty slot | Cache-friendly; clusters at high load factors |
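To make separate chaining concrete, here is a toy sketch of a chained hash map. The class name, initial capacity, and load-factor threshold are illustrative choices for this example; real implementations add far more engineering:

```python
class ChainedHashMap:
    """Toy hash map using separate chaining -- illustrative sketch only."""

    def __init__(self, capacity=8, max_load=0.75):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > self.max_load:
            self._rehash()             # the pause described above

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:  # scan one chain
            if k == key:
                return v
        return default

    def _rehash(self):
        """Double the table and re-insert every entry (O(n) pause)."""
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        self.size = 0
        for bucket in old:
            for k, v in bucket:
                self.put(k, v)

m = ChainedHashMap()
for i in range(100):
    m.put(f"key{i}", i)
print(m.get("key42"))  # -> 42
```

Note how `_rehash` touches every entry; that full copy is exactly the latency spike the load-factor threshold trades for shorter chains.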
## Real-World Applications: Problem-to-Structure Mapping in Practice
| Problem | Good starting choice | Why |
|---|---|---|
| Count frequencies | Hash map | O(1) key lookup and update |
| Keep smallest/largest live set | Min/max heap | O(1) peek, O(log n) insert |
| Process tasks in arrival order | Queue (deque) | FIFO with O(1) enqueue/dequeue |
| Undo/redo behavior | Stack | LIFO naturally models edit history |
| Autocomplete by prefix | Trie | O(m) prefix query regardless of dictionary size |
| Range query on sorted keys | Balanced BST / SortedList | O(log n) floor/ceiling/range |
| Sparse graph traversal | Adjacency list (hash map) | Memory-efficient for low-edge graphs |
Quick diagnostic: if you're fighting your data structure, you chose the wrong one. An O(n) search on a list you query millions of times should become a hash map.
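The "smallest/largest live set" row maps to the classic top-k pattern with `heapq`: keep a min-heap of the current k best, so memory stays O(k) no matter how long the stream runs. A minimal sketch:

```python
import heapq

def top_k_largest(stream, k):
    """Track the k largest items of a stream with a size-k min-heap."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)        # O(log k)
        elif x > heap[0]:                  # beats the smallest survivor?
            heapq.heapreplace(heap, x)     # pop + push in one O(log k) step
    return sorted(heap, reverse=True)

print(top_k_largest([5, 1, 9, 3, 14, 7, 2], 3))  # -> [14, 9, 7]
```

Sorting the whole stream would cost O(n log n) and O(n) memory; the heap does it in O(n log k) and O(k).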
## Hidden Costs Beginners Miss
The Big-O table shows asymptotic behavior, but real performance includes constant factors, cache misses, and allocation overhead.
| Structure | Hidden cost | When it matters |
|---|---|---|
| Hash map | Rehash pause at load-factor threshold | Write bursts that grow the map rapidly |
| Linked list | Pointer chasing causes CPU cache misses | High-frequency traversal of large lists |
| Balanced BST | Per-node allocation overhead | Millions of small nodes |
| Heap | Cannot cheaply iterate sorted output | Every-element-sorted use cases |
| Trie | Many nodes for sparse alphabets | Unicode or long-key spaces |
```python
# Growing a dict one entry at a time triggers several internal resizes,
# each of which rehashes every existing entry
d = {}
for i in range(10_000):
    d[i] = i

# CPython has no public pre-size API, but dict.fromkeys over a set (or dict)
# pre-sizes the new table in one step -- a CPython implementation detail
d = dict.fromkeys(set(range(10_000)), 0)
```
## Trade-offs & Failure Modes: Production Constraints That Override Big-O
| Constraint | Typical risk | Guardrail |
|---|---|---|
| High write burst | Latency spike from resize/rehash | Pre-size structures at initialization |
| Tight memory budget | OOM at scale | Estimate per-entry overhead (pointer + header) |
| Need deterministic output order | Unstable results from hash map | Use an ordered map or sorted list |
| Concurrency | Race conditions on shared structure | Thread-safe wrappers, ConcurrentHashMap, or locks |
| Very large key space | Hash collision clustering | Robust hash function + load testing |
Measure p95/p99 latency, not just average. Hash maps spike at resize; trees spike at rebalance. Synthetic benchmarks with adversarial inputs surface these before production does.
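One rough way to see the resize spike in tail latency is to time every insert individually and compare percentiles. This is a sketch, not a rigorous benchmark; timer resolution and OS noise affect small samples, and the entry count here is arbitrary:

```python
import time

d = {}
latencies = []
for i in range(200_000):
    t0 = time.perf_counter()
    d[i] = i  # occasionally triggers a resize that rehashes every entry
    latencies.append(time.perf_counter() - t0)

latencies.sort()
p50 = latencies[len(latencies) // 2]
p99 = latencies[int(len(latencies) * 0.99)]
worst = latencies[-1]
print(f"p50={p50:.2e}s  p99={p99:.2e}s  worst={worst:.2e}s")
# The worst insert (one that lands on a resize) is typically far slower
# than the median insert, even though the average looks fine.
```

The median insert looks cheap; the tail carries the resize cost, which is exactly why averages hide this class of regression.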
## Decision Guide: Structures at a Glance
| You need... | Start with |
|---|---|
| Key-value lookup | Hash map |
| Sorted keys and range queries | Balanced BST (TreeMap / SortedList) |
| Continuously extracting min/max | Heap |
| FIFO event queue | Queue / deque |
| LIFO backtracking | Stack |
| Prefix search | Trie |
| Relationship traversal | Adjacency list (graph) |
| Sorted iteration + O(1) key access | Hash map + sorted list (dual structure) |
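The last row, the dual structure, can be sketched as a dict paired with a `bisect`-maintained sorted key list. The `SortedDict` name below is hypothetical (not a standard-library class) and the sketch ignores deletion for brevity:

```python
import bisect

class SortedDict:
    """Hypothetical dual structure: hash map + sorted key list (sketch)."""

    def __init__(self):
        self._map = {}    # O(1) average lookup by key
        self._keys = []   # kept sorted for ordered iteration and range scans

    def put(self, key, value):
        if key not in self._map:
            bisect.insort(self._keys, key)  # O(log n) locate + O(n) shift
        self._map[key] = value

    def get(self, key):
        return self._map[key]  # O(1) average

    def range(self, lo, hi):
        """All (key, value) pairs with lo <= key <= hi, in sorted order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._map[k]) for k in self._keys[i:j]]

sd = SortedDict()
for k, v in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    sd.put(k, v)
print(sd.get(20))        # -> b
print(sd.range(15, 35))  # -> [(20, 'b'), (30, 'c')]
```

The trade-off is paid on writes (the O(n) list shift); reads get both O(1) key access and O(log n + k) range scans.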
## What to Learn Next
## Spot the Wrong Structure
The most common cause of sluggish code is not a bad algorithm; it is a mismatched data structure. Here are four high-frequency selection mistakes and the fix for each.
| Scenario | Wrong choice | Why it hurts | Right choice |
|---|---|---|---|
| Checking whether a value exists millions of times | list / array | O(n) scan per check | Hash set: O(1) avg lookup |
| Extracting the minimum item repeatedly from a live set | Sorted list | O(n) insert to maintain order | Min-heap: O(log n) insert, O(1) peek |
| Undo/redo history | Queue (FIFO) | Wrong ordering: newest action must be undone first | Stack (LIFO): natural fit |
| Autocomplete from a dictionary of 500,000 words | Linear scan or binary search | O(n) or O(log n) per prefix query | Trie: O(m) where m is the prefix length |
Before-and-after gut check: if you find yourself scanning a list inside a hot loop just to test membership, that list should be a set. If you are sorting a collection just to peek at the smallest element, that collection should be a heap.
The structure mismatch pattern is always the same: a habit-driven choice (list, array) is used for an operation it was never optimised for (lookup, priority extraction). Recognising that mismatch early, before the data grows large, is the skill worth practising.
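The autocomplete fix from the table can be sketched as a minimal trie. This is illustrative only; production tries add path compression and memory tuning:

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):          # O(m), m = len(word)
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def with_prefix(self, prefix):   # O(m) to find the subtree, then DFS
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []            # no word has this prefix
            node = node.children[ch]
        out, stack = [], [(node, prefix)]
        while stack:
            n, word = stack.pop()
            if n.is_word:
                out.append(word)
            for ch, child in n.children.items():
                stack.append((child, word + ch))
        return sorted(out)

t = Trie()
for w in ["car", "card", "care", "dog"]:
    t.insert(w)
print(t.with_prefix("car"))  # -> ['car', 'card', 'care']
```

Locating the prefix subtree never touches words outside it, which is why the query cost depends on the prefix length m, not the dictionary size.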
## Java Collections Framework: HashMap, ArrayList, TreeMap, and PriorityQueue in Action
The Java Collections Framework (JCF) is the standard library implementation of every structure in this cheat sheet: HashMap for O(1) key lookup, ArrayList for indexed arrays, TreeMap for ordered BST-backed maps, and PriorityQueue for min-heap operations. It ships with every JDK, requires zero dependencies, and is the reference every Java interview expects you to know.
JCF directly solves the problem this post describes: matching the data structure to the dominant operation. Each collection makes the right operations fast and exposes the same clean Collection/Map interface so you can swap one for another with minimal code changes.
```java
import java.util.*;

public class CollectionsDemoSheet {
    public static void main(String[] args) {
        // -- HashMap: O(1) average key lookup --------------------------------
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : new String[]{"apple", "banana", "apple", "cherry", "banana", "apple"}) {
            wordCount.merge(word, 1, Integer::sum);
        }
        System.out.println("Counts: " + wordCount);
        // apple=3, banana=2, cherry=1 -- O(1) insert + lookup per word
        // (HashMap iteration order is unspecified)

        // -- ArrayList: O(1) indexed access, O(n) middle insert --------------
        List<Integer> scores = new ArrayList<>(List.of(95, 82, 77, 90, 88));
        scores.sort(Collections.reverseOrder());
        System.out.println("Top scores: " + scores);
        // -> [95, 90, 88, 82, 77]

        // -- TreeMap: sorted BST, O(log n) floor/ceiling/range queries -------
        TreeMap<Integer, String> leaderboard = new TreeMap<>();
        leaderboard.put(100, "Alice");
        leaderboard.put(85, "Bob");
        leaderboard.put(92, "Carol");
        leaderboard.put(78, "Dave");
        System.out.println("Top player: " + leaderboard.lastEntry());
        // -> 100=Alice (O(log n) max)
        System.out.println("Scores 85-100: " + leaderboard.subMap(85, true, 100, true));
        // -> {85=Bob, 92=Carol, 100=Alice} -- O(log n + k) range query

        // -- PriorityQueue (min-heap): O(1) peek, O(log n) insert/extract ----
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        int[] stream = {42, 17, 99, 8, 56, 3};
        for (int n : stream) minHeap.offer(n); // O(log n) each
        System.out.print("Min extracted: ");
        while (!minHeap.isEmpty()) {
            System.out.print(minHeap.poll() + " "); // O(log n) each
        }
        // -> 3 8 17 42 56 99 (ascending: repeated extract-min is heap sort)

        // -- ArrayDeque as a Deque (stack + queue) ---------------------------
        Deque<String> taskQueue = new ArrayDeque<>(); // prefer ArrayDeque over LinkedList
        taskQueue.offer("task-A");
        taskQueue.offer("task-B");
        taskQueue.offer("task-C");
        System.out.println("\nFIFO dequeue: " + taskQueue.poll()); // -> task-A
    }
}
```
ArrayDeque is preferred over LinkedList for queue/stack use in Java: it avoids pointer overhead and is cache-friendlier. TreeMap is backed by a Red-Black tree, the self-balancing BST mentioned in the decision guide.
For a full deep-dive on Java Collections Framework internals and performance, a dedicated follow-up post is planned.
## Selection Principles to Carry Forward
Three principles cover the majority of structure-selection decisions you will face:
1. Let the dominant operation decide. Identify the single most frequent operation (lookup, insert, delete, min/max extraction, prefix query, traversal) and choose the structure that makes that operation cheapest. Every other consideration is secondary.
2. Big-O is necessary, not sufficient. Asymptotic complexity is your starting filter, but constant factors, memory layout, cache behaviour, and allocation overhead determine real performance. A linked list with O(1) pointer-inserts can be slower in practice than an O(n) array copy, because the array copy benefits from CPU cache prefetching.
3. Measure before you optimise. Profile with realistic data shapes and sizes. Synthetic micro-benchmarks can lie. A hash map that looks fast under uniform keys may spike under skewed key distributions. Test with adversarial inputs before trusting your structure choice at scale.
These principles apply at every level, from a 50-line script to a distributed service handling millions of operations per second.
## TLDR: Summary & Key Takeaways
- Start from operation frequency, not familiarity: the structure that fits your hottest operation wins.
- Big-O is necessary but not sufficient; constant factors, cache behavior, and allocation overhead matter in production.
- Pre-size hash maps and arrays when you know approximate counts to avoid pause-inducing resizes.
- Validate worst-case behavior with adversarial inputs before shipping data-structure-dependent code.
- Document structure invariants when you combine multiple structures in one system.
## Practice Quiz
Your system sorts a list and binary-searches it millions of times per second, but inserts are rare. Is this the right structure?

- A) No: a hash map is always faster
- B) Yes: sorted array + binary search gives O(log n) reads with minimal write cost when inserts are rare
- C) No: you should always use a heap
- D) No: a balanced BST is always the better choice for sorted data

Correct Answer: B. A sorted array paired with binary search delivers O(log n) reads and only pays the O(n) insert cost on rare writes, making it the right fit when reads vastly outnumber writes.

Why can a hash map cause latency spikes in a write-heavy workload?

- A) Hash maps do not support concurrent writes
- B) Rehashing at the load-factor threshold requires copying all entries, causing a pause
- C) Hash maps have O(n) insert complexity
- D) Hash maps allocate one extra pointer per entry on every write

Correct Answer: B. When the load factor crosses the threshold, the map rehashes all existing entries into a larger backing array, causing a brief but measurable pause that shows up as a p99 latency spike.

When is an adjacency list preferred over an adjacency matrix for a graph?

- A) When the graph is dense (many edges per node)
- B) When you need O(1) edge-existence checks
- C) When the graph is sparse: the list uses O(V+E) memory vs O(V²)
- D) When nodes have no outgoing edges

Correct Answer: C. Adjacency lists store only the edges that exist, so memory scales as O(V+E) rather than O(V²), making them the default choice for real-world sparse graphs such as social networks or road maps.