Tree Data Structure Explained: Concepts, Implementation, and Interview Guide
Trees are everywhere. From the DOM in your browser to the file system on your laptop. We explain Binary Trees, BSTs, and Heaps.
Abstract Algorithms
TLDR: Trees are hierarchical data structures used everywhere: file systems, HTML DOM, databases, and search algorithms. Understanding Binary Trees, BSTs, and Heaps gives you efficient $O(\log N)$ search, insertion, and deletion, and helps you ace a large class of interview problems.
Trees Are Everywhere: An Org-Chart You Can Query
When you run git diff HEAD~1 HEAD, Git does not compare two flat lists of files. It compares two trees. Every commit in Git is stored internally as a tree of file-system snapshots: directories are internal nodes, files are leaves. Git's diff algorithm walks both trees simultaneously, identifies nodes whose content hashes changed, and reports the delta. When Git says "3 files changed," it found three leaf nodes with mismatched hashes.
Trees are not an academic concept. They are the data structure behind version control, file systems, browser DOMs, and database indexes. Here is a simplified picture of what a Git commit tree looks like:
```text
commit (root)
├── src/
│   ├── app.py    ← leaf (file snapshot, stored by hash)
│   └── utils.py  ← leaf
└── README.md     ← leaf
```
Finding the diff between two commits is a tree comparison problem. The traversal algorithms in this post are exactly what power that comparison.
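To make that concrete, here is a minimal sketch of diffing two snapshot trees by content hash. The class and helper names are illustrative, not Git's actual data model; real Git compares tree objects addressed by SHA hashes.

```python
# Illustrative sketch (not Git's actual implementation): diff two
# snapshot trees by comparing content hashes, skipping identical subtrees.
class TreeNode:
    def __init__(self, name, hash_, children=None):
        self.name = name
        self.hash = hash_                # content hash of this file/directory
        self.children = children or {}   # name -> TreeNode

def diff(old, new, path=""):
    """Yield paths of leaves whose content hash changed between snapshots."""
    if old.hash == new.hash:
        return                           # identical subtrees: skip entirely
    if not old.children and not new.children:
        yield path + old.name            # leaf (file) whose content changed
        return
    for name in old.children.keys() & new.children.keys():
        yield from diff(old.children[name], new.children[name],
                        path + old.name + "/")

# Two snapshots where only utils.py changed:
old = TreeNode("src", "h1", {"app.py": TreeNode("app.py", "a1"),
                             "utils.py": TreeNode("utils.py", "u1")})
new = TreeNode("src", "h2", {"app.py": TreeNode("app.py", "a1"),
                             "utils.py": TreeNode("utils.py", "u2")})
print(list(diff(old, new)))  # → ['src/utils.py']
```

The key optimization mirrors Git's: when two subtree hashes match, the whole subtree is skipped without visiting its children.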
A tree is a hierarchical data structure where nodes connect through parent-child edges. Unlike arrays (indexed by position) or linked lists (a single chain of pointers), trees branch, and that branching is what enables O(log n) search, hierarchical queries, and relationship modeling that flat structures cannot support.
Everyday analogy: A company org chart. The CEO is the root, department heads are internal nodes, and individual contributors are leaves. Every employee has exactly one boss (parent), except the CEO, the one node with no parent.
Key vocabulary:
| Term | Meaning |
| --- | --- |
| Root | Topmost node; the only node with no parent |
| Child | A node directly below another node |
| Parent | The node directly above a child |
| Leaf | A node with no children |
| Height | Length of the longest root-to-leaf path |
| Depth | Length of the path from root to a specific node |
| Subtree | A node plus all its descendants |
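The height and depth definitions above translate directly into code. A minimal sketch, with illustrative helper names, on a tiny binary tree:

```python
# A minimal sketch making "height" and "depth" concrete.
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def height(node):
    """Edges on the longest path from this node down to a leaf."""
    if node is None:
        return -1                        # empty tree: height -1 by convention
    return 1 + max(height(node.left), height(node.right))

def depth(root, target, d=0):
    """Edges from the root to the node holding target (None if absent)."""
    if root is None:
        return None
    if root.val == target:
        return d
    left = depth(root.left, target, d + 1)
    return left if left is not None else depth(root.right, target, d + 1)

#        5
#       / \
#      3   7
#     / \
#    1   4
root = Node(5, Node(3, Node(1), Node(4)), Node(7))
print(height(root))    # → 2 (longest path: 5 -> 3 -> 1)
print(depth(root, 4))  # → 2
print(depth(root, 5))  # → 0 (the root itself)
```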
Tree Terminology in a Single Diagram
```mermaid
flowchart TD
    A[Root Node] --> B[Child Left]
    A --> C[Child Right]
    B --> D[Leaf]
    B --> E[Leaf]
    C --> F[Internal Node]
    F --> G[Leaf]
    note1["Height=3, depth varies per node"]
```
This diagram illustrates the core vocabulary of tree data structures using a seven-node example. The top node is the Root, which branches into a left Child and a right Child; the left Child has two Leaf nodes beneath it, while the right Child has a single Internal Node leading to one more Leaf. Tracing from the Root down to the deepest Leaf gives the tree's height (3 in this case), while the depth of any individual node is the number of edges from the Root to that node. Keep this diagram as a reference when reading BST or traversal algorithms: every tree operation is described using exactly these terms.
Tree Shape and Height: Why Balance Matters
A tree's shape determines how fast every single operation runs. Two BSTs can hold identical values yet differ wildly in performance depending on how those values were inserted.
The key insight: height is everything. For a tree with n nodes, the height dictates the worst-case number of comparisons needed to reach any leaf.
| Tree shape | How it forms | Height (7 nodes) | Search complexity |
| --- | --- | --- | --- |
| Perfect (fully balanced) | Values inserted level by level (e.g., median-first) | $\lfloor \log_2 7 \rfloor = 2$ | $O(\log n)$ |
| Complete (nearly balanced) | Values inserted in shuffled order (expected case) | $\approx \log_2 n$ | $O(\log n)$ |
| Skewed (degenerate) | Values inserted in sorted order: 1 → 2 → 3 → … | $n - 1 = 6$ | $O(n)$ |
Why skewed trees happen: inserting values in ascending (or descending) order forces every new node onto the same side. The tree stops being tree-like and becomes a glorified linked list: $O(n)$ search, $O(n)$ insert, every time.
The fix: self-balancing BSTs (AVL trees, Red-Black trees) perform rotations after each insert or delete to cap height at $O(\log n)$, regardless of insert order. Java's TreeMap and C++'s std::map both use Red-Black trees internally for exactly this reason.
Rule of thumb: if you control insert order, shuffle it. If you don't, reach for a self-balancing BST.
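The rule of thumb is easy to verify empirically. A self-contained sketch (repeating the insert helper shown later in this post) that compares the height of a BST built from sorted input against one built from shuffled input:

```python
# Same 127 values, two insert orders, wildly different heights.
import random

class Node:
    def __init__(self, val):
        self.val, self.left, self.right = val, None, None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

values = list(range(127))

sorted_root = None
for v in values:                  # ascending order: the worst case
    sorted_root = insert(sorted_root, v)

random.seed(42)
random.shuffle(values)
shuffled_root = None
for v in values:                  # shuffled: expected O(log n) height
    shuffled_root = insert(shuffled_root, v)

print(height(sorted_root))        # → 126 (a linked list in disguise)
print(height(shuffled_root))      # expected: low teens, near 2·log2(127)
```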
Five Essential Tree Types
1. Binary Tree: each node has at most two children (left, right). No ordering constraint.
2. Binary Search Tree (BST): a binary tree with the ordering property that left subtree values < parent value < right subtree values. Enables $O(\log n)$ search on balanced trees.
3. Heap: a complete binary tree where every parent is ≥ its children (max-heap) or ≤ its children (min-heap). Used for priority queues and top-k operations.
4. B-Tree: self-balancing search tree optimized for disk I/O; underpins database indexes and file systems. Each node can hold multiple keys and children.
5. Trie (Prefix Tree): each node represents a character; paths from root to nodes spell out strings. Used for autocomplete and spell-checking.
| Type | Ordering property | Primary use |
| --- | --- | --- |
| Binary Tree | None | General hierarchical structure |
| BST | Left < Node < Right | Search, sorted iteration |
| Heap | Parent ≥/≤ Children | Priority queue, top-k |
| B-Tree | Sorted multi-key nodes | Database / filesystem indexes |
| Trie | Prefix path = string prefix | Autocomplete, dictionary search |
BST Operations: Insertion, Search, and Deletion
Insertion
```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root
```
Search
```python
def search(root, target):
    if root is None or root.val == target:
        return root
    if target < root.val:
        return search(root.left, target)
    return search(root.right, target)
```
```mermaid
graph TD
    A[Root = 5] --> B[Left = 3]
    A --> C[Right = 7]
    B --> D[Left = 2]
    B --> E[Right = 4]
    C --> F[Left = 6]
    C --> G[Right = 8]
```
For the tree above, searching for 4: start at 5 → go left to 3 → go right to 4 ✓
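The section heading also promises deletion. Here is a standard sketch of the three delete cases (no child, one child, two children), repeating the Node and insert helpers so the example runs standalone:

```python
# Standard BST deletion sketch: splice out nodes with 0-1 children,
# replace two-child nodes with their in-order successor.
class Node:
    def __init__(self, val):
        self.val, self.left, self.right = val, None, None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def delete(root, val):
    if root is None:
        return None
    if val < root.val:
        root.left = delete(root.left, val)
    elif val > root.val:
        root.right = delete(root.right, val)
    else:
        # Cases 1 & 2: zero or one child -> splice the node out
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children -> copy in the in-order successor
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.val = succ.val
        root.right = delete(root.right, succ.val)
    return root

def inorder(root):
    return inorder(root.left) + [root.val] + inorder(root.right) if root else []

root = None
for v in [5, 3, 7, 2, 4, 6, 8]:
    root = insert(root, v)
root = delete(root, 5)            # delete the root (two-child case)
print(inorder(root))              # → [2, 3, 4, 6, 7, 8]
```

Note the two-child case never physically removes the target node; it overwrites its value with the successor's and then deletes the successor, which by construction has at most one child.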
Complexity
| Operation | Balanced BST | Skewed BST (worst case) |
| --- | --- | --- |
| Search | $O(\log n)$ | $O(n)$ |
| Insert | $O(\log n)$ | $O(n)$ |
| Delete | $O(\log n)$ | $O(n)$ |
A balanced BST (AVL, Red-Black) automatically rebalances after insert/delete to maintain $O(\log n)$ guarantees.
BST Ordering Property: Left Smaller, Right Larger
```mermaid
flowchart TD
    R[8] --> L[3]
    R --> Ri[10]
    L --> LL[1]
    L --> LR[6]
    Ri --> RR[14]
    LR --> LRL[4]
    LR --> LRR[7]
    label["Left less, Right greater"]
```
This diagram demonstrates the BST ordering property on a concrete eight-node tree rooted at 8. Every node in the left subtree of any parent is smaller (3, 1, 6, 4, 7 are all less than 8; 1 and 6 are less than 3, and so on), while every node to the right is larger (10 and 14 are greater than 8, and 14 is greater than 10). The label "Left less, Right greater" captures the invariant that must hold at every single node, not just the root. When you search this tree for any value, the ordering property lets you eliminate half the remaining nodes at each step, achieving O(log n) search on a balanced tree.
Heap Mechanics: Extracting Min or Max in O(log n)
A heap is a complete binary tree stored efficiently as an array. For node at index $i$:
- Left child: $2i + 1$
- Right child: $2i + 2$
- Parent: $\lfloor (i-1)/2 \rfloor$
Min-heap insertion in practice (Python's heapq performs the "bubble up" internally):

```python
import heapq

h = []
heapq.heappush(h, 10)
heapq.heappush(h, 4)
heapq.heappush(h, 7)
print(heapq.heappop(h))  # → 4 (min element, extracted in O(log n))
```
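To see the index arithmetic from above in action, here is a hand-rolled "bubble up" sketch. This is a teaching aid, not a replacement for heapq:

```python
# Hand-rolled min-heap push using the parent-index formula (i-1)//2.
def heap_push(heap, val):
    heap.append(val)                 # place at the next free leaf slot
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2        # parent index formula
        if heap[parent] <= heap[i]:
            break                    # heap property restored
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent                   # continue bubbling up

h = []
for v in [10, 4, 7, 1]:
    heap_push(h, v)
print(h)     # → [1, 4, 7, 10]
print(h[0])  # → 1 (min always at the root, index 0)
```

Each push does at most one comparison-and-swap per level, so the cost is bounded by the tree height: O(log n).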
When to use heaps:
- Top-k elements from a stream
- Dijkstra's shortest path (priority queue)
- Task scheduling by priority
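The first use case, top-k from a stream, is the classic bounded min-heap pattern; keep the k largest elements seen so far, and the heap root is always the smallest of the current top-k. A minimal sketch:

```python
# Top-k from a stream with a bounded min-heap of size k.
import heapq

def top_k(stream, k):
    heap = []
    for x in stream:
        heapq.heappush(heap, x)      # O(log k)
        if len(heap) > k:
            heapq.heappop(heap)      # evict the smallest of the candidates
    return sorted(heap, reverse=True)

print(top_k([5, 12, 3, 8, 19, 1, 7], 3))  # → [19, 12, 8]
```

This processes n elements in O(n log k) time and O(k) space, which is why it works on streams too large to sort.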
Deep Dive: BST and Heap Internal Mechanics
A BST maintains its ordering property through binary comparisons at each node. AVL trees re-balance after every insert/delete by tracking node heights and rotating when the height difference exceeds 1. Red-Black trees use a looser color-based invariant that needs fewer rotations, making them faster for write-heavy workloads. Heaps skip pointers entirely: a complete binary tree maps to an array using index arithmetic ($2i+1$, $2i+2$), achieving cache-friendly $O(1)$ min/max peek without any pointer indirection.
Real-World Applications: Tree Structures You Use Every Day
| Application | Tree type | How it is used |
| --- | --- | --- |
| File system | N-ary tree | Directories are nodes; files are leaves |
| HTML/XML DOM | N-ary tree | Browser renders the page by traversing the DOM |
| Database indexes | B-Tree / B+-Tree | Efficient range and point queries on disk |
| Autocomplete | Trie | Prefix match in O(m) where m = query length |
| Git commit history | DAG (tree-like) | Each commit has 1-2 parent pointers |
| Network routing | Spanning tree | Loop-free paths for broadcast traffic |
Tree Type to Use Case: Matching the Right Tree to the Job
```mermaid
flowchart TD
    A[Tree Type] --> B[BST]
    A --> C[Heap]
    A --> D[Trie]
    A --> E[Segment Tree]
    B --> F[Sorted Data / Search]
    C --> G[Priority Queue]
    D --> H[Autocomplete / Spell]
    E --> I[Range Queries]
```
This flowchart maps the four main specialised tree variants (BST, Heap, Trie, and Segment Tree) to their primary use cases. BST connects to sorted data and search; Heap connects to priority queue operations; Trie connects to autocomplete and spell-checking; and Segment Tree connects to range queries. Use this as a starting-point filter: if your problem involves ordering or fast search, reach for a BST; if it involves extracting extremes efficiently, use a Heap; if it involves string prefixes, use a Trie; and if it involves aggregating over index ranges, use a Segment Tree.
Trade-offs and Failure Modes: When Trees Go Wrong
Balanced vs. unbalanced: A perfectly balanced BST gives $O(\log n)$ operations. Insert sorted data into a plain BST and it degrades to a linked list with $O(n)$ per operation. Use a self-balancing variant (AVL, Red-Black) when input order is unknown.
Memory overhead: Each tree node requires pointers to children. For millions of small nodes, this overhead matters. Arrays (heaps) or B-trees (multi-key nodes) reduce pointer waste.
| Failure mode | Symptom | Fix |
| --- | --- | --- |
| Skewed BST | O(n) search performance | Switch to self-balancing BST (AVL, Red-Black) |
| Deep recursion on traversal | Stack overflow | Iterative traversal with explicit stack |
| Forgetting null check | NullPointerException | Always validate node != null in base case |
| Wrong heap direction | Extracting max instead of min | Confirm min-heap vs max-heap property |
Decision Guide: Choosing Between Tree Types
| You need... | Use |
| --- | --- |
| Fast search + sorted iteration | BST or balanced BST |
| Min/max extraction, priority queue | Binary heap |
| Prefix string search | Trie |
| Disk-efficient sorted index | B-Tree |
| General hierarchy (no ordering) | N-ary tree |
What to Learn Next
Build a BST by Hand
Let's insert [5, 3, 7, 1, 4] into an empty BST one value at a time and verify the result. This sequence was chosen because it produces a balanced tree rather than the degenerate skewed case β making it an ideal concrete example for seeing the BST ordering property at work. As you follow each step, focus on the comparison rule at every node (go left if the new value is smaller, go right if it is larger), then confirm the final structure is correct by running an in-order traversal at the end.
Step-by-step insertions:
- Insert 5 → becomes the root.
- Insert 3 → 3 < 5, place left of 5.
- Insert 7 → 7 > 5, place right of 5.
- Insert 1 → 1 < 5 (go left to 3) → 1 < 3, place left of 3.
- Insert 4 → 4 < 5 (go left to 3) → 4 > 3, place right of 3.
Resulting BST:
```text
      5
     / \
    3   7
   / \
  1   4
```
Verify the BST property with in-order traversal (Left → Root → Right):
1 → 3 → 4 → 5 → 7: sorted ascending ✓
In-order traversal of any valid BST always produces sorted output. That is why BSTs underpin sorted maps, ordered sets, and range queries in many standard libraries.
Quick search check: find 4: start at 5, go left (4 < 5), arrive at 3, go right (4 > 3), found at depth 2. Only 2 comparisons instead of scanning all 5 nodes: the BST property pays off immediately even on a tiny tree.
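The whole walkthrough can be checked in a few lines, reusing the insert function from earlier in the post (repeated here so the snippet runs standalone):

```python
# Verify the hand-built BST: insert [5, 3, 7, 1, 4], then confirm the
# in-order traversal comes out sorted.
class Node:
    def __init__(self, val):
        self.val, self.left, self.right = val, None, None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def inorder(root):
    return inorder(root.left) + [root.val] + inorder(root.right) if root else []

root = None
for v in [5, 3, 7, 1, 4]:
    root = insert(root, v)
print(inorder(root))  # → [1, 3, 4, 5, 7]
```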
Java TreeMap and PriorityQueue: Self-Balancing BST and Min-Heap Under the Hood
Java's TreeMap and PriorityQueue are the standard library implementations of the BST and heap described in this post. TreeMap is backed by a Red-Black tree (a self-balancing BST) that guarantees O(log n) get, put, remove, floorKey, ceilingKey, and range queries, even on sorted-insert workloads. PriorityQueue is a binary min-heap stored as an array, giving O(1) peek and O(log n) insert/extract without any pointer overhead.
Both collections solve the core problem from this post: they handle the self-balancing and heap-property maintenance for you, so you get the correctness guarantees without implementing rotations or heapify logic manually.
```java
import java.util.*;

public class TreeAndHeapDemo {
    public static void main(String[] args) {
        // ── TreeMap: Red-Black tree BST ─────────────────────────────────
        // Insert in sorted order; TreeMap stays balanced (O(log n) per op)
        TreeMap<Integer, String> bst = new TreeMap<>();
        for (int v : new int[]{5, 3, 7, 1, 4, 6, 8}) {
            bst.put(v, "node-" + v);
        }

        // In-order traversal via keySet() -- always sorted ascending
        System.out.println("In-order: " + bst.keySet());
        // -> [1, 3, 4, 5, 6, 7, 8]: BST ordering property guaranteed

        // O(log n) floor and ceiling queries
        System.out.println("Floor of 5:   " + bst.floorKey(5));   // -> 5
        System.out.println("Ceiling of 5: " + bst.ceilingKey(5)); // -> 5
        System.out.println("Floor of 2:   " + bst.floorKey(2));   // -> 1
        System.out.println("Ceiling of 2: " + bst.ceilingKey(2)); // -> 3

        // Range query: keys between 3 and 6 inclusive -- O(log n + k)
        NavigableMap<Integer, String> range = bst.subMap(3, true, 6, true);
        System.out.println("Range [3,6]: " + range.keySet()); // -> [3, 4, 5, 6]

        // ── PriorityQueue: min-heap (array-backed) ──────────────────────
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        int[] values = {10, 4, 7, 1, 8};
        for (int v : values) minHeap.offer(v); // O(log n) each

        System.out.println("\nMin peek (O(1)): " + minHeap.peek()); // -> 1

        // Extract all in sorted order -- equivalent to heap sort
        System.out.print("Extract order: ");
        while (!minHeap.isEmpty()) {
            System.out.print(minHeap.poll() + " "); // O(log n) each
        }
        // -> 1 4 7 8 10

        // Max-heap via reversed comparator
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) maxHeap.offer(v);
        System.out.println("\nMax peek (O(1)): " + maxHeap.peek()); // -> 10

        // Top-3 elements from a stream using a bounded min-heap
        int k = 3;
        PriorityQueue<Integer> topK = new PriorityQueue<>(k); // min-heap of size k
        int[] stream = {5, 12, 3, 8, 19, 1, 7};
        for (int n : stream) {
            topK.offer(n);
            if (topK.size() > k) topK.poll(); // evict smallest
        }
        System.out.println("Top-" + k + " elements: " + topK);
        // retains the 3 largest values (8, 12, 19), printed in heap order
    }
}
```
TreeMap automatically rebalances via Red-Black rotations on every insert and delete; the O(log n) guarantee holds regardless of insertion order, unlike a plain BST. For a max-heap, pass Collections.reverseOrder() to the PriorityQueue constructor.
For a full deep-dive on Java TreeMap and PriorityQueue internals, a dedicated follow-up post is planned.
Core Principles to Remember
Before moving on, cement three ideas that trip up most beginners.
1. Self-balancing BSTs handle insert order for you. AVL trees and Red-Black trees perform automatic rotations after each insert or delete, keeping height at $O(\log n)$ regardless of the input sequence. When you need an ordered map in production code, reach for the standard library implementation, not a home-grown BST.
2. Heaps are not BSTs, and they live in arrays. A heap does not support fast arbitrary search (that costs $O(n)$). Its superpower is $O(1)$ peek at the min or max, and $O(\log n)$ extract. Because a heap is a complete binary tree, it fits perfectly in a flat array: no pointers needed.
3. Tries trade memory for prefix-search speed. Each character in a string is a node. Searching for a word of length $m$ takes exactly $O(m)$ steps, completely independent of how many words are stored. That is why search engines and autocomplete UIs prefer tries over BSTs for dictionary lookups.
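Since tries appear throughout this post but have no code example, here is a minimal sketch of the idea; illustrative names, not a production implementation:

```python
# Minimal trie sketch: each node maps characters to children, so looking
# up a word of length m touches exactly m nodes.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def starts_with(root, prefix):
    node = root
    for ch in prefix:        # O(m), independent of dictionary size
        if ch not in node.children:
            return False
        node = node.children[ch]
    return True

root = TrieNode()
for w in ["tree", "trie", "heap"]:
    trie_insert(root, w)
print(starts_with(root, "tr"))   # → True
print(starts_with(root, "bst"))  # → False
```

Note that "tree" and "trie" share the "tr" prefix path, which is exactly the memory-for-speed trade the post describes.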
Recognising which tree fits the problem is half the interview answer.
TLDR: Summary & Key Takeaways
- A tree is a hierarchical data structure with a root, internal nodes, and leaves.
- BSTs enable $O(\log n)$ search and sorted iteration, but only when balanced.
- Heaps give $O(1)$ peek at min/max and $O(\log n)$ insert/extract: the foundation of priority queues.
- Tries enable $O(m)$ prefix search regardless of dictionary size.
- Self-balancing BSTs (AVL, Red-Black) automatically maintain the $O(\log n)$ guarantee.
Practice Quiz
In a valid BST, what is true about every node?
- A) All values in the tree are unique
- B) All values in the left subtree are less than the node; all in the right are greater
- C) Every node has exactly two children
- D) The tree must be perfectly balanced
Correct Answer: B. The BST ordering property requires left subtree values < parent < right subtree values, but says nothing about uniqueness, child count, or balance.
Why does inserting sorted data into a plain BST degrade performance?
- A) Sorted inserts cause the hash function to collide
- B) Each new node becomes a right child, forming a linked list with O(n) operations
- C) BSTs cannot store sorted data
- D) Sorted data must be stored in a heap instead
Correct Answer: B. Inserting 1, 2, 3, 4, 5 in order always appends to the rightmost path, turning the tree into a linked list where every search must scan all nodes.
What does a heap guarantee that a BST does not?
- A) O(log n) search for any element
- B) O(1) access to the minimum (or maximum) element
- C) Sorted iteration in O(n)
- D) No duplicate values
Correct Answer: B. A heap keeps the min (or max) at the root, so peeking costs O(1). A BST can find min/max in O(log n) by traversing to the leftmost (or rightmost) leaf, but not in O(1).
Written by
Abstract Algorithms
@abstractalgorithms