
Tree Data Structure Explained: Concepts, Implementation, and Interview Guide

Trees are everywhere, from the DOM in your browser to the file system on your laptop. We explain Binary Trees, BSTs, and Heaps.

Abstract Algorithms Β· 15 min read

TLDR: Trees are hierarchical data structures used everywhere β€” file systems, HTML DOM, databases, and search algorithms. Understanding Binary Trees, BSTs, and Heaps gives you efficient $O(\log N)$ search, insertion, and deletion β€” and helps you ace a large class of interview problems.


πŸ“– Trees Are Everywhere: An Org-Chart You Can Query

When you run git diff HEAD~1 HEAD, Git does not compare two flat lists of files. It compares two trees. Every commit in Git is stored internally as a tree of file-system snapshots β€” directories are internal nodes, files are leaves. Git's diff algorithm walks both trees simultaneously, identifies nodes whose content hashes changed, and reports the delta. When Git says "3 files changed," it found three leaf nodes with mismatched hashes.

Trees are not an academic concept. They are the data structure behind version control, file systems, browser DOMs, and database indexes. Here is a simplified picture of what a Git commit tree looks like:

commit (root)
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ app.py       ← leaf (file snapshot, stored by hash)
β”‚   └── utils.py     ← leaf
└── README.md        ← leaf

Finding the diff between two commits is a tree comparison problem. The traversal algorithms in this post are exactly what power that comparison.

A tree is a hierarchical data structure where nodes connect through parent–child edges. Unlike arrays (indexed by position) or linked lists (a single chain of pointers), trees branch β€” and that branching is what enables O(log n) search, hierarchical queries, and relationship modeling that flat structures cannot support.

Everyday analogy: A company org chart. The CEO is the root, department heads are internal nodes, and individual contributors are leaves. Every employee has exactly one boss (parent), except the CEO β€” who is the one node with no parent.

Key vocabulary:

| Term | Meaning |
| --- | --- |
| Root | Topmost node β€” the only node with no parent |
| Child | A node directly below another node |
| Parent | The node directly above a child |
| Leaf | A node with no children |
| Height | Length of the longest root-to-leaf path |
| Depth | Length of the path from the root to a specific node |
| Subtree | A node plus all its descendants |
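Height and depth are the two terms beginners most often confuse. A small sketch that computes both by counting edges (using the convention that an empty tree has height βˆ’1, so a single leaf has height 0):

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def height(node):
    # Height = number of edges on the longest path down to a leaf.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def depth(root, target, d=0):
    # Depth = number of edges from the root down to the node holding `target`.
    # Returns -1 if the value is not present.
    if root is None:
        return -1
    if root.val == target:
        return d
    return max(depth(root.left, target, d + 1),
               depth(root.right, target, d + 1))

# Build:    1
#          / \
#         2   3
#        /
#       4
root = Node(1)
root.left, root.right = Node(2), Node(3)
root.left.left = Node(4)

print(height(root))    # β†’ 2  (longest path 1 β†’ 2 β†’ 4)
print(depth(root, 4))  # β†’ 2
print(depth(root, 3))  # β†’ 1
```

Note that the height of the whole tree equals the depth of its deepest leaf β€” the two measures meet at the bottom.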

πŸ“Š Tree Terminology in a Single Diagram

flowchart TD
    A[Root Node] --> B[Child Left]
    A --> C[Child Right]
    B --> D[Leaf]
    B --> E[Leaf]
    C --> F[Internal Node]
    F --> G[Leaf]
    note1["Height=3, depth varies per node"]

This diagram illustrates the core vocabulary of tree data structures using a seven-node example. The top node is the Root, which branches into a left Child and a right Child; the left Child has two Leaf nodes beneath it, while the right Child has a single Internal Node leading to one more Leaf. Tracing from the Root down to the deepest Leaf gives the tree's height (3 edges in this case), while the depth of any individual node is the number of edges from the Root to that node. Keep this diagram as a reference when reading BST or traversal algorithms β€” every tree operation is described using exactly these terms.


πŸ” Tree Shape and Height: Why Balance Matters

A tree's shape determines how fast every single operation runs. Two BSTs can hold identical values yet differ wildly in performance depending on how those values were inserted.

The key insight: height is everything. For a tree with n nodes, the height dictates the worst-case number of comparisons needed to reach any leaf.

| Tree shape | How it forms | Height (7 nodes) | Search complexity |
| --- | --- | --- | --- |
| Perfect (fully balanced) | Values inserted in a balance-friendly order (e.g. median first) | $\lfloor \log_2 7 \rfloor = 2$ | $O(\log n)$ |
| Complete (nearly balanced) | Values inserted in roughly shuffled order | $\approx \log_2 n$ | $O(\log n)$ |
| Skewed (degenerate) | Values inserted in sorted order: 1β†’2β†’3→… | $n - 1 = 6$ | $O(n)$ |

Why skewed trees happen: inserting values in ascending (or descending) order forces every new node onto the same side. The tree stops being tree-like and becomes a glorified linked list β€” $O(n)$ search, $O(n)$ insert, every time.

The fix: self-balancing BSTs (AVL trees, Red-Black trees) perform rotations after each insert or delete to cap height at $O(\log n)$, regardless of insert order. Java's TreeMap and C++'s std::map both use Red-Black trees internally for exactly this reason.

Rule of thumb: if you control insert order, shuffle it. If you don't, reach for a self-balancing BST.
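The height gap in the table can be checked directly. A small experiment (reusing the BST insert code shown later in this post, restated here so it runs standalone) that compares sorted-order inserts against a median-first order for the same seven values:

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

# Sorted inserts: every node lands on the right spine β€” a chain
skewed = None
for v in [1, 2, 3, 4, 5, 6, 7]:
    skewed = insert(skewed, v)
print(height(skewed))    # β†’ 6  (n - 1: a linked list in disguise)

# Median-first inserts: each value splits its range evenly
balanced = None
for v in [4, 2, 6, 1, 3, 5, 7]:
    balanced = insert(balanced, v)
print(height(balanced))  # β†’ 2  (⌊log2 7βŒ‹)
```

Same seven values, same BST code β€” height 6 versus height 2, purely because of insertion order.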


πŸ”’ Five Essential Tree Types

1. Binary Tree β€” each node has at most two children (left, right). No ordering constraint.

2. Binary Search Tree (BST) β€” a binary tree with the ordering property: left subtree values < parent value < right subtree values. Enables $O(\log n)$ search on balanced trees.

3. Heap β€” a complete binary tree satisfying: every parent β‰₯ its children (max-heap) or ≀ its children (min-heap). Used for priority queues and top-k operations.

4. B-Tree β€” self-balancing search tree optimized for disk I/O; underpins database indexes and file systems. Each node can hold multiple keys and children.

5. Trie (Prefix Tree) β€” each node represents a character; paths from root to nodes spell out strings. Used for autocomplete and spell-checking.

| Type | Ordering property | Primary use |
| --- | --- | --- |
| Binary Tree | None | General hierarchical structure |
| BST | Left < Node < Right | Search, sorted iteration |
| Heap | Parent β‰₯/≀ children | Priority queue, top-k |
| B-Tree | Sorted multi-key nodes | Database / filesystem indexes |
| Trie | Prefix path = string prefix | Autocomplete, dictionary search |
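Since tries come up again later for autocomplete, here is a minimal sketch of the idea (the class and method names are illustrative, not from any particular library): each node holds a dictionary of child characters, and walking a prefix costs one step per character.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_word = False  # marks the end of a complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        # O(m) walk, m = len(prefix) β€” independent of dictionary size
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True

t = Trie()
for w in ["tree", "trie", "trap"]:
    t.insert(w)

print(t.starts_with("tr"))    # β†’ True  (shared prefix of all three words)
print(t.starts_with("heap"))  # β†’ False
```

The trade-off is visible in the structure: "tree", "trie", and "trap" share the "tr" path, but every distinct character costs a node β€” memory spent to make prefix queries cheap.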

βš™οΈ BST Operations: Insertion, Search, and Deletion

Insertion

class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root
Search

def search(root, target):
    if root is None or root.val == target:
        return root
    if target < root.val:
        return search(root.left, target)
    return search(root.right, target)

Example tree:

graph TD
    A[Root = 5] --> B[Left = 3]
    A --> C[Right = 7]
    B --> D[Left = 2]
    B --> E[Right = 4]
    C --> F[Left = 6]
    C --> G[Right = 8]

For the tree above, searching for 4 proceeds: start at 5 β†’ go left to 3 β†’ go right to 4. βœ“
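Deletion

Deletion is the trickiest of the three operations, because a node with two children cannot simply be unlinked. One common approach handles three cases β€” leaf, one child, two children β€” by swapping in the in-order successor. A minimal sketch (restating the Node and insert helpers so it runs standalone):

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def delete(root, val):
    if root is None:
        return None
    if val < root.val:
        root.left = delete(root.left, val)
    elif val > root.val:
        root.right = delete(root.right, val)
    else:
        if root.left is None:        # zero or one child:
            return root.right        # splice the node out
        if root.right is None:
            return root.left
        succ = root.right            # two children: find the in-order
        while succ.left:             # successor (leftmost of right subtree),
            succ = succ.left
        root.val = succ.val          # copy it up, then delete it below
        root.right = delete(root.right, succ.val)
    return root

def inorder(root):
    return inorder(root.left) + [root.val] + inorder(root.right) if root else []

root = None
for v in [5, 3, 7, 2, 4, 6, 8]:
    root = insert(root, v)
root = delete(root, 3)       # 3 has two children (2 and 4)
print(inorder(root))         # β†’ [2, 4, 5, 6, 7, 8] β€” still sorted
```

The in-order traversal staying sorted after the delete is the correctness check: the BST property survived the restructuring.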

Complexity

| Operation | Balanced BST | Skewed BST (worst case) |
| --- | --- | --- |
| Search | $O(\log n)$ | $O(n)$ |
| Insert | $O(\log n)$ | $O(n)$ |
| Delete | $O(\log n)$ | $O(n)$ |

A balanced BST (AVL, Red-Black) automatically rebalances after insert/delete to maintain $O(\log n)$ guarantees.

πŸ“Š BST Ordering Property: Left Smaller, Right Larger

flowchart TD
    R[8] --> L[3]
    R --> Ri[10]
    L --> LL[1]
    L --> LR[6]
    Ri --> RR[14]
    LR --> LRL[4]
    LR --> LRR[7]
    label["Left less, Right greater"]

This diagram demonstrates the BST ordering property on a concrete eight-node tree rooted at 8. Every node in the left subtree of any parent is smaller (3, 1, 6, 4, 7 are all less than 8; 1 and 6 are less than 3, and so on), while every node to the right is larger (10 and 14 are greater than 8, and 14 is greater than 10). The label "Left less, Right greater" captures the invariant that must hold at every single node, not just the root. When you search this tree for any value, the ordering property lets you eliminate half the remaining nodes at each step, achieving O(log n) search on a balanced tree.


πŸ“Š Heap Mechanics: Extracting Min or Max in O(log n)

A heap is a complete binary tree stored efficiently as an array. For node at index $i$:

  • Left child: $2i + 1$
  • Right child: $2i + 2$
  • Parent: $\lfloor (i-1)/2 floor$

Min-heap insertion ("bubble up"):

import heapq

h = []
heapq.heappush(h, 10)
heapq.heappush(h, 4)
heapq.heappush(h, 7)

print(heapq.heappop(h))  # β†’ 4 (min element in O(log n))

When to use heaps:

  • Top-k elements from a stream
  • Dijkstra's shortest path (priority queue)
  • Task scheduling by priority

🧠 Deep Dive: BST and Heap Internal Mechanics

A BST maintains its ordering property through binary comparisons at each node. AVL trees re-balance after every insert/delete by tracking node heights and rotating when the height difference exceeds 1. Red-Black trees use a looser color-based invariant for fewer rotations β€” making them faster for write-heavy workloads. Heaps skip pointers entirely: a complete binary tree maps to an array using index arithmetic ($2i+1$, $2i+2$), achieving cache-friendly $O(1)$ min/max peek without any pointer indirection.
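As a taste of what "rotation" means, here is a minimal left-rotation sketch β€” the building block both AVL and Red-Black trees use. This is deliberately simplified: real implementations also update stored heights (AVL) or colors (Red-Black) after the pointer swap.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def rotate_left(x):
    # x's right child y becomes the new subtree root; x drops to y's left.
    y = x.right
    x.right = y.left   # y's old left subtree is re-hung under x
    y.left = x
    return y           # caller re-attaches y where x used to be

# A right-leaning chain 1 β†’ 2 β†’ 3 (height 2) ...
root = Node(1, right=Node(2, right=Node(3)))
root = rotate_left(root)
# ... becomes a balanced subtree rooted at 2 (height 1)
print(root.val, root.left.val, root.right.val)  # β†’ 2 1 3
```

Three pointer updates turn a degenerate chain into a balanced subtree β€” and because the in-order sequence (1, 2, 3) is unchanged, the BST ordering property is preserved.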


🌍 Real-World Applications: Tree Structures You Use Every Day

| Application | Tree type | How it is used |
| --- | --- | --- |
| File system | N-ary tree | Directories are nodes; files are leaves |
| HTML/XML DOM | N-ary tree | Browser renders the page by traversing the DOM |
| Database indexes | B-Tree / B+-Tree | Efficient range and point queries on disk |
| Autocomplete | Trie | Prefix match in O(m) where m = query length |
| Git commit history | DAG (tree-like) | Each commit has 1–2 parent pointers |
| Network routing | Spanning tree | Loop-free paths for broadcast traffic |

πŸ“Š Tree Type to Use Case: Matching the Right Tree to the Job

flowchart TD
    A[Tree Type] --> B[BST]
    A --> C[Heap]
    A --> D[Trie]
    A --> E[Segment Tree]
    B --> F[Sorted Data / Search]
    C --> G[Priority Queue]
    D --> H[Autocomplete / Spell]
    E --> I[Range Queries]

This flowchart maps the four main specialised tree variants β€” BST, Heap, Trie, and Segment Tree β€” to their primary use cases. BST connects to sorted data and search; Heap connects to priority queue operations; Trie connects to autocomplete and spell-checking; and Segment Tree connects to range queries. Use this as a starting-point filter: if your problem involves ordering or fast search, reach for a BST; if it involves extracting extremes efficiently, use a Heap; if it involves string prefixes, use a Trie; and if it involves aggregating over index ranges, use a Segment Tree.


βš–οΈ Trade-offs & Failure Modes: Trade-offs and When Trees Go Wrong

Balanced vs. unbalanced: A perfectly balanced BST gives $O(\log n)$ operations. Insert sorted data into a plain BST and it degrades to a linked list β€” $O(n)$ per operation. Use a self-balancing variant (AVL, Red-Black) when input order is unknown.

Memory overhead: Each tree node requires pointers to children. For millions of small nodes, this overhead matters. Arrays (heaps) or B-trees (multi-key nodes) reduce pointer waste.

| Failure mode | Symptom | Fix |
| --- | --- | --- |
| Skewed BST | O(n) search performance | Switch to a self-balancing BST (AVL, Red-Black) |
| Deep recursion on traversal | Stack overflow | Iterative traversal with an explicit stack |
| Forgetting the null check | NullPointerException | Always validate node != null in the base case |
| Wrong heap direction | Extracting max instead of min | Confirm min-heap vs max-heap property |
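The "iterative traversal with an explicit stack" fix deserves a concrete sketch, since recursive traversal on a deep (e.g. skewed) tree is exactly where stack overflows bite. The explicit stack replaces the call stack, so there is no recursion-depth limit:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder_iterative(root):
    result, stack, node = [], [], root
    while node or stack:
        while node:               # dive left as far as possible,
            stack.append(node)    # remembering each node on the way
            node = node.left
        node = stack.pop()        # visit the deepest unvisited node
        result.append(node.val)
        node = node.right         # then explore its right subtree
    return result

# The 5/3/7/1/4 tree used elsewhere in this post
root = Node(5, Node(3, Node(1), Node(4)), Node(7))
print(inorder_iterative(root))  # β†’ [1, 3, 4, 5, 7]
```

The output is sorted, as in-order traversal of any valid BST must be β€” but unlike the recursive version, this handles a million-node skewed tree without blowing the interpreter's recursion limit.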

🧭 Decision Guide: Choosing Between Tree Types

| You need… | Use |
| --- | --- |
| Fast search + sorted iteration | BST or balanced BST |
| Min/max extraction, priority queue | Binary heap |
| Prefix string search | Trie |
| Disk-efficient sorted index | B-Tree |
| General hierarchy (no ordering) | N-ary tree |

πŸ§ͺ Build a BST by Hand

Let's insert [5, 3, 7, 1, 4] into an empty BST one value at a time and verify the result. This sequence was chosen because it produces a balanced tree rather than the degenerate skewed case β€” making it an ideal concrete example for seeing the BST ordering property at work. As you follow each step, focus on the comparison rule at every node (go left if the new value is smaller, go right if it is larger), then confirm the final structure is correct by running an in-order traversal at the end.

Step-by-step insertions:

  1. Insert 5 β†’ becomes the root.
  2. Insert 3 β†’ 3 < 5, place left of 5.
  3. Insert 7 β†’ 7 > 5, place right of 5.
  4. Insert 1 β†’ 1 < 5 (go left to 3) β†’ 1 < 3, place left of 3.
  5. Insert 4 β†’ 4 < 5 (go left to 3) β†’ 4 > 3, place right of 3.

Resulting BST:

        5
       / \
      3   7
     / \
    1   4

Verify the BST property with in-order traversal (Left β†’ Root β†’ Right): 1 β†’ 3 β†’ 4 β†’ 5 β†’ 7 β€” sorted ascending βœ“

In-order traversal of any valid BST always produces sorted output. That is why BSTs underpin sorted maps, ordered sets, and range queries in many standard libraries.

Quick search check: find 4 β†’ start at 5, go left (4 < 5), arrive at 3, go right (4 > 3), found at depth 2. Only 2 comparisons instead of scanning all 5 nodes β€” the BST property pays off immediately even on a tiny tree.
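The whole hand-built exercise can be replayed in a few lines of Python, reusing the insert function from the operations section (restated here so the snippet runs standalone):

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def inorder(root):
    # Left β†’ Root β†’ Right
    return inorder(root.left) + [root.val] + inorder(root.right) if root else []

root = None
for v in [5, 3, 7, 1, 4]:
    root = insert(root, v)

print(inorder(root))  # β†’ [1, 3, 4, 5, 7] β€” sorted, so the BST property holds
```

If the in-order output had come back out of order, some insertion would have violated the left-smaller / right-larger rule β€” a one-line sanity check worth remembering for interviews.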


πŸ› οΈ Java TreeMap and PriorityQueue: Self-Balancing BST and Min-Heap Under the Hood

Java's TreeMap and PriorityQueue are the standard library implementations of the BST and heap described in this post. TreeMap is backed by a Red-Black tree (a self-balancing BST) that guarantees O(log n) get, put, remove, floorKey, ceilingKey, and range queries β€” even on sorted-insert workloads. PriorityQueue is a binary min-heap stored as an array, giving O(1) peek and O(log n) insert/extract without any pointer overhead.

Both collections solve the core problem from this post: they handle the self-balancing and heap-property maintenance for you, so you get the correctness guarantees without implementing rotations or heapify logic manually.

import java.util.*;

public class TreeAndHeapDemo {
    public static void main(String[] args) {

        // ── TreeMap: Red-Black tree BST ────────────────────────────────────────
        // Insert in sorted order β€” TreeMap stays balanced (O(log n) per op)
        TreeMap<Integer, String> bst = new TreeMap<>();
        for (int v : new int[]{5, 3, 7, 1, 4, 6, 8}) {
            bst.put(v, "node-" + v);
        }

        // In-order traversal via entrySet() β†’ always sorted ascending
        System.out.println("In-order: " + bst.keySet());
        // β†’ [1, 3, 4, 5, 6, 7, 8]   β€” BST ordering property guaranteed

        // O(log n) floor and ceiling queries
        System.out.println("Floor of 5:   " + bst.floorKey(5));    // β†’ 5
        System.out.println("Ceiling of 5: " + bst.ceilingKey(5));  // β†’ 5
        System.out.println("Floor of 2:   " + bst.floorKey(2));    // β†’ 1
        System.out.println("Ceiling of 2: " + bst.ceilingKey(2));  // β†’ 3

        // Range query: keys between 3 and 6 inclusive β€” O(log n + k)
        NavigableMap<Integer, String> range = bst.subMap(3, true, 6, true);
        System.out.println("Range [3,6]: " + range.keySet()); // β†’ [3, 4, 5, 6]

        // ── PriorityQueue: min-heap (array-backed) ─────────────────────────────
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        int[] values = {10, 4, 7, 1, 8};
        for (int v : values) minHeap.offer(v);   // O(log n) each

        System.out.println("\nMin peek (O(1)): " + minHeap.peek());   // β†’ 1

        // Extract all in sorted order β€” equivalent to heap sort
        System.out.print("Extract order: ");
        while (!minHeap.isEmpty()) {
            System.out.print(minHeap.poll() + " "); // O(log n) each
        }
        // β†’ 1 4 7 8 10

        // Max-heap via reversed comparator
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) maxHeap.offer(v);
        System.out.println("\nMax peek (O(1)): " + maxHeap.peek());   // β†’ 10

        // Top-3 elements from a stream using a bounded min-heap
        int k = 3;
        PriorityQueue<Integer> topK = new PriorityQueue<>(k); // min-heap of size k
        int[] stream = {5, 12, 3, 8, 19, 1, 7};
        for (int n : stream) {
            topK.offer(n);
            if (topK.size() > k) topK.poll(); // evict smallest
        }
        System.out.println("Top-" + k + " elements: " + topK);
        // β†’ [8, 12, 19]  β€” largest 3 values retained
    }
}

TreeMap automatically rebalances via Red-Black rotations on every insert and delete β€” the O(log n) guarantee holds regardless of insertion order, unlike a plain BST. For a max-heap, pass Collections.reverseOrder() to the PriorityQueue constructor.

For a full deep-dive on Java TreeMap and PriorityQueue internals, a dedicated follow-up post is planned.


πŸ“š Core Principles to Remember

Before moving on, cement three ideas that trip up most beginners.

1. Self-balancing BSTs handle insert order for you. AVL trees and Red-Black trees perform automatic rotations after each insert or delete, keeping height at $O(\log n)$ regardless of the input sequence. When you need an ordered map in production code, reach for the standard library implementation β€” not a home-grown BST.

2. Heaps are not BSTs β€” and they live in arrays. A heap does not support fast arbitrary search (that costs $O(n)$). Its superpower is $O(1)$ peek at the min or max, and $O(\log n)$ extract. Because a heap is a complete binary tree, it fits perfectly in a flat array β€” no pointers needed.

3. Tries trade memory for prefix-search speed. Each character in a string is a node. Searching for a word of length $m$ takes exactly $O(m)$ steps, completely independent of how many words are stored. That is why search engines and autocomplete UIs prefer tries over BSTs for dictionary lookups.

Recognising which tree fits the problem is half the interview answer.


πŸ“Œ TLDR: Summary & Key Takeaways

  • A tree is a hierarchical data structure with a root, internal nodes, and leaves.
  • BSTs enable $O(\log n)$ search and sorted iteration β€” but only when balanced.
  • Heaps give $O(1)$ peek at min/max and $O(\log n)$ insert/extract β€” the foundation of priority queues.
  • Tries enable $O(m)$ prefix search regardless of dictionary size.
  • Self-balancing BSTs (AVL, Red-Black) automatically maintain the $O(\log n)$ guarantee.

πŸ“ Practice Quiz

  1. In a valid BST, what is true about every node?

    • A) All values in the tree are unique
    • B) All values in the left subtree are less than the node; all in the right are greater
    • C) Every node has exactly two children
    • D) The tree must be perfectly balanced

    Correct Answer: B β€” the BST ordering property requires left subtree values < parent < right subtree values, but says nothing about uniqueness, child count, or balance.

  2. Why does inserting sorted data into a plain BST degrade performance?

    • A) Sorted inserts cause the hash function to collide
    • B) Each new node becomes a right child, forming a linked list with O(n) operations
    • C) BSTs cannot store sorted data
    • D) Sorted data must be stored in a heap instead

    Correct Answer: B β€” inserting 1, 2, 3, 4, 5 in order always appends to the rightmost path, turning the tree into a linked list where every search must scan all nodes.

  3. What does a heap guarantee that a BST does not?

    • A) O(log n) search for any element
    • B) O(1) access to the minimum (or maximum) element
    • C) Sorted iteration in O(n)
    • D) No duplicate values

    Correct Answer: B β€” a heap keeps the min (or max) at the root, so peeking costs O(1). A BST can find min/max in O(log n) by traversing to the leftmost (or rightmost) leaf, but not in O(1).



Written by Abstract Algorithms (@abstractalgorithms)