What are Hash Tables? Basics Explained
TLDR: A hash table gives you near-O(1) lookups, inserts, and deletes by using a hash function to map keys to array indices. The tradeoff: collisions (when two keys hash to the same slot) must be handled, and a full hash table must be resized.
📬 The Magic Mailroom Stamp
Imagine a mailroom with 100 drawers. A magic stamp converts any employee name to a number 1–100. Want John's mail? Stamp "John" → you get 42 → open Drawer 42. No search needed.
That stamp is a hash function. The row of drawers is the array (the backing store). The combination is a hash table.
📚 Hash Table Fundamentals
A hash table combines two simple ideas into a data structure with near-constant-time performance:
- An array provides O(1) access by index: `array[42]` is instant.
- A hash function converts any key (string, integer, object) to an index: `h("john") → 42`.
Together: given any key, compute its index in one step, then access the array slot directly. No iteration. No comparison loop.
The three fundamental operations and their complexity:
| Operation | Average case | Worst case | Why worst case occurs |
|---|---|---|---|
| Lookup | O(1) | O(N) | All keys hash to the same slot |
| Insert | O(1) amortized | O(N) | Rehash triggered on a full table |
| Delete | O(1) | O(N) | All keys collide into the same bucket |
Worst case requires a pathologically bad hash function (or deliberate hash flooding). In practice, well-designed hash functions achieve average O(1) for all three operations.
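To see the worst case concretely, here's a minimal sketch where a degenerate hash forces every key into one bucket (`BadKey` is a hypothetical class invented purely for illustration):

```python
# A deliberately bad hash turns dict lookups into linear scans.
class BadKey:
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 1  # every key lands in the same bucket
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.n == other.n

d = {BadKey(i): i for i in range(1000)}
# Every access must compare against colliding keys one by one,
# so lookups degrade from O(1) toward O(N).
print(d[BadKey(500)])  # 500
```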
What makes a good hash function?
- Deterministic: Same key always maps to the same slot.
- Uniform: Keys distribute evenly across all slots to minimize collisions.
- Fast: Must compute in O(1); the hash computation cannot become the bottleneck.
- Avalanche effect: Similar keys produce very different hash values (prevents clustering).
🔢 Deep Dive: Array + Hash Function

```mermaid
flowchart LR
    Key["Key: 'cat'"] --> HashFn["Hash Function: h('cat') = 2"]
    HashFn --> Array["Array: [0][1][2: 'cat' → 'meow'][3]..."]
    Array --> Value["Value: 'meow'"]
```
A hash function must be:
- Deterministic: Same input always produces the same output.
- Fast: O(1) computation.
- Uniform: Keys distribute evenly to minimize collisions.
Simple hash function (NOT production-quality):

```python
def simple_hash(key: str, size: int) -> int:
    return sum(ord(c) for c in key) % size
```
Python's built-in hash() uses SipHash (a keyed hash designed to resist hash flooding) for strings and bytes, which prevents attackers from precomputing colliding keys.
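To see why the toy hash above fails the avalanche test: a character sum is order-insensitive, so anagrams always collide. A quick sketch (repeating `simple_hash` so the snippet is self-contained):

```python
def simple_hash(key: str, size: int) -> int:
    # Toy hash from above: sum of character codes, modulo table size.
    return sum(ord(c) for c in key) % size

# No avalanche effect: reordering characters doesn't change the sum,
# so every anagram pair is a guaranteed collision.
print(simple_hash("cat", 100))  # 12
print(simple_hash("act", 100))  # 12, same slot, guaranteed collision
```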
🔍 How a Hash Table Handles a Lookup
The lookup flow for key "alice" in a hash table with separate chaining:

```mermaid
flowchart TD
    K["Key: 'alice'"] --> HF["Hash Function: h('alice') = 7"]
    HF --> IDX["Array index 7"]
    IDX --> BK["Bucket 7: [('alice', 25), ('charlie', 30)]"]
    BK --> SCAN{"Scan bucket: key == 'alice'?"}
    SCAN -->|Match found| RET["Return value: 25"]
    SCAN -->|No match| MISS["Return None / KeyError"]
    style RET fill:#90EE90
    style MISS fill:#FFB6C1
```
Three scenarios during lookup:
- Direct hit: `array[index]` contains exactly the target key → O(1).
- Collision chain: `array[index]` has multiple entries → scan the short chain → O(k), where k = bucket length (typically 1–2 with a good load factor).
- Miss: The slot is empty → key not in table → O(1).
For a table with load factor 0.75 and a good hash function, average bucket length is < 2, making even chained lookups extremely fast in practice.
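As a sketch, the lookup flow above fits in a few lines of Python, assuming plain lists as buckets (illustrative only; not how CPython's dict actually works):

```python
# Separate-chaining lookup: hash to a slot, then scan its short chain.
def bucket_lookup(buckets, key):
    index = hash(key) % len(buckets)   # 1. hash the key to a slot index
    for k, v in buckets[index]:        # 2. scan the (short) chain
        if k == key:
            return v                   # 3a. match found
    raise KeyError(key)                # 3b. miss

buckets = [[] for _ in range(8)]
for k, v in [("alice", 25), ("charlie", 30)]:
    buckets[hash(k) % 8].append((k, v))

print(bucket_lookup(buckets, "alice"))  # 25
```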
⚔️ Collisions: When Two Keys Land in the Same Slot
No hash function is perfect. Given a finite array, two different keys will eventually map to the same slot.
Example:
- `hash('cat') = 2` → stored at index 2.
- `hash('bird') = 2` → collision at index 2!
Two collision resolution strategies:
Separate Chaining
Each slot holds a linked list of key-value pairs. On collision, append to the list.
index 2: [('cat', 'meow') → ('bird', 'tweet')]
O(1) average, O(n) worst case if all keys hash to the same slot.
Open Addressing (Linear Probing)
No linked lists. On collision, step forward until you find an empty slot.
hash('cat') = 2 → store at 2
hash('bird') = 2 (full) → probe 3 → store at 3
Pros: Better cache performance (all data in one array). Cons: clustering, where runs of consecutive full slots slow down probing.
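A minimal linear-probing insert can be sketched as follows. The hash function is passed in as a parameter, and the constant `h = lambda k: 2` is a deliberately contrived stand-in that recreates the cat/bird collision from the example above:

```python
# Linear probing: on collision, step forward until an empty slot is found.
def probe_insert(table, key, value, h):
    index = h(key) % len(table)
    while table[index] is not None and table[index][0] != key:
        index = (index + 1) % len(table)   # step forward on collision
    table[index] = (key, value)
    return index

h = lambda k: 2                  # contrived: every key "hashes" to slot 2
table = [None] * 8
print(probe_insert(table, "cat", "meow", h))    # 2
print(probe_insert(table, "bird", "tweet", h))  # 3  (probed past "cat")
```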
| Strategy | Space | Lookup (avg) | Cache Friendly | Used In |
|---|---|---|---|---|
| Separate Chaining | Extra memory for lists | O(1) | No | Java HashMap |
| Linear Probing | Compact array | O(1) | Yes | CPython dict (open addressing, perturbed probing variant) |
| Quadratic Probing | Compact array | O(1) | Partial | Some open-addressing tables |
| Robin Hood Hashing | Compact array | O(1) | Yes | Rust HashMap (pre-hashbrown) |
🔁 Open Addressing: Collision Probe Sequence

```mermaid
sequenceDiagram
    participant K as Key "bird"
    participant HF as Hash Function
    participant AR as Array
    K->>HF: h(bird) = 2
    HF->>AR: check slot 2
    AR-->>HF: slot 2 occupied by "cat"
    HF->>AR: probe slot 3 (linear)
    AR-->>HF: slot 3 empty
    HF-->>K: insert "bird" at slot 3
    Note over AR: No linked list needed
```

This diagram traces a collision in open addressing step by step. When "bird" hashes to slot 2, which is already occupied by "cat", the algorithm probes forward to slot 3 instead of allocating a separate linked list. The key takeaway is that open addressing keeps all data in a single contiguous array, which is friendlier to CPU cache lines but requires careful load-factor management to prevent long probe chains.
🔧 Deep Dive: Load Factor and Rehashing
Load factor = (number of entries) / (array capacity).
As the table fills up (load factor → 1.0), collisions increase and performance degrades.
Standard practice: rehash when load factor exceeds 0.75.
Rehashing:
- Create a new array 2ร the size.
- Re-insert every existing key into the new array (re-hash all keys).
- Old array is garbage collected.
Cost: O(N) for one rehash. But since the capacity doubles each time, rehashes happen at exponentially growing intervals, so the amortized cost per insert is O(1).
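The resize rule can be sketched as a tiny open-addressing table (illustrative; `TinyTable` is an invented name, and real implementations tune thresholds and growth differently):

```python
# Doubles the backing array and re-inserts every key once the
# load factor would exceed 0.75.
class TinyTable:
    def __init__(self):
        self.capacity = 8
        self.slots = [None] * self.capacity
        self.count = 0

    def _insert(self, slots, key, value):
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)     # linear probing
        slots[i] = (key, value)

    def put(self, key, value):
        if (self.count + 1) / self.capacity > 0.75:
            self.capacity *= 2           # 1. allocate a 2x array
            new_slots = [None] * self.capacity
            for entry in self.slots:     # 2. re-hash every existing key
                if entry is not None:
                    self._insert(new_slots, *entry)
            self.slots = new_slots       # 3. old array is garbage collected
        self._insert(self.slots, key, value)
        self.count += 1

t = TinyTable()
for i in range(20):
    t.put(f"k{i}", i)
print(t.capacity)  # 32, doubled twice (8 -> 16 -> 32)
```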
📈 Load Factor Decision: Keep or Resize

```mermaid
flowchart TD
    A[Insert new key] --> B{Load factor > 0.75?}
    B -- No --> C[Insert into current array]
    C --> D["O(1) average"]
    B -- Yes --> E[Allocate 2x array]
    E --> F[Re-hash all existing keys]
    F --> G[Insert new key]
    G --> H[Old array GC'd]
    H --> I["Amortized O(1) per insert"]
```

This flowchart captures the split-second decision the Java runtime (and, by extension, Kotlin's) makes on every insert. Below the 0.75 threshold the operation is pure O(1); above it, the table allocates a new backing array and rehashes every existing key, an O(N) operation that happens infrequently enough that the amortized cost remains O(1). Pre-sizing the map with `new HashMap<>(initialCapacity, 0.75f)` (choosing a capacity of at least expectedSize / 0.75) avoids this rehash entirely in batch-loading scenarios.
⚡ Why Python Dicts Are Fast
Python dict uses open addressing with compact arrays (since CPython 3.6). Entries are stored in an incrementally growing compact array in insertion order, with a separate hash table of indices pointing into it. This gives:
- Fast iteration (compact memory, no pointer chasing).
- Memory efficiency (no linked-list overhead).
- Iteration order that matches insertion order (an implementation detail in CPython 3.6, a language guarantee since Python 3.7).
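A quick check of the ordering behavior (guaranteed by the language since Python 3.7):

```python
d = {}
d["banana"] = 2
d["apple"] = 1
d["cherry"] = 3
# Keys come back in insertion order, not sorted order and not hash order:
print(list(d))  # ['banana', 'apple', 'cherry']
```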
⚖️ Trade-offs, Failure Modes & Decision Guide: Hash Tables vs Others

| Operation | Hash Table | BST (e.g., TreeMap) | Array (sorted) |
|---|---|---|---|
| Lookup by key | O(1) avg | O(log n) | O(log n) binary search |
| Insert | O(1) avg | O(log n) | O(n) shift |
| Delete | O(1) avg | O(log n) | O(n) shift |
| Range query | ❌ Not supported | ✅ O(log n + k) | ✅ O(log n + k) |
| Ordered iteration | ❌ No order | ✅ In-order | ✅ Sorted |
Choose hash table when: You need fast key-value lookup and don't need ordering or range queries. Choose BST when: You need sorted iteration or range queries (e.g., "find all keys between 10 and 50").
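To make the range-query gap concrete, here's a sketch using the stdlib `bisect` module over a sorted list as a stand-in for a BST; a dict can only answer the same question by scanning every key:

```python
import bisect

keys = sorted([3, 17, 42, 8, 99, 25])    # [3, 8, 17, 25, 42, 99]

# Sorted structure: O(log n + k) range query via binary search.
lo = bisect.bisect_left(keys, 10)        # first index with key >= 10
hi = bisect.bisect_right(keys, 50)       # first index with key > 50
print(keys[lo:hi])                       # [17, 25, 42]

# Hash table: the only option is an O(n) scan of every key.
d = dict.fromkeys(keys)
print(sorted(k for k in d if 10 <= k <= 50))  # [17, 25, 42]
```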
🌍 Real-World Application: Hash Tables in Production
Hash tables are one of the most widely used data structures in existence:

| Application | Hash table used | Why |
|---|---|---|
| Programming language runtimes | Python dict, JavaScript object, Java HashMap | Core key-value primitive for all dynamic lookups |
| Database query execution | Hash join, hash aggregate | Join two tables by hashing the join key for O(1) match |
| Caches (Redis, Memcached) | Hash table over key-value pairs | O(1) cache-hit lookup by key |
| DNS resolution | Cached DNS records | Map hostname → IP in O(1) after first resolution |
| Compiler symbol tables | Variable name → type/location | O(1) lookup during type-checking and code generation |
| Counting frequencies | Word count, histogram | `count[word] += 1` pattern: O(1) per word |
| Deduplication | Seen-set during ETL | O(1) check if a record was already processed |
In coding interviews, hash tables solve a huge class of problems in O(N):
- Two Sum (O(N) with a hash lookup vs O(N²) brute force)
- Anagram detection (frequency map comparison)
- Longest subarray with constraint (sliding window + hash)
- Graph adjacency representation (adjacency list as hash map)
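For instance, the classic Two Sum solution is a single pass with a hash map (a common sketch; the function name and return convention are illustrative):

```python
# Two Sum: find indices of two numbers that add up to target, O(N) time.
def two_sum(nums, target):
    seen = {}                      # value -> index where it was seen
    for i, n in enumerate(nums):
        if target - n in seen:     # O(1) lookup replaces the inner loop
            return seen[target - n], i
        seen[n] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```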
🧪 Practical: Building a Frequency Counter
The most common hash table interview pattern: count occurrences and use them to answer queries.
Problem: Given two strings, determine if one is an anagram of the other in O(N) time.
```python
def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    freq = {}
    # Count characters in s
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    # Subtract characters in t
    for char in t:
        if char not in freq or freq[char] == 0:
            return False
        freq[char] -= 1
    return True

# Examples
print(is_anagram("anagram", "nagaram"))  # True
print(is_anagram("rat", "car"))          # False
```
Why this is O(N): Building the frequency map is one pass over s (O(N)). Validating against t is one pass over t (O(N)). Total: O(2N) = O(N). No sorting required (sorting would be O(N log N)).
Python shortcut:
```python
from collections import Counter

def is_anagram_v2(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)
```
collections.Counter is Python's built-in frequency hash table: a dict subclass with the same O(1) insert and lookup.
🛠️ Java HashMap & ConcurrentHashMap: Hash Table Mechanics Under the Hood
Java's HashMap is a production-quality hash table that uses separate chaining internally (a linked list that upgrades to a red-black tree when a bucket's length reaches 8 and the table has at least 64 slots) and automatically rehashes at a 0.75 load factor: exactly the mechanics described in the collision resolution and rehashing sections above.
ConcurrentHashMap adapts the same design for multithreaded environments: instead of locking the entire map, it uses CAS to install the first node of an empty bucket and synchronizes only on individual bucket heads, so writers touching different buckets rarely contend.
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashTableDemo {

    // ── Custom hash table with separate chaining (illustrative) ──────────
    static class ChainedHashTable<K, V> {
        private static final int SIZE = 16;

        @SuppressWarnings("unchecked")
        private final java.util.LinkedList<Map.Entry<K, V>>[] buckets =
                new java.util.LinkedList[SIZE];

        public void put(K key, V value) {
            int idx = Math.abs(key.hashCode() % SIZE);
            if (buckets[idx] == null) buckets[idx] = new java.util.LinkedList<>();
            buckets[idx].removeIf(e -> e.getKey().equals(key)); // update existing
            buckets[idx].add(Map.entry(key, value));
        }

        public V get(K key) {
            int idx = Math.abs(key.hashCode() % SIZE);
            if (buckets[idx] == null) return null;
            return buckets[idx].stream()
                    .filter(e -> e.getKey().equals(key))
                    .map(Map.Entry::getValue)
                    .findFirst().orElse(null);
        }
    }

    public static void main(String[] args) {
        // ── HashMap word frequency counter, O(1) average (mirrors the 🧪 section)
        Map<String, Integer> freq = new HashMap<>();
        for (String word : new String[]{"apple", "banana", "apple", "cherry", "apple"}) {
            freq.merge(word, 1, Integer::sum); // put-or-update in one call
        }
        System.out.println(freq); // {apple=3, banana=1, cherry=1}

        // ── ConcurrentHashMap: thread-safe, no full-map lock ─────────────
        ConcurrentHashMap<String, Integer> concurrent = new ConcurrentHashMap<>();
        concurrent.merge("apple", 1, Integer::sum); // atomic increment on CHM
        System.out.println("ConcurrentHashMap: " + concurrent);
    }
}
```
| Class | Thread-safe | Null keys/values | Ordered | Use when |
|---|---|---|---|---|
| HashMap | ❌ | ✅ / ✅ | ❌ | Single-threaded or externally synchronized |
| LinkedHashMap | ❌ | ✅ / ✅ | Insertion order | Need insertion-order iteration (e.g., LRU cache) |
| TreeMap | ❌ | ❌ / ✅ | Sorted by key | Need sorted keys or range queries (BST internally) |
| ConcurrentHashMap | ✅ | ❌ / ❌ | ❌ | High-concurrency, write-heavy, multi-threaded |
For a full deep-dive on Java HashMap internals (treeification, resize strategy) and ConcurrentHashMap's locking strategy, a dedicated follow-up post is planned.
🎓 What Hash Tables Teach You About Trade-offs
- O(1) average is not O(1) guaranteed. Every hash table has a worst case. Security-sensitive systems (e.g., web servers) should use keyed hash functions such as SipHash to prevent deliberate hash-flooding attacks that force O(N) collisions.
- Space and time have a direct relationship. A larger backing array โ fewer collisions โ faster lookups. Load factor is the knob that controls this trade-off.
- Ordering is sacrificed for speed. A hash table scrambles keys by design. If you need to iterate keys in sorted order, you need a BST or a sorted array instead.
- Rehashing is amortized, not free. Individual rehash events cost O(N). Latency-sensitive systems sometimes pre-size hash tables to avoid mid-operation rehashing and jitter.
📌 TLDR: Summary & Key Takeaways
- Hash function maps key → array index in O(1). Must be deterministic and uniform.
- Collisions are inevitable; separate chaining uses linked lists, linear probing uses the array itself.
- Rehash at load factor 0.75 to keep average O(1); amortized cost is O(1) per insert.
- Hash tables have no order: use a BST when you need sorted iteration or range queries.
Written by
Abstract Algorithms
@abstractalgorithms