All Posts

Cyclic Sort: Find Missing and Duplicate Numbers in O(n) Time, O(1) Space

Place every number at its correct index in one sweep — then scan for mismatches. No HashSet, no extra memory.

Abstract AlgorithmsAbstract Algorithms
··16 min read
Share
Share on X / Twitter
Share on LinkedIn
Copy link

TLDR: If an array holds n numbers in range [1, n], each number belongs at index num - 1. Cyclic sort places every element at its correct index in O(n) time using O(1) space — then a single scan reveals every missing and duplicate number. Five interview problems, one technique.


📖 The Missing Number Challenge That Bans the Easy Solution

Here is the exact constraint that forces cyclic sort into existence:

"Given an array of n integers in the range [1, n], find all missing numbers — in O(n) time and O(1) space. You may not use a HashSet or any additional array."

Without the O(1) space constraint, this is trivial: drop every number into a HashSet, then scan 1..n for gaps. That costs O(n) time and O(n) space. With the constraint, the HashSet is off the table.

The key observation is that the input array itself is the auxiliary space — if the array has n slots and contains numbers 1 through n, then the correct home for number k is index k - 1. An array that is fully sorted in cyclic order looks like [1, 2, 3, ..., n]. Any slot where nums[i] != i + 1 after a sort reveals a missing or duplicate number.

Cyclic sort exploits this identity: it places every number at its correct index by cycling through swaps, then reads the answer directly from the index-value mismatches.


🔍 Index as Identity: The Invariant That Drives the Algorithm

The invariant is deceptively simple:

Number k belongs at index k − 1.

For an array [3, 1, 4, 2] (n = 4, range [1, 4]):

IndexCurrent valueCorrect valueIn place?
031
112
243
324

After cyclic sort, every element sits at its correct index: [1, 2, 3, 4]. No mismatches — no missing numbers.

For a missing number scenario like [3, 4, -1, 1] (where −1 is a placeholder for missing 2), the algorithm places what it can and leaves the slot for 2 empty (or containing a wrong value), revealing the gap.

This index-as-identity relationship is what makes the algorithm O(1) space: you are not recording presence in a separate structure, you are rearranging the input array into its own sorted-by-identity form.


⚙️ The Cyclic Sort Algorithm: Place Every Number at Its Correct Index

Walk the array from left to right. At each position i:

  • Compute the correct index for nums[i]: that is j = nums[i] - 1.
  • If nums[i] != nums[j] (the number is not already at its correct home), swap nums[i] with nums[j].
  • If nums[i] == nums[j] (either in place or a duplicate), advance i.

The swap sends nums[i] toward its correct slot. You do not advance i after a swap because the value that arrived at position i from the swap may itself need to move. You only advance when nums[i] is already correct (or cannot be placed due to a duplicate).

public class CyclicSort {

    // Core cyclic sort — places each number at index nums[i] - 1
    public void cyclicSort(int[] nums) {
        int i = 0;
        while (i < nums.length) {
            int j = nums[i] - 1; // correct index for nums[i]
            if (nums[i] != nums[j]) {
                swap(nums, i, j); // send nums[i] to its correct home
            } else {
                i++; // nums[i] is in place (or duplicate) — advance
            }
        }
    }

    private void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
}

Trace on [3, 1, 4, 2]:

StepiArray stateAction
10[3, 1, 4, 2]nums[0]=3, j=2; nums[0]≠nums[2] → swap(0,2)
20[4, 1, 3, 2]nums[0]=4, j=3; nums[0]≠nums[3] → swap(0,3)
30[2, 1, 3, 4]nums[0]=2, j=1; nums[0]≠nums[1] → swap(0,1)
40[1, 2, 3, 4]nums[0]=1, j=0; nums[0]==nums[0] → i++
51[1, 2, 3, 4]nums[1]=2, j=1; in place → i++
62[1, 2, 3, 4]nums[2]=3, j=2; in place → i++
73[1, 2, 3, 4]nums[3]=4, j=3; in place → i++

Result: fully sorted. No missing numbers.


🧠 Deep Dive: The Cycle Invariant and Why O(n) Holds Despite Nested Loops

Under the Hood: Internals of Cyclic Placement

The algorithm looks like it could be O(n²) — there is a while loop and each iteration may not advance i. But the key insight is that every swap permanently places at least one element at its correct index. Each element can be swapped into its correct position at most once. After that, it never moves again because nums[j] == j + 1 is satisfied, and any future visit to that index will trigger the i++ branch immediately.

Formally: across all iterations, the total number of swaps is at most n − 1 (you need at most n−1 swaps to fully sort n elements). The i++ operation also happens at most n times — once per index. Total operations = at most n + (n−1) = 2n − 1 = O(n).

The data structure is the input array itself. No additional structure is maintained. After cyclic sort completes, the array stores its "answer" in the index-value mismatch pattern — the structure is the answer.

Performance Analysis: O(n) Despite the Apparent Inner Loop

DimensionComplexityJustification
TimeO(n)Each element is swapped to its correct position at most once; total swaps ≤ n−1
SpaceO(1)All operations are in-place; only scalar variables i, j, temp used
Post-sort scanO(n)One additional pass to detect mismatches
TotalO(n) time, O(1) spaceBoth phases are linear; sorting dominates with a constant factor of ~2

This is why cyclic sort beats HashSet for the space-constrained variant: HashSet gives O(n) time and O(n) space, while cyclic sort gives O(n) time and O(1) space.


📊 Cyclic Placement Step by Step: Visualizing the Swap Cycle

flowchart TD
    A[Start: i = 0] --> B["Compute j = nums[i] - 1"]
    B --> C{"nums[i] == nums[j]?"}
    C -->|"No — wrong position"| D["Swap nums[i] with nums[j]"]
    D --> B
    C -->|"Yes — in place or duplicate"| E["Advance: i++"]
    E --> F{"i < n?"}
    F -->|Yes| B
    F -->|No| G[Cyclic sort complete]
    G --> H[Scan array for index-value mismatches]
    H --> I[Report missing / duplicate values]

After the sort completes, every slot where nums[i] != i + 1 is a mismatch. The mismatch's index tells you which number is missing (i + 1), and the mismatch's value tells you which number is a duplicate (nums[i]).


🌍 Real-World Applications: In-Place Array Mutation at Scale

Deduplication pipelines in data engineering often process arrays of entity IDs (sequential integers assigned to database rows). When a batch of IDs in range [1, n] arrives and missing IDs indicate failed inserts, cyclic sort's O(1) space is critical — the data batch itself can be tens of millions of rows, and allocating an auxiliary array of equal size may be cost-prohibitive.

Embedded systems with severe memory constraints (microcontrollers, IoT firmware) use in-place algorithms where heap allocation is banned. Cyclic sort's zero-allocation pattern is a direct fit.

Cache-efficient array processing: because cyclic sort only accesses nums[i] and nums[j] (nearby memory after a few swaps settle), it has better cache behavior than approaches that allocate a parallel array and write to scattered positions.

Input / Process / Output for a missing-number audit:

StageData
Input (raw IDs)[4, 3, 2, 7, 8, 2, 3, 1]
After cyclic sort[1, 2, 3, 4, 3, 2, 7, 8]
Mismatch scanindex 4: 3≠5, index 5: 2≠6
Missing numbers[5, 6]

🧪 Five Problems, One Technique

Each problem below is a minor variation on the core cyclic sort — the sort itself is nearly identical, only the post-sort scan changes.

Problem 1: Find the Missing Number (array has n numbers in range [0, n])

Range is [0, n], so index j = nums[i] (not nums[i] − 1). Skip nums[i] == n since n has no slot.

public int missingNumber(int[] nums) {
    int i = 0;
    while (i < nums.length) {
        int j = nums[i];
        // nums[i] is in range [0, n-1] and not already in place
        if (nums[i] < nums.length && nums[i] != nums[j]) {
            swap(nums, i, j);
        } else {
            i++;
        }
    }
    // Scan for the slot where index != value
    for (int k = 0; k < nums.length; k++) {
        if (nums[k] != k) return k;
    }
    return nums.length; // n itself is missing
}

Problem 2: Find All Missing Numbers (range [1, n], duplicates allowed)

Same core sort as the base algorithm. After sorting, every slot where nums[i] != i + 1 is a missing number.

public List<Integer> findAllMissingNumbers(int[] nums) {
    int i = 0;
    while (i < nums.length) {
        int j = nums[i] - 1;
        if (nums[i] != nums[j]) {
            swap(nums, i, j);
        } else {
            i++;
        }
    }
    List<Integer> missing = new ArrayList<>();
    for (int k = 0; k < nums.length; k++) {
        if (nums[k] != k + 1) missing.add(k + 1);
    }
    return missing;
}

Problem 3: Find the Duplicate Number (n+1 numbers in range [1, n])

One number appears twice. When nums[i] == nums[j] and nums[i] != i + 1, the value cannot be placed — it is the duplicate.

public int findDuplicate(int[] nums) {
    int i = 0;
    while (i < nums.length) {
        if (nums[i] != i + 1) {
            int j = nums[i] - 1;
            if (nums[i] != nums[j]) {
                swap(nums, i, j);
            } else {
                return nums[i]; // duplicate detected
            }
        } else {
            i++;
        }
    }
    return -1;
}

Problem 4: Find All Duplicates (range [1, n], each number appears once or twice)

After sorting, every slot where nums[i] != i + 1 holds a value that appeared twice.

public List<Integer> findAllDuplicates(int[] nums) {
    int i = 0;
    while (i < nums.length) {
        int j = nums[i] - 1;
        if (nums[i] != nums[j]) {
            swap(nums, i, j);
        } else {
            i++;
        }
    }
    List<Integer> duplicates = new ArrayList<>();
    for (int k = 0; k < nums.length; k++) {
        if (nums[k] != k + 1) duplicates.add(nums[k]);
    }
    return duplicates;
}

Problem 5: Find the Corrupt Pair — Missing and Duplicate in One Pass

One number is missing; one number appears twice. The sort leaves one slot with a wrong value: the value at that slot is the duplicate, and the slot's index + 1 is the missing number.

public int[] findCorruptPair(int[] nums) {
    int i = 0;
    while (i < nums.length) {
        int j = nums[i] - 1;
        if (nums[i] != nums[j]) {
            swap(nums, i, j);
        } else {
            i++;
        }
    }
    for (int k = 0; k < nums.length; k++) {
        if (nums[k] != k + 1) {
            return new int[]{nums[k], k + 1}; // {duplicate, missing}
        }
    }
    return new int[]{-1, -1};
}

Summary of all five variants:

ProblemRangeSort tweakScan output
Missing Number[0, n]j = nums[i], skip if nums[i] == nFirst index where nums[k] != k
All Missing Numbers[1, n]StandardAll indices where nums[k] != k+1
Find Duplicate[1, n], n+1 elementsReturn early on collisionReturn duplicate at collision
All Duplicates[1, n]StandardAll values where nums[k] != k+1
Corrupt Pair[1, n]StandardFirst mismatch: value=dup, index+1=missing

⚖️ Cyclic Sort vs HashSet: When Space Actually Matters

The cyclic sort vs HashSet trade-off is not just academic:

ApproachTimeSpaceMutates input?Works on stream?
HashSetO(n)O(n)NoYes
Cyclic sortO(n)O(1)YesNo (needs random access)
Bit vectorO(n)O(n/8)NoYes
XOR trickO(n)O(1)NoYes (single missing only)

Cyclic sort is the right choice when: the input can be modified, random access is available (array, not stream), the O(1) space constraint is hard, and you need a single pattern that handles missing numbers, duplicates, and corrupt pairs uniformly.

The XOR trick handles the "single missing number" case in O(1) space without mutation, but it cannot be extended to "all missing numbers" or "all duplicates" — cyclic sort handles all those variants with the same core code.


🧭 When to Reach for Cyclic Sort

SituationRecommendation
Find missing/duplicate in [1, n] array, O(1) space requiredCyclic sort — it is the canonical solution
Input is a stream (no random access)HashSet or Bit vector instead
Array must not be modifiedHashSet or bit vector
Single missing number, no duplicatesXOR trick is simpler
Multiple missing numbers in [1, n]Cyclic sort or bit vector
Range is not [1, n] (e.g., [0, n])Adjust j = nums[i] instead of nums[i] - 1
Edge case: nums[i] > n (out of range)Guard with bounds check before computing j

🛠️ Java's BitSet: The JDK's O(n/8) Space Approach to the Same Problem

java.util.BitSet is the JDK's built-in solution for presence-tracking problems. It achieves O(n/8) space (1 bit per element) — not O(1), but practical when n is very large.

import java.util.BitSet;

public class BitSetMissingNumbers {

    // Using BitSet — O(n/8) space, does not mutate input
    public List<Integer> findMissingWithBitSet(int[] nums) {
        BitSet bitset = new BitSet(nums.length + 1);

        for (int num : nums) {
            bitset.set(num); // mark num as present
        }

        List<Integer> missing = new ArrayList<>();
        for (int i = 1; i <= nums.length; i++) {
            if (!bitset.get(i)) missing.add(i);
        }
        return missing;
    }
}

BitSet vs Cyclic Sort:

CriterionCyclic SortBitSet
SpaceO(1) — in-placeO(n/8) — separate allocation
Input mutationYesNo
Code simplicityHigher (swap logic)Lower (set/get)
Handles duplicatesYes (all 5 variants)Partial (needs extra pass)

Cyclic sort wins on space and generality. BitSet wins on simplicity and non-mutation. For an interview where the O(1) constraint is stated, cyclic sort is the expected answer. In production code where mutation is unacceptable, BitSet is more pragmatic.


📚 Lessons from Cyclic Sort Pitfalls

Lesson 1 — Forgetting the bounds check. If the problem has out-of-range values (e.g., nums[i] = 0 in a [1, n] array, or nums[i] > n), computing j = nums[i] - 1 will produce a negative or out-of-bounds index. Always guard: if (nums[i] >= 1 && nums[i] <= n && nums[i] != nums[j]).

Lesson 2 — Advancing i after every swap. A critical mistake is writing i++ unconditionally. After swapping nums[i] to nums[j], a new value arrives at position i that may not be in its correct place yet. Only advance i when nums[i] == nums[j] (already placed or collision detected).

Lesson 3 — Reading the scan result backward. In the corrupt pair problem, the mismatch means the value at the mismatch index is the duplicate, and the index + 1 is the missing number — not the other way around.

Lesson 4 — Mixing up range [0, n] vs [1, n]. The LeetCode "Missing Number" problem uses range [0, n], requiring j = nums[i] (not nums[i] - 1). Confusing the two ranges silently produces wrong answers.

Lesson 5 — Expecting the algorithm to work on sorted input. If the input is already sorted, cyclic sort terminates in O(n) time correctly — but if you need to detect duplicates in sorted input, a simpler two-pointer scan is cleaner and more readable.


📌 Summary: The Cyclic Sort Playbook

  • Core invariant: number k belongs at index k − 1. After sorting, mismatches tell the whole story.
  • Advance i only when nums[i] == nums[j] — otherwise swap without advancing. This is what keeps the algorithm O(n) despite looking like it could be O(n²).
  • Each element is swapped at most once — the total swap count is bounded by n − 1, making the total work O(n).
  • O(1) space — the input array is the only storage. The sorted state encodes the answer.
  • Five problems, same core: Missing Number, All Missing Numbers, Find Duplicate, All Duplicates, Corrupt Pair all share the same sort loop with minor scan-phase variations.
  • When to skip it: no random access (streaming), input must not be mutated, range is not [1, n].

One-liner to remember: Place every number at nums[i] − 1, advance only on collision, then scan — mismatches are the answer.


📝 Practice Quiz

  1. After running cyclic sort on [3, 4, -1, 1] (a corrupt pair problem where −1 is an out-of-range sentinel), which index will have a mismatch that reveals the missing number?

    • A) Index 1 — nums[1] will hold the wrong value
    • B) Index 2 — nums[2] will still hold −1 (cannot be placed), revealing 3 is missing
    • C) Index 0 — the first index is always the mismatch after cyclic sort Correct Answer: B
  2. Why does the cyclic sort inner loop run in O(n) total time even though i is not incremented on every iteration?

    • A) Each swap places at least one element permanently — total swaps ≤ n−1, so total iterations ≤ 2n−1
    • B) Java's JIT compiler optimizes the loop to skip already-sorted elements
    • C) The while loop condition i < n limits total iterations to exactly n Correct Answer: A
  3. You are given [1, 4, 4, 2] (range [1, 4], one duplicate). What does findDuplicate return, and at which step is the duplicate detected?

    • A) Returns 4; detected when trying to place the second 4 — nums[0]=4 and nums[3]=4 collide
    • B) Returns 2; detected after a post-sort scan
    • C) Returns 1; detected at the first swap Correct Answer: A
  4. Open-ended challenge: Cyclic sort requires the array to hold numbers in range [1, n] (or [0, n]). If the array contains arbitrary integers — not bounded to [1, n] — cyclic sort breaks down. How would you redesign the O(1)-space missing-number detection for an array containing integers in the range [−10⁹, 10⁹]? What constraints would you relax or what tricks could make this work?



Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms