HyperLogLog Explained: Counting Billions of Unique Items with 12 KB
How stochastic averaging over leading-zero bit runs estimates cardinality to ±0.81% using 12 KB of memory regardless of dataset size.
TLDR: HyperLogLog estimates the number of distinct elements in a dataset using ~12 KB of memory regardless of cardinality — with ±0.81% error. The insight: if you hash every element to a random bit string, the longest run of leading zeros you observe is statistically correlated with how many distinct hashes you have seen. Multiple registers stochastically average this observation to reduce variance. Redis ships HyperLogLog as a native data type via PFADD and PFCOUNT. A HashSet of 100 million unique IDs requires ~800 MB; HyperLogLog requires 12 KB — a 66,000× reduction.
📖 The Unique Visitor Problem at Google Analytics Scale
Google Analytics must answer one of the most common questions in web analytics: how many unique visitors viewed this page today? Not total page views — distinct users. At Google scale, a popular page may receive hundreds of millions of views per day from tens of millions of distinct users.
The naive solution is a HashSet<userId>. Insert every visitor ID; return set.size() for the unique count. For 10 million distinct users with 64-bit IDs, that is 80 MB of set storage — for one page, for one day. Google Analytics tracks billions of pages. Storing exact unique visitor counts for all of them simultaneously would require petabytes of RAM.
This is the cardinality estimation problem: counting the number of distinct elements in a multiset, when the multiset may be too large to store in memory.
HyperLogLog solves it with a counterintuitive trick: instead of storing elements, it observes statistical patterns in their hash values. With 16,384 six-bit registers (12 KB total), HyperLogLog can estimate cardinality anywhere from 1 to 2^64 with ±0.81% error. A 66,000× memory reduction with a sub-1% error rate that is wholly acceptable for analytics.
🔍 The Intuition: Leading Zeros as a Cardinality Signal
Before the full algorithm, understand the core intuition — the "bit trick" that makes HyperLogLog work.
If you hash each element to a uniformly random bit string, the probability that a given hash starts with at least k leading zeros is:
P(≥ k leading zeros) = 1 / 2^k
Why? Each bit of a random hash is independently 0 or 1 with equal probability. The probability that the first bit is 0 is 1/2. That the first two bits are both 0 is 1/4. That the first k bits are all 0 is 1/2^k.
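The claim is easy to verify empirically. This small simulation (class and method names are illustrative) samples uniformly random 64-bit values, which is exactly what a good hash function produces, and measures how often at least k leading zeros occur:

```java
import java.util.Random;

// Empirical check of P(>= k leading zeros) = 1/2^k over random 64-bit values.
public class LeadingZeroProbability {

    // Fraction of `trials` random longs whose binary form starts with >= k zeros.
    public static double fractionWithAtLeastKZeros(int k, int trials, long seed) {
        Random rand = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (Long.numberOfLeadingZeros(rand.nextLong()) >= k) hits++;
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 8; k++) {
            System.out.printf("k=%d  observed=%.4f  expected=%.4f%n",
                    k, fractionWithAtLeastKZeros(k, 1_000_000, 42), Math.pow(2, -k));
        }
    }
}
```

Over a million samples, the observed fractions track 1/2, 1/4, 1/8, ... closely.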
This means: if you process a stream of elements and track the maximum number of leading zeros you have ever observed, that maximum (ρ_max) is a signal about how many distinct elements you have seen.
| Observed maximum leading zeros (ρ_max) | Estimated distinct elements |
| --- | --- |
| 10 | ~2^10 = 1,024 |
| 20 | ~2^20 ≈ 1 million |
| 30 | ~2^30 ≈ 1 billion |
| 33 | ~2^33 ≈ 8.6 billion |
The estimator is 2^ρ_max. If the longest leading-zero run you have seen is 20, you have probably observed around one million distinct elements — because you needed roughly that many random hashes before one of them started with twenty zeros.
The problem: a single ρ_max tracker has enormous variance. One unlucky hash — say, a hash that happens to start with 40 leading zeros by pure chance — would wildly overestimate cardinality. The HyperLogLog algorithm's contribution is eliminating this variance through stochastic averaging.
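A quick simulation makes that variance visible. The naive single-tracker estimator below returns 2^ρ_max over a stream of random hashes; its estimates can only ever be powers of two, and different seeds for the same true cardinality land multiples apart (illustrative code, not the real algorithm):

```java
import java.util.Random;

// The naive estimator: track the max leading-zero count over `distinct`
// random hashes and return 2^rhoMax. Run with several seeds to see the spread.
public class SingleTrackerVariance {

    public static long estimateOnce(int distinct, long seed) {
        Random rand = new Random(seed);
        int rhoMax = 0;
        for (int i = 0; i < distinct; i++) {
            rhoMax = Math.max(rhoMax, Long.numberOfLeadingZeros(rand.nextLong()));
        }
        return 1L << rhoMax; // estimates are always powers of two
    }

    public static void main(String[] args) {
        int trueCardinality = 100_000; // roughly 2^16.6
        for (long seed = 1; seed <= 5; seed++) {
            System.out.println("seed " + seed + ": estimate = "
                    + estimateOnce(trueCardinality, seed));
        }
    }
}
```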
⚙️ The Full Algorithm: m Registers + Harmonic Mean
HyperLogLog replaces the single ρ_max tracker with m independent registers — small counters, one per sub-stream — and averages their estimates.
Step 1: Split Each Hash into a Routing Prefix and a Value
For each element x, compute a 64-bit hash h(x). Split it into:
- Routing prefix: the first b bits, where m = 2^b is the number of registers. For Redis's default of 16,384 registers, b = 14.
- Value suffix: the remaining 64 − b bits.
The routing prefix determines which register to update. The value suffix is where you count leading zeros.
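A worked example of the split, using b = 14 and a contrived hash value chosen so the arithmetic is easy to follow:

```java
// Splitting one hypothetical 64-bit hash with Redis-style b = 14.
public class HashSplitDemo {
    public static void main(String[] args) {
        int b = 14;           // 2^14 = 16,384 registers
        long hash = 1L << 40; // contrived hash: a single bit set at position 40

        int registerIndex = (int) (hash >>> (64 - b)); // top 14 bits route the element
        long suffix = hash << b;                       // remaining 50 bits, left-aligned
        int rho = Long.numberOfLeadingZeros(suffix) + 1;

        System.out.println(registerIndex); // 0  (the top 14 bits are all zero)
        System.out.println(rho);           // 10 (nine leading zeros in the suffix, plus 1)
    }
}
```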
Step 2: Update the Register
For the register identified by the routing prefix, compute the number of leading zeros in the value suffix, plus 1 (call this ρ(x)). Update the register if the new value is larger than the stored value:
registers[prefix] = max(registers[prefix], ρ(x))
Each register stores only the maximum ρ value ever observed for elements that routed to it. A 6-bit register stores values 0–63, which comfortably covers any leading-zero count from a 64-bit hash — justifying the 6-bit-per-register memory layout.
Step 3: Estimate Cardinality Using the Harmonic Mean
After processing all elements, compute the cardinality estimate as:
E = α_m × m² × (Σ 2^(-registers[j]))^(-1)
Where:
- m = number of registers
- α_m = a bias-correction constant that depends on m
- The sum is the harmonic-mean denominator over all registers
The harmonic mean (not arithmetic mean) is the key to variance reduction. A single high ρ value — caused by a lucky hash — inflates that one register but is drowned out by the harmonic mean of 16,383 other registers. The estimator is robust to lucky outlier hashes in a way that a simple maximum or arithmetic mean is not.
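A small numeric experiment (with made-up register contents, purely illustrative) shows why: fill every register with ρ = 10, plant one outlier of ρ = 40, and compare the two ways of combining the per-register estimates 2^ρ.

```java
import java.util.Arrays;

// Harmonic vs arithmetic combination of per-register estimates 2^rho.
public class HarmonicVsArithmetic {
    public static void main(String[] args) {
        int m = 16_384;
        int[] regs = new int[m];
        Arrays.fill(regs, 10); // typical registers: rho = 10
        regs[0] = 40;          // one register hit by a freak 40-zero hash

        double harmonicSum = 0, arithmeticSum = 0;
        for (int r : regs) {
            harmonicSum += Math.pow(2, -r);  // harmonic-mean accumulator
            arithmeticSum += Math.pow(2, r); // arithmetic-mean accumulator
        }
        double harmonic = m / harmonicSum;     // stays near 2^10 = 1024
        double arithmetic = arithmeticSum / m; // dragged to tens of millions
        System.out.printf("harmonic=%.0f  arithmetic=%.0f%n", harmonic, arithmetic);
    }
}
```

The harmonic combination stays near 1,024 while the arithmetic mean is dominated by the single outlier.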
Why 16,384 Registers?
With m = 16,384 registers, the standard error of the harmonic mean estimator is approximately:
σ/E ≈ 1.04 / √m = 1.04 / √16384 = 1.04 / 128 ≈ 0.0081 ≈ 0.81%
More registers → lower error, but linearly more memory; halving the error costs four times the registers. 16,384 registers (12 KB) is the balance point Redis chose for analytics workloads.
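The trade-off is easy to tabulate. This throwaway snippet evaluates 1.04/√m for a few register counts (register memory only, ignoring headers):

```java
// Standard error 1.04/sqrt(m) vs memory for 6-bit registers.
public class RegisterTradeoff {
    public static double stdErr(int m) {
        return 1.04 / Math.sqrt(m);
    }

    public static void main(String[] args) {
        for (int m : new int[]{1024, 4096, 16_384, 65_536}) {
            System.out.printf("m=%6d  memory=%6d bytes  error=%.2f%%%n",
                    m, m * 6 / 8, 100 * stdErr(m));
        }
    }
}
```

Each 4× increase in registers halves the error and quadruples the (still tiny) memory.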
🧠 Deep Dive: Internals, Math, and Performance
The Internals: Memory Layout and Register Encoding
Redis's dense HyperLogLog representation uses exactly 12,304 bytes (≈ 12 KB). The layout:
- 16,384 registers × 6 bits per register = 98,304 bits = 12,288 bytes for the register array, packed with no padding.
- A 16-byte header stores the "HYLL" magic bytes, the encoding type, and a cached cardinality estimate: 12,288 + 16 = 12,304 bytes total.
The 6-bit register width is sufficient because with b = 14 routing bits, ρ is at most (64 − 14) + 1 = 51, which fits comfortably in the 0–63 range of a 6-bit register. Reading a packed register requires bit masking and shifting, but the cached estimate in the header means PFCOUNT only pays that cost when the sketch has changed since the last count.
Mathematical Model: The Estimator and Bias Correction
The raw estimator E = α_m × m² × (Σ 2^(-registers[j]))^(-1) has known biases at the extremes:
- Small cardinalities (E < 2.5m): many registers hold 0, creating systematic underestimation. Redis uses linear counting as a correction when the number of zero registers is above a threshold.
- Large cardinalities (E > 2^32/30): in the original 32-bit hash formulation, hash collisions cause underestimation, and a large-range correction formula compensates. With 64-bit hashes, as Redis uses, this regime is effectively out of reach.
- Mid-range (2.5m ≤ E ≤ 2^32/30): the raw estimator is used with no correction needed.
The bias-correction constants (α_m) for common register counts:
| m | α_m | Standard error |
| --- | --- | --- |
| 16 | 0.673 | 26% |
| 32 | 0.697 | 18% |
| 64 | 0.709 | 13% |
| 128 | 0.715 | 9.2% |
| 16384 | 0.7213 | 0.81% (Redis default) |
Performance Analysis: Time and Space Complexity
| Operation | Time complexity | Notes |
| --- | --- | --- |
| Add element | O(1) | One hash + one register update |
| Estimate cardinality | O(m) | Sum over all m registers — O(16,384) ≈ O(1) for fixed m |
| Merge two HyperLogLogs | O(m) | Element-wise max of register arrays |
| Space | O(m) fixed | 12 KB for m=16384, regardless of n |
Adding an element is genuinely O(1): compute one 64-bit hash, extract the routing prefix, compare and possibly update one 6-bit register. The cardinality estimate is O(m) — iterating all 16,384 registers — but since m is fixed and small, it is effectively constant time regardless of how many elements have been added.
Merge support is the distributed superpower. The merge of two HyperLogLog sketches is computed by taking the element-wise maximum of their register arrays:
merged_registers[j] = max(registers_A[j], registers_B[j])
This makes HyperLogLog perfectly composable for distributed systems: each shard computes its own HyperLogLog sketch independently, and the coordinator merges them in O(m) per shard — producing a global cardinality estimate with the same ±0.81% error as if the coordinator had seen every element.
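In code, merging raw register arrays takes a few lines. This sketch matches the unpacked byte-per-register layout used in this article's examples, not Redis's packed 6-bit encoding:

```java
// Element-wise max merge of two register arrays of equal length.
public class SketchMerge {
    public static byte[] merge(byte[] regsA, byte[] regsB) {
        if (regsA.length != regsB.length) {
            throw new IllegalArgumentException("sketches must use the same m");
        }
        byte[] merged = new byte[regsA.length];
        for (int j = 0; j < regsA.length; j++) {
            merged[j] = (byte) Math.max(regsA[j], regsB[j]);
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] shardA = {5, 0, 7, 2};
        byte[] shardB = {3, 9, 1, 2};
        System.out.println(java.util.Arrays.toString(merge(shardA, shardB))); // [5, 9, 7, 2]
    }
}
```

Because max is commutative, associative, and idempotent, shards can be merged in any order, and re-merging the same shard twice is harmless.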
📊 Visualizing the HyperLogLog Estimation Pipeline
The following diagram traces how a stream of user IDs flows through HyperLogLog's estimation pipeline — from hashing to register update to cardinality estimate.
flowchart TD
A(["Stream of user IDs: user_101, user_202, user_101, ..."]) --> B["Hash each element to a 64-bit value\nh(user_101) = 0b000101..."]
B --> C["Split hash: first b=14 bits → register index\nremaining 50 bits → leading-zero count ρ"]
C --> D{"Which register does this element route to?\nindex = first 14 bits of hash"}
D --> E["Register 0\nstores max ρ seen"]
D --> F["Register 1\nstores max ρ seen"]
D --> G["..."]
D --> H["Register 16383\nstores max ρ seen"]
E --> I["Compute harmonic mean of 2^(-registers[j]) across all 16,384 registers"]
F --> I
G --> I
H --> I
I --> J["Apply bias correction α_m\nand small/large range adjustments"]
J --> K(["Cardinality estimate E with ±0.81% error"])
style A fill:#3b82f6,color:#fff
style K fill:#22c55e,color:#fff
Each element routes to exactly one register based on its hash prefix, and the register stores the maximum leading-zero count ever observed. The harmonic mean across all registers combines 16,384 independent sub-stream estimates into a single cardinality figure. No individual element is stored — only the register maximums.
🌍 Real Uses: Where HyperLogLog Powers Analytics at Scale
Redis (built-in, since 2013). Redis ships HyperLogLog as a native data type. Every PFADD call updates a 12 KB sketch; every PFCOUNT returns the estimated cardinality with ±0.81% error. Redis's implementation is the most widely deployed HyperLogLog in the industry.
Google Analytics and Mixpanel. Unique user/event counting per page, per cohort, per A/B experiment arm. At the scale of billions of events per day, maintaining exact HashSets is untenable. HyperLogLog provides per-page unique visitor counts with sub-1% error in kilobytes of memory per tracked metric.
Google BigQuery. The APPROX_COUNT_DISTINCT() SQL function uses HyperLogLog internally. When you run SELECT APPROX_COUNT_DISTINCT(user_id) FROM events, BigQuery sketches each partition independently and merges results at query time — delivering results orders of magnitude faster than COUNT(DISTINCT user_id) for large tables.
Apache Spark. The approxCountDistinct() Dataset API function uses HyperLogLog sketches per partition, then merges them at the driver. A query that would require shuffling billions of rows for exact distinct counting instead transmits 12 KB of sketch data per partition.
Apache Flink. Streaming approxCountDistinct windows maintain HyperLogLog sketches per window, enabling approximate unique element counts over sliding or tumbling time windows with constant memory per window regardless of throughput.
⚖️ Trade-offs and Failure Modes
HyperLogLog cannot remove elements. Once you add an element to the sketch, you cannot take it back. If you need a sliding-window unique count (e.g., unique users in the last 24 hours), you cannot subtract yesterday's users from today's HyperLogLog. Mitigation: maintain a separate HyperLogLog per time bucket (hourly or daily sketches). Merge buckets for aggregate windows; discard expired buckets.
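The bucketing mitigation can be sketched as follows; DailyBuckets is a hypothetical helper that stores one raw register array per day rather than Redis keys:

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Per-day register arrays: merge the last N days for a window estimate,
// drop old days to "delete" their contribution wholesale.
public class DailyBuckets {
    private final int m; // registers per sketch
    private final Map<LocalDate, byte[]> buckets = new HashMap<>();

    public DailyBuckets(int m) { this.m = m; }

    // Register array for one day, created on first use.
    public byte[] bucketFor(LocalDate day) {
        return buckets.computeIfAbsent(day, d -> new byte[m]);
    }

    // Element-wise max over the `days` buckets ending at `end` (inclusive).
    public byte[] mergedWindow(LocalDate end, int days) {
        byte[] merged = new byte[m];
        for (int i = 0; i < days; i++) {
            byte[] regs = buckets.get(end.minusDays(i));
            if (regs == null) continue; // no traffic that day
            for (int j = 0; j < m; j++) {
                merged[j] = (byte) Math.max(merged[j], regs[j]);
            }
        }
        return merged;
    }

    // Expiry = deletion: whole buckets are discarded, never individual elements.
    public void expireBefore(LocalDate cutoff) {
        buckets.keySet().removeIf(d -> d.isBefore(cutoff));
    }
}
```

With Redis, the equivalent pattern is one PFADD key per day, PFMERGE (or multi-key PFCOUNT) over the last N keys, and key expiry via TTL.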
Error is statistical, not absolute. The ±0.81% standard error means that in rare cases, the estimate may be further off. For very small cardinalities (under ~1000 elements), the relative error may be higher. Mitigation: Redis uses linear counting for small cardinalities to improve accuracy; for extremely small sets (under ~100 elements), an exact counter may be preferable.
No element enumeration. HyperLogLog tells you how many distinct elements there are — it cannot tell you which elements they are. If you need to know what the distinct elements are (not just how many), you need an exact structure.
Merging can only grow estimates. The element-wise maximum merge operation can only increase register values, not decrease them. If you merge two sketches and then want to "un-merge" one of them, there is no operation to do so.
| Failure Mode | Symptom | Mitigation |
| --- | --- | --- |
| Sliding window needed | Cannot remove expired elements | Per-bucket sketches; merge for aggregates |
| Very small cardinalities | Higher relative error | Use exact counter below ~100 elements |
| Need exact counts | ±0.81% error unacceptable | HashSet or COUNT(DISTINCT) |
| Cannot identify elements | "Which distinct IDs?" unanswerable | HyperLogLog is count-only |
🧭 Decision Guide: When HyperLogLog Is the Right Tool
| Situation | Recommendation | Reason |
| --- | --- | --- |
| Count unique users/events at analytics scale | HyperLogLog | 12 KB for any cardinality, ±0.81% error |
| Need sliding-window unique counts | Per-bucket HyperLogLogs + merge | Discard expired buckets; merge remaining |
| Need exact unique count (compliance, billing) | COUNT(DISTINCT) or HashSet | Accept memory/compute cost for correctness |
| Membership check — "is X in the set?" | Bloom Filter | HyperLogLog answers "how many", not "which" |
| Frequency of individual elements | Count-Min Sketch | HyperLogLog cannot count per-element frequency |
| Distributed unique counting across shards | HyperLogLog per shard + merge | O(m) merge, same error as centralized sketch |
The deciding question: Do you need to count distinct elements, not store or enumerate them? If yes, and if an error of less than 1% is acceptable (which it is for nearly all analytics use cases), HyperLogLog is the correct and highly efficient choice.
🧪 HyperLogLog in Java: A Simplified Simulation
The following Java code demonstrates the leading-zeros intuition that underlies HyperLogLog — useful for understanding the algorithm before reaching for a production library.
import java.util.Random;
/**
 * Simplified HyperLogLog demonstration — NOT production-ready.
 * For production use: Redis PFADD/PFCOUNT, or a maintained library such as
 * Apache DataSketches (which ships a Java HLL sketch).
 */
public class SimpleHyperLogLog {
private final int m; // number of registers (must be power of 2)
private final int b; // log2(m) — bits used for routing
private final byte[] regs; // register array, each holds max rho seen
public SimpleHyperLogLog(int b) {
this.b = b;
this.m = 1 << b; // m = 2^b
this.regs = new byte[m];
}
public void add(long hashedValue) {
// Route to a register using the top b bits
int registerIndex = (int) (hashedValue >>> (64 - b));
// Count leading zeros in the remaining (64 - b) bits, + 1
long valueSuffix = hashedValue << b;
byte rho = (byte) (Long.numberOfLeadingZeros(valueSuffix) + 1);
// Update register if new rho is larger
if (rho > regs[registerIndex]) {
regs[registerIndex] = rho;
}
}
public long estimate() {
// Compute the harmonic mean of 2^(-regs[j])
double harmonicSum = 0.0;
int zeroRegisters = 0;
for (byte reg : regs) {
harmonicSum += Math.pow(2.0, -reg);
if (reg == 0) zeroRegisters++;
}
double alphaMM = 0.7213 / (1 + 1.079 / m) * m * m;
double rawEstimate = alphaMM / harmonicSum;
// Small range correction: use linear counting when many registers are zero
if (rawEstimate <= 2.5 * m && zeroRegisters > 0) {
return Math.round(m * Math.log((double) m / zeroRegisters));
}
return Math.round(rawEstimate);
}
}
Usage example — unique visitor counting:
SimpleHyperLogLog hll = new SimpleHyperLogLog(12); // b=12 → 4,096 registers (Redis uses b=14)
// Simulate 1,000,000 visitor events covering exactly 800,000 distinct visitors
int trueDistinct = 800_000;
for (int i = 0; i < 1_000_000; i++) {
    long userId = 1 + (i % trueDistinct); // every ID appears at least once; 200,000 repeat
    hll.add(murmurMix(userId));           // mix the ID so its bits look uniformly random
}
long estimated = hll.estimate();
System.out.printf("True distinct: %d%n", trueDistinct);
System.out.printf("HLL estimate: %d%n", estimated);
System.out.printf("Error: %.2f%%%n",
    Math.abs(estimated - trueDistinct) * 100.0 / trueDistinct);
// Expect an error of a percent or two — the standard error for 4,096 registers is 1.63%
The murmurMix helper used above can be MurmurHash3's 64-bit finalizer (fmix64), which spreads sequential IDs into well-mixed 64-bit values:
static long murmurMix(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
}
Three scenarios where this matters:
A/B test reach. Count how many distinct user IDs received each variant. Use one HyperLogLog per variant; compare estimates to ensure balanced exposure without storing every user ID.
Daily active users per feature flag. Maintain a per-day HyperLogLog keyed by (featureFlag, date). PFMERGE the last 7 daily sketches for weekly active users.
Distinct source IPs per time window. Network intrusion detection needs to know when the number of distinct source IPs accessing a service spikes. HyperLogLog tracks this in 12 KB per window, alerting when cardinality crosses a threshold.
🛠️ Redis PFADD / PFCOUNT: HyperLogLog as a Native Data Type
Redis has shipped HyperLogLog as a first-class data type since Redis 2.8.9. The commands are prefixed PF in honour of Philippe Flajolet, the mathematician who co-invented the LogLog family of algorithms and the HyperLogLog estimator itself.
# Add elements to a HyperLogLog key
PFADD page:views:2026-05-02 user_101 user_202 user_303 user_101
# Get the approximate unique count
PFCOUNT page:views:2026-05-02
→ (integer) 3
# user_101 was added twice — HyperLogLog correctly de-duplicates
# Per-day sketches for a week
PFADD page:views:2026-04-26 user_101 user_404 user_505
PFADD page:views:2026-04-27 user_202 user_303 user_606
# Merge 7-day sketches into a weekly unique visitor count
PFMERGE page:views:week-of-2026-04-26 \
page:views:2026-04-26 \
page:views:2026-04-27
PFCOUNT page:views:week-of-2026-04-26
→ (integer) 6 # 6 distinct users seen across both days
# PFCOUNT can also accept multiple keys directly (performs merge internally)
PFCOUNT page:views:2026-04-26 page:views:2026-04-27
→ (integer) 6
Key behaviors to know:
- PFADD returns 1 if the sketch changed (the cardinality estimate increased), 0 if not. Useful for detecting when new distinct elements arrive.
- PFCOUNT with multiple keys performs an on-the-fly merge and returns the combined unique count without modifying any of the source sketches.
- PFMERGE writes the merged sketch to a destination key — use this when you want to persist the merged result.
- Memory is always capped at ~12 KB per key regardless of how many elements have been added.
- Redis internally switches from a "sparse" representation (for small cardinalities) to the standard "dense" 12 KB representation once enough elements have been added — optimizing memory for small sketches automatically.
📚 Lessons Learned
The 12 KB guarantee is absolute — not approximate. The memory cost of a HyperLogLog sketch in Redis is fixed at ~12 KB regardless of whether you have seen 10 or 10 billion distinct elements. Plan your memory budget around the number of distinct metrics you track (pages, features, experiments), not the number of elements per metric.
The ±0.81% error is a standard deviation, not a maximum. In statistical terms, roughly 68% of estimates fall within ±0.81%. Very occasionally an estimate may be off by 2–3 standard deviations. For critical exact counts (billing, compliance audits), use COUNT(DISTINCT) on your source-of-truth store.
Merging is compositional but irreversible. You can merge daily sketches into a weekly sketch perfectly. You cannot then remove one day's contribution from the merged sketch. Design your time-bucketing strategy upfront: keep granular sketches, merge for aggregates, discard expired granular buckets.
HyperLogLog cannot answer "which" — only "how many." This surprises teams who adopt it expecting to list the unique elements. It is a pure cardinality estimator. Pair it with a separate bloom filter or exact structure if you also need membership testing on the same stream.
Hash quality matters enormously. A poor hash function that clusters many distinct elements into the same leading-zero patterns destroys the statistical independence of the registers. Use a high-quality 64-bit non-cryptographic hash: MurmurHash3 or xxHash64. Redis uses a custom 64-bit hash internally.
Distributed merging is the killer feature. In a distributed system, each shard independently computes a 12 KB HyperLogLog sketch. Merging 1000 shards' sketches costs 1000 × O(16,384) register comparisons — about 16 million integer comparisons — and produces an estimate as accurate as if a single machine had seen all the data. No data shuffling, no central aggregation bottleneck.
📌 Summary & Key Takeaways
HyperLogLog is the correct structure when you need to answer "how many distinct elements have I seen?" at any scale, with sub-1% error, in fixed memory.
- Core algorithm: hash each element → use the top b bits as the register index → count leading zeros in the remaining bits → update the register with the maximum seen. Estimate using the harmonic mean of 2^(-registers[j]) across all m registers with bias correction.
- Memory is always 12 KB for m = 16,384 registers — regardless of whether you have observed 100 or 100 billion distinct elements.
- ±0.81% error is the standard error for m = 16,384 registers — acceptable for all analytics workloads.
- Merge is O(m) and compositional: per-shard sketches merge into a global estimate with the same error bound.
- No deletion. Use per-bucket sketches for sliding-window distinct counts.
- Redis PFADD / PFCOUNT / PFMERGE are the production path — no custom implementation needed.
- The decisive test: if you need to count distinct elements across a high-volume stream and exact counting would require gigabytes of memory, HyperLogLog delivers the answer in 12 KB.
🔗 Related Posts
- Bloom Filters Explained: Membership Testing with Zero False Negatives — The companion probabilistic structure for set membership — answers "is X in the set?" rather than "how many distinct X?"
- Count-Min Sketch Explained: Frequency Estimation at Streaming Scale — Fixed-memory frequency estimation for streaming data — answers "how often has X appeared?" in the same probabilistic family
- Probabilistic Data Structures: A Practical Guide — Overview and decision guide across Bloom, Cuckoo, HyperLogLog, and Count-Min Sketch
- Sliding Window Technique — Another O(n) streaming pattern for subarray and window problems in the Data Structures and Algorithms series
Written by
Abstract Algorithms
@abstractalgorithms