Abstract Algorithms

Hashing Cardinality Estimation Approximation Error Registers Streaming Data HyperLogLog Leading Zeros Count-Min Sketch

Guidance

Hashing

Continues from what you have already explored.

I can continue your learning session from the exact context you left off.

Resume Context

Continue Learning Practice Tradeoffs Next Drill

System behavior

HyperLogLog Cardinality Estimation

Hash values route into registers, leading-zero runs update maxima, and the harmonic mean estimates unique cardinality with bounded error.

Open

Speed

Step 1 / 3Normal flow

Read in sequence

1Probabilistic Data Structures: A Practical Guide to Bloom Filters, HyperLogLog, and Count-Min SketchTLDR: Probabilistic data structures trade a small, bounded probability of being wrong for orders-of-magnitude better memory efficiency and O(1) speed. Bloom Filters answer "definitely not in this set"14 min 2HyperLogLog Explained: Counting Billions of Unique Items with 12 KBTLDR: HyperLogLog estimates the number of distinct elements in a dataset using ~12 KB of memory regardless of cardinality — with ±0.81% error. The insight: if you hash every element to a random bit st18 min 3Count-Min Sketch Explained: Frequency Estimation at Streaming ScaleTLDR: Count-Min Sketch (CMS) is a fixed-size d × w counter matrix that estimates how often any element has appeared in a stream. Insert: hash the element with each of the d hash functions to get one c22 min 4Bloom Filters Explained: Membership Testing with Zero False NegativesTLDR: A Bloom filter is a bit array of m bits + k independent hash functions that sets k bits on insert and checks those same k bits on lookup. If any checked bit is 0, the element is definitely not i19 min 5How Bloom Filters Work: The Probabilistic SetTLDR TLDR: A Bloom Filter is a bit array + multiple hash functions that answers "Is X in the set?" in \(O(1)\) constant space. It can return false positives (say "yes" when the answer is "no") but ne13 min 6Variational Autoencoders (VAE): The Art of Compression and CreationTLDR: A VAE learns to compress data into a smooth probabilistic latent space, then generate new samples by decoding random points from that space. The reparameterization trick is what makes it trainab13 min 7What are Hash Tables? Basics ExplainedTLDR: A hash table gives you near-O(1) lookups, inserts, and deletes by using a hash function to map keys to array indices. The tradeoff: collisions (when two keys hash to the same slot) must be handl12 min 8NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split DataTLDR: Every NoSQL database hides a partitioning engine behind a deceptively simple API. Cassandra uses a consistent hashing ring where a Murmur3 hash of your partition key selects a node — virtual nod24 min 9SQL Partitioning: Range, Hash, List, and Composite Strategies ExplainedTLDR: SQL partitioning divides one logical table into smaller physical child tables, all accessed through the parent table name. The query optimizer skips irrelevant child tables entirely — a process 25 min 10Partitioning in Spark: HashPartitioner, RangePartitioner, and Custom StrategiesTLDR: Spark's partition count and partitioning strategy are the two levers that determine whether a job scales linearly or crumbles under data growth. HashPartitioner distributes keys by hash modulo —26 min 11Python Data Structures: Lists, Dicts, Sets, and TuplesTLDR: Python's four built-in collections are not interchangeable — their internals are fundamentally different. list is a dynamic array: fast at the end, slow for membership. dict is a hash table: O(126 min 12Partitioning Approaches in SQL and NoSQL: Horizontal, Vertical, Range, Hash, and List PartitioningTLDR: Partitioning splits one logical table into smaller physical pieces. The database skips irrelevant pieces entirely — turning a 30-second full-table scan into a sub-second single-partition read. S12 min

Related threads

Find the idea you are trying to connect