HyperLogLog From Scratch | Counting Distinct Elements at Scale

661 views
Dec 2, 2025
13:15

How HyperLogLog estimates the number of distinct elements in massive data streams.

HyperLogLog Paper: P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier, “HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm,” Discrete Mathematics & Theoretical Computer Science, Proc., 2007.

One thing I didn’t explain in the video: When you split data across HyperLogLog registers/computers, each register sees fewer elements. You might worry this makes each estimate noisier, which could counteract how averaging reduces variance. In practice, averaging still helps because when the registers are balanced (roughly the same number of elements), the variance of each scaled estimate stays about the same as if we hadn’t sharded (split the data into registers/computers). Since the variance doesn’t blow up, averaging effectively reduces the overall noise.

00:00 - Naïve Counting Algorithm
01:55 - Simple Case: Counting Numbers
03:18 - Extension: Counting Other Things
05:50 - Efficiency: Tracking Longest Run of Zeros
08:02 - Stabilization: Averaging for Stabilization
10:15 - Routing Bitstrings
10:50 - The Algorithm
12:20 - Why “HyperLogLog”?
12:50 - Practical Details
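The steps named in the description (hash each element to a bitstring, route it to a register, track the longest run of zeros, then average across registers) can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the class name `HyperLogLog`, the parameter `p`, the use of SHA-256 truncated to 64 bits, and the small-range fallback threshold are all assumptions made here. The bias constant and the harmonic-mean estimator follow the Flajolet et al. paper.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with m = 2**p registers (assumes p >= 7)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Hash the item to a 64-bit integer (assumed hash choice: SHA-256, truncated).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                     # first p bits route to a register
        rest = h & ((1 << width) - 1)        # remaining bits carry the zero run
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Harmonic mean of 2**register across all registers, scaled by alpha * m.
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        # Small-range fallback: linear counting while many registers are still empty.
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(round(hll.estimate()))
```

Each register stores only a small integer, so memory depends on the number of registers rather than on the stream size. The paper gives a standard error of about 1.04/√m, so with the assumed p = 12 (4096 registers) the estimate is typically within a few percent of the true count. Adding the same element again never changes a register, which is why duplicates do not affect the estimate.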
