Roaring Bitmaps in C#: The Data Structure Behind Super Fast Queries
Roaring Bitmaps are one of those data structures that most developers do not hear about often, but they power serious large-scale systems.

In this video, I show a simple C# benchmark comparing three ways to find the intersection between large sets of integer IDs:

List of int
HashSet of int
Roaring Bitmap

The result:

List of int: around 38–43 seconds
HashSet of int: around 3–4 milliseconds
Roaring Bitmap: around 70–120 microseconds

The lesson is not that Roaring Bitmaps should replace every collection. The lesson is that when you are working with huge sets of integer IDs and need fast AND, OR, COUNT, and intersection operations, Roaring Bitmaps can give you incredible performance.

Roaring Bitmaps and roaring-style bitmap indexes are used across major search, analytics, observability, and data systems, including Apache Lucene, Elasticsearch, Apache Spark, Apache Druid, Apache Pinot, Netflix Atlas, Microsoft Visual Studio Team Services / Azure DevOps Services history, Google Procella / YouTube SQL Engine, Datadog Husky, InfluxDB, Weaviate, Sourcegraph, M3, Redpanda, and many others.

GitHub repo for this demo:
https://github.com/hassanhabib/RoaringBitMapsDemoDotNet

NuGet package used:
https://www.nuget.org/packages/Roaring.Net

Research papers and references:
https://arxiv.org/abs/1402.6407
https://arxiv.org/abs/1603.06549
https://arxiv.org/abs/1709.07821

Official Roaring Bitmaps:
https://roaringbitmap.org/

CRoaring:
https://github.com/RoaringBitmap/CRoaring

Java RoaringBitmap:
https://github.com/RoaringBitmap/RoaringBitmap

Go RoaringBitmap:
https://github.com/RoaringBitmap/roaring

Google Procella / YouTube SQL Engine:
https://research.google/pubs/procella-unifying-serving-and-analytical-data-at-youtube/

Datadog Husky:
https://www.datadoghq.com/blog/engineering/introducing-husky/
https://www.datadoghq.com/blog/engineering/husky-query-architecture/

Chapters:
00:00 The performance result first
00:37 Why this matters for enterprise systems
01:24 What are Roaring Bitmaps?
02:05 Coffee and bread example
03:10 Why normal list intersections are expensive
04:00 The simple range/compression mental model
05:15 Building the List of int example
06:07 Running the slow List of int version
06:37 Building the HashSet of int example
07:20 HashSet of int performance improvement
08:19 Introducing Roaring Bitmaps in C#
09:10 Creating Roaring32Bitmap sets
10:05 Adding ranges of user IDs
10:52 Optimizing the bitmaps
11:20 Running the bitmap intersection
12:05 Why 0 milliseconds is not really zero
13:00 Measuring microseconds
13:35 Comparing all three approaches
14:20 Where Roaring Bitmaps are used in the real world
15:00 Final thoughts and shoutout

#dotnet #csharp #softwareengineering #datastructures #performance #algorithms #roaringbitmaps #opensource #azuredevops #youtube #datadog
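
The three-way comparison described above can be approximated with a small sketch. This is not the code from the video or the demo repo, and it does not use the Roaring.Net API. The bitmap step is a plain uncompressed bitset built on a ulong array, standing in only for the "bitwise AND plus count" idea that Roaring Bitmaps build on. The class name, ID ranges, and sizes are all hypothetical, and kept far smaller than the video's so the list version finishes quickly. Timings will vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Numerics;

public static class IntersectionSketch
{
    // Hypothetical upper bound on user IDs, used to size the bitset.
    private const int Universe = 200_000;

    public static void Main()
    {
        // Two overlapping ranges of made-up user IDs: [0, 20000) and [10000, 30000).
        int[] coffee = Enumerable.Range(0, 20_000).ToArray();
        int[] bread = Enumerable.Range(10_000, 20_000).ToArray();

        var coffeeList = new List<int>(coffee);
        var breadList = new List<int>(bread);
        var coffeeSet = new HashSet<int>(coffee);
        var breadSet = new HashSet<int>(bread);
        ulong[] coffeeBits = ToBitset(coffee);
        ulong[] breadBits = ToBitset(bread);

        Time("List of int", () => CountShared(coffeeList, breadList));
        Time("HashSet of int", () => CountShared(coffeeSet, breadSet));
        Time("Plain bitset", () => CountShared(coffeeBits, breadBits));
    }

    // Every Contains call scans the second list, so the cost grows with n * m.
    private static int CountShared(List<int> a, List<int> b) =>
        a.Count(id => b.Contains(id));

    // Each lookup is a hash probe, so the cost grows with n.
    private static int CountShared(HashSet<int> a, HashSet<int> b) =>
        a.Count(id => b.Contains(id));

    // One AND and one popcount per 64 IDs.
    private static int CountShared(ulong[] a, ulong[] b)
    {
        int total = 0;
        for (int i = 0; i < a.Length; i++)
        {
            total += BitOperations.PopCount(a[i] & b[i]);
        }
        return total;
    }

    // Sets bit number `id` for every ID; assumes 0 <= id < Universe.
    private static ulong[] ToBitset(IEnumerable<int> ids)
    {
        var words = new ulong[(Universe + 63) / 64];
        foreach (int id in ids)
        {
            words[id >> 6] |= 1UL << (id & 63);
        }
        return words;
    }

    private static void Time(string label, Func<int> work)
    {
        var stopwatch = Stopwatch.StartNew();
        int count = work();
        stopwatch.Stop();
        Console.WriteLine($"{label}: {count} shared IDs in {stopwatch.Elapsed.TotalMilliseconds:F3} ms");
    }
}
```

All three approaches report the same 10000 shared IDs (the overlap [10000, 20000)); only the time differs. A real Roaring Bitmap goes further than this sketch by splitting the ID space into chunks and compressing each chunk, which is what keeps memory low for sparse or range-heavy sets.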