Back to Browse

LSM Trees — The Storage Engine Behind Modern Databases

12 views
May 7, 2026
9:16

LSM Trees are the storage engine behind RocksDB, LevelDB, BadgerDB, Cassandra, ScyllaDB, HBase, and the storage layers of Kafka Streams, CockroachDB, TiDB, and most modern write-heavy databases. In this video, we build the Log-Structured Merge Tree from first principles. We start with Bitcask — its strengths and its fatal RAM ceiling — and walk the same path the original LSM designers walked. At each step we ask: what breaks under load, what's the trade-off, how do we recover? You'll learn: • Why writing to RAM first (the memtable) destroys random-disk-write latency • How the Write-Ahead Log gives durability without sacrificing throughput • Why SSTables are sorted, immutable, and self-contained — and what each property buys you • The asymmetric magic of bloom filters, and how to size them • Compaction strategies (size-tiered vs. leveled) and the three-way trade-off between write, read, and space amplification • The full storage-engine spectrum: Redis → Bitcask → LSM → B+ Tree, and when each is the right answer • Production failure modes — compaction storms, write stalls, WAL corruption — and the metrics that warn you before they hit This is the foundational literacy you need to reason about modern databases at scale. If you build distributed systems, you will use LSM-backed stores. This video is how you stop treating them like a black box. #systemdesign #databases #lsmtree #golang #distributedsystems High-Level Roadmap: ┌─────────────────────────────────────────────────┐ │ CLIENT (PUT, GET, DELETE) │ └──────────────────┬──────────────────────────────┘ │ ▼ ┌──────────────────────────────────────────────────┐ │ 1. WAL (Write-Ahead Log) — durability │ │ 2. MEMTABLE — sorted in-memory buffer │ └──────────────────┬───────────────────────────────┘ │ (periodic flush) ▼ ┌──────────────────────────────────────────────────┐ │ 3. SSTABLE — sorted, immutable, on-disk │ │ 4. BLOOM FILTER — per SSTable │ │ 5. INDEX — sparse key→offset map │ └──────────────────┬───────────────────────────────┘ │ (background) ▼ ┌──────────────────────────────────────────────────┐ │ 6. COMPACTION — merge + reclaim │ │ 7. LEVELED LAYOUT — L0, L1, L2… │ └──────────────────────────────────────────────────┘ The RAM-First Write Path: USER ──WRITE──▶ [ RAM buffer ] ──async, periodic──▶ [ DISK ] (sync) memtable sstable Durability : PUT(k1, v1) ──▶ wal.Append(k1, v1) ──▶ memtable.Put(k1, v1) │ ▼ fsync? (configurable) SSTables: ┌──────────────────────────────────────────────┐ │ DATA BLOCKS │ │ block 1: [k1,v1][k2,v2][k3,v3]… │ │ block 2: [k4,v4][k5,v5][k6,v6]… │ │ … │ │ (each block is typically ~4–64 KB, │ │ often compressed with Snappy/LZ4) │ ├──────────────────────────────────────────────┤ │ INDEX BLOCK │ │ (k1, offset=0) │ │ (k4, offset=4096) │ │ (k7, offset=8192) │ │ … (one entry per data block — sparse) │ ├──────────────────────────────────────────────┤ │ BLOOM FILTER BLOCK │ ├──────────────────────────────────────────────┤ │ FOOTER (pointers to index + bloom) │ └──────────────────────────────────────────────┘ GET(k): 1. Check the active memtable. → if found, return value (or tombstone → 404) 2. Check the immutable memtable. → if found, return 3. For each SSTable, newest first: a. Check the bloom filter. → "no" means skip. b. Binary-search the index. → find the data block. c. Load the block, find the key. d. If found, return value (or tombstone → 404). 4. If no SSTable has the key → return "not found". Compaction: [L0]: SST_A SST_B SST_C ─merge─▶ [L1]: SST_AB_C (deduplicated, sorted, non-overlapping) Leveled Architecture: MEMTABLE (in RAM) │ flush ▼ L0: [SST] [SST] [SST] [SST] ← may have overlapping key ranges │ compact ▼ L1: [SST]──[SST]──[SST] ← non-overlapping; ~10× capacity of L0 │ compact ▼ L2: [SST]──[SST]──[SST]──[SST]──[SST] ← non-overlapping; ~10× of L1 │ ▼ … The Storage Engine Spectrum: ALL RAM ←───────────────────→ ALL DISK [ Redis, DiceDB ] [ Bitcask ] [ LSM Tree ] [ B+ Tree ] keys+values in RAM keys in RAM keys on disk everything on disk 100% memory-bound index in RAM sparse index pages on disk in RAM

Download

0 formats

No download links available.

LSM Trees — The Storage Engine Behind Modern Databases | NatokHD