Understanding LSM Trees, B+ Trees & Hybrid Indexing Models

1. LSM Trees (Log-Structured Merge Trees) vs. B+ Trees (Balanced Tree Indexing)

✅ B+ Trees: Efficient Reads for Older Data

B+ Trees keep keys sorted and all leaves at the same depth, so every lookup follows a path of the same length.
Because the upper (non-leaf) index levels are small and usually stay cached in memory, reaching a leaf takes O(log N) node visits and few disk reads.
Reads stay efficient for older data because every key, old or new, lives in the same sorted structure.
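
The lookup path described above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular database lays out its pages: the `Leaf` and `Internal` classes, the `search` function, and the sample keys are all hypothetical names chosen for the example.

```python
from bisect import bisect_right


class Leaf:
    """Leaf node: sorted keys with their values (values[i] belongs to keys[i])."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values


class Internal:
    """Internal node: sorted separator keys and len(separators) + 1 children."""

    def __init__(self, separators, children):
        self.separators = separators
        self.children = children


def search(node, key):
    """Descend from the root to a leaf, then look the key up inside that leaf."""
    while isinstance(node, Internal):
        # Keys >= separators[i] live in children[i + 1].
        node = node.children[bisect_right(node.separators, key)]
    i = bisect_right(node.keys, key) - 1
    if i >= 0 and node.keys[i] == key:
        return node.values[i]
    return None


# A two-level tree: one internal root and three leaves (sample data, assumed).
root = Internal(
    [20, 40],
    [
        Leaf([5, 10], ["a", "b"]),
        Leaf([20, 30], ["c", "d"]),
        Leaf([40, 50], ["e", "f"]),
    ],
)
```

With this tree, `search(root, 30)` visits the root and one leaf regardless of when key 30 was inserted, which is why read cost does not depend on the age of the data.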

❌ B+ Trees: Low Write Throughput

B+ Trees update pages in place and split a node whenever an insert overflows it.
Because the affected pages are scattered across the file, this leads to random I/O, which is expensive for disk-based storage.
As data grows, keeping the tree balanced (split/merge operations) touches more pages and slows down write performance.
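
To make the split cost concrete, here is a small sketch of a leaf insert that splits on overflow. It assumes a leaf is just a sorted Python list and uses a hypothetical capacity `MAX_KEYS`; real engines split fixed-size disk pages and also update the parent node, which is omitted here.

```python
from bisect import insort

MAX_KEYS = 4  # hypothetical leaf capacity, chosen only for this example


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf, splitting it if it overflows.

    Returns (left, right, separator). When no split is needed,
    right and separator are None and left is the updated leaf.
    """
    insort(leaf, key)  # in-place update of the existing leaf
    if len(leaf) <= MAX_KEYS:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    # The first key of the new right leaf is copied up as the separator.
    return left, right, right[0]
```

An insert that fits modifies one page; an insert that overflows produces two leaves plus a parent update, so a single logical write can turn into several page writes at unrelated disk locations.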

🔥 Ideal Use Case: OLTP databases (MySQL, PostgreSQL) that need fast point lookups and range queries.

✅ LSM Trees: Optimized for Writes

LSM Trees buffer writes in MemTables (in-memory sorted structures).
When MemTables reach a threshold, they are flushed as immutable SSTables (Sorted String Tables).
This results in sequential writes, which are much faster than random I/O.
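
The buffer-then-flush write path can be sketched as follows. This is a simplified in-memory model under stated assumptions: the `MemTable` and `LSMStore` classes and the `threshold` parameter are hypothetical, the threshold counts entries rather than bytes, and an SSTable is modeled as an immutable sorted tuple instead of a file written sequentially to disk.

```python
class MemTable:
    """In-memory write buffer that is flushed once it holds `threshold` keys."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def is_full(self):
        return len(self.data) >= self.threshold

    def flush(self):
        """Return an immutable, key-sorted SSTable and reset the buffer."""
        sstable = tuple(sorted(self.data.items()))
        self.data = {}
        return sstable


class LSMStore:
    """Write path only: puts go to the MemTable, full MemTables become SSTables."""

    def __init__(self, threshold=3):
        self.memtable = MemTable(threshold)
        self.sstables = []  # oldest first, newest last

    def put(self, key, value):
        self.memtable.put(key, value)
        if self.memtable.is_full():
            # One sorted batch per flush; existing SSTables are never modified.
            self.sstables.append(self.memtable.flush())
```

Because each flush emits one sorted batch and never rewrites an earlier SSTable, the on-disk equivalent is an append-style sequential write.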

❌ LSM Trees: Read Latency Increases for Older Data

Recent data is in the MemTable (fastest access).
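Older data is spread across multiple SSTables, so a read may have to check several of them, newest first, and keep the most recent version of each record.
The more SSTables accumulate, the more of them a lookup for old data may need to search, leading to higher read latency.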
Compaction merges SSTables periodically to improve query efficiency, but at a CPU/storage cost.
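
The read path and the effect of compaction can be sketched like this. It is a simplified model, not any engine's actual implementation: the function names `sstable_get`, `get`, and `compact` are hypothetical, each SSTable is a key-sorted list of (key, value) pairs, and real-world details such as tombstones for deletes, Bloom filters, and leveled compaction are left out.

```python
from bisect import bisect_left


def sstable_get(sstable, key):
    """Binary search one SSTable (a key-sorted list of (key, value) pairs)."""
    i = bisect_left(sstable, (key,))
    if i < len(sstable) and sstable[i][0] == key:
        return True, sstable[i][1]
    return False, None


def get(key, memtable, sstables):
    """Check the MemTable first, then SSTables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):  # sstables are stored oldest first
        found, value = sstable_get(sstable, key)
        if found:
            return value
    return None


def compact(sstables):
    """Merge all SSTables into one; the newest value for each key wins."""
    merged = {}
    for sstable in sstables:  # oldest first, so newer entries overwrite older
        merged.update(sstable)
    return [sorted(merged.items())]


# Sample data (assumed): "k1" was written twice, "k2" only in the oldest table.
memtable = {"k3": "new"}
sstables = [
    [("k1", "v1-old"), ("k2", "v2")],
    [("k1", "v1-new")],
]
```

Before compaction, a lookup for `"k2"` has to search both SSTables; after `compact(sstables)` the same lookup searches one. The merge itself reads and rewrites every table, which is the CPU and storage cost mentioned above.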

🔥 Ideal Use Case: Write-heavy workloads where most reads target recent data (Cassandra, RocksDB).

Example Databases (Hybrid LSM + B+ Tree):

MongoDB (WiredTiger) - Balances write performance with fast indexed reads.
MyRocks (MySQL Engine) - Uses RocksDB (LSM-based) for optimized writes while keeping relational query power.

Example Databases (Fractal Tree):

TokuDB (MySQL Engine by Percona) - High insertion speed, good for analytical workloads.
Percona FT - Uses fractal tree indexing to improve both write and read performance.

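Example Databases (Bw-Tree and related high-concurrency designs):

Hekaton (SQL Server In-Memory OLTP Engine) - Uses latch-free Bw-Tree indexes, optimized for high concurrency.
FAWN (Fast Array of Wimpy Nodes) - Efficient distributed storage for key-value workloads; it uses a log-structured datastore with an in-memory hash index rather than a Bw-Tree.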

Indexing Model	Write Performance	Read Performance	Best Used In
B+ Tree	❌ Slow (random writes)	✅ Fast (O(log N), great for range queries)	OLTP Databases (MySQL, PostgreSQL)
LSM Tree	✅ Fast (sequential writes)	❌ Slower for historical reads	Write-heavy workloads (Cassandra, RocksDB)
Fractal Tree	✅✅ Very Fast (buffered writes)	✅ Fast (optimized range queries)	Analytical & mixed workloads (TokuDB)
Bw-Tree	✅ Fast (lock-free writes)	✅✅ Very Fast (copy-on-write reads)	High concurrency OLTP (SQL Server Hekaton)
Hybrid (LSM + B+ Tree)	✅ Fast (batched writes)	✅ Fast (optimized index lookups)	MongoDB (WiredTiger), MyRocks

If you are optimizing for write-heavy workloads: → Use LSM Tree-based databases (Cassandra, Druid, RocksDB).
If you need fast transactional reads & writes: → Use Bw-Trees (Hekaton) or Fractal Trees (TokuDB, Percona FT).
If you need a balance between rich query support and write-optimized indexing: → Use Hybrid models like MongoDB (WiredTiger) or MyRocks.

This document serves as a long-term reference for database indexing models. 🚀