B-tree vs LSM tree index

  • B-trees and LSM trees are both data structures that are used to index data in a database. B-trees are older and more traditional, while LSM trees are newer and more modern. Let’s see how both of them work.

  • B-trees are set up like a tree, where each node holds a series of keys and pointers, one after the other: |K|P|K|P|K|. The keys in the array are sorted and each pointer points to the next node that contains values between the pointer’s adjacent keys.

  • This structure enables fast retrieval of data by traversing the tree from the root to the leaf nodes, which contain the actual data records. The B-tree are self-balancing so that no branch can become too long and degrade performance. B-trees are stored on disk.

  • LSM trees are also organized as a tree, but they are stored in memory. They are composed of a number of levels, each of which contains a sorted list of data. The top level is the smallest level, and it contains the most recent data. The bottom level is the largest level, and it contains the oldest data.

  • When new data is inserted into the LSM tree, it is inserted into the top level. If the top level is full, it is merged with the level below it. This process continues until the data is merged into the bottom level. When the bottom level is full, it is written to disk as an SSTable. SSTables are immutable, which means that they cannot be modified once they are written to disk.

  • LSM trees are more efficient for inserting and updating data than B-trees because they do not need to modify existing nodes in the tree. Instead, they simply insert the new data into the top level in memory. This means that LSM trees can write faster than B-trees.

  • But LSM trees are less efficient for reading and searching data than B-trees. This is because they need to search through multiple levels of the tree in memory and then on the SSTables in the disc. B-trees, on the other hand, only need to search through a tree which has only a few levels. This means that B-trees can read faster than LSM trees.

FeatureB-treesLSM trees
Storageon diskin-memory + disk
ReadFasterSlower
WriteSlowerFaster
Database (generally)SQLNoSQL
ExampleMySQL, PostgreSQLCassandra, HBase
© 2024 DrawSystem Design. All rights reserved.