The data structures that power your data base engine matter because this needs to align with your read and write patterns. Some data structures are faster when reading, and others when writing.

Hash Indexes

This basically takes a key and maps it to a byte offset which is where the value for that key starts. This is affectively just appending to a log (log segment).

To avoid running out of disk space, we break our log into segments of a size and write to a new log, only keeping the most recent values for a key in a process called compaction.

We lose durability here if the database dies before we write our in-memory hashmap to disk.

SSTables

A sorted string table sorts key-value pairs by key. Merging segments are more efficient since we can just use a Merge Sort to do so. To find a particular value for a key, we still keep hash indexes of some keys in memory and then use Binary Search to find the exact key bit offset we were looking for.

LSM-Trees

A log structured merge tree is a tree of keys mapping to lists of ID’s where each ID corresponds to a document that a key (or word) is found in.

B-Trees

These trees keep key-value pairs sorted by key. B-Trees break the database down into fixed-sized blocks which aligns with hardware, and each page point to another page with a reference with a key range in between the leftmost and rightmost keys surrounding that particular pointer. The tree is always balanced and queries in O(logn) time.