Issue #4 - Distributed Storage Engine
Distributed SQL Database Internals—Distributed Storage Engine
Another weekend, another weekend read, this time all about Distributed Storage Engines
My book Thinking in Distributed Systems is now available. 12 chapters, full of mental models, diagrams, illustrations, and examples, all about distributed systems.
In his blog post Distributed SQL Database Internals—Distributed Storage Engine, Li Shen explores the distributed storage engines in TiKV, a distributed key value store.
The TiKV distributed storage engine manages data across nodes in a network, it does not manage data on a node, that responsibility is outsourced to RocksDB.
TiKV is conceptualized as a vast, ordered Key-Value Map. To achieve scalability, it’s necessary to partition data across multiple machines. To achieve reliability, it’s necessary to replicate data across machines.
Logically, TiKV partitions data based on sequential range of keys called regions. Each region is stored on one node. Once data is segmented into Regions, two crucial tasks are undertaken:
Distribution
Replication
Each Region has multiple copies, known as Replicas. To ensure consistency among Replicas, TiKV employs the Raft consensus algorithm. Replicas belonging to a single Region, are stored across different nodes, with one Replica designated as the leader, while the others act as followers.
After discussing the distributed storage engine, the author proceeds to discuss TiKV’s implementation of distributed transactions and multi version concurrency control.
An informative read, exploring the world of distributed, partitioned, and replicated storage engines through the lens of TiKV
Happy Reading
Distributed Storage Engine
Li Shen
The storage engine is responsible for managing how the data is stored, organized, and accessed on a disk or other permanent storage media. It plays a crucial role in the overall architecture and performance of the database, influencing how data is written, read, and maintained. For distributed databases, the storage engine should also be distributed. Distributed storage engines manage data across multiple nodes in a network, often spanning different geographical locations.


