There are three main approaches to replication:
- Single-leader replication
- Clients send all their writes to a single leader which sends a steam of data to other replicas. Any replica can be read from
- Multi-leader replication
- Clients send writes to one of several leaders (usually the leader of the client’s closest data center). Then the leaders send streams of data change events to each other and their followers.
- Leaderless replication
- Clients send writes to several nodes and read from several nodes in parallel to detect stale data and correct noes.
Single leader replication is the most common, but the single leader can be a bottleneck and can make systems less robust because of the need for failover. Replication can be async or sync where async can cause things like replication lag. We handle this using:
- Read after write consistency
- Monotonic reads
- Consistent prefix reads
One issue that we run into with replication are concurrency issues where two clients write at effectively the same time to two different replicas. There are a couple ways we resolve this:
- The easiest is the last write wins pattern but this effectively deletes data we may want to keep The other is this algorithm:
- The sever maintains a version number for every key, increments the version number ever time a key is written and stores a new version number along with the written value
- When a client reads a key, the server returns all values that have not been overwritten as well as the latest version number. A client must read a key before writing.
- When a client writes a key, it must include the version number from the prior read, and it must merge together all values that it received in the prior read.
- When the server receives a write with a particular version number, it can overwrite all values with that version number or below (since it knows that they have been merged into the new value), but it must keep all values with a higher version number (since these are concurrent values with incoming writes)