Fault Tolerance

To tolerate faults, the first step is to detect them. Most systems don’t have an accurate mechanism of detecting whether a node has failed so most distributed algorithms rely on timeouts to determine if a node is still available. Also, variable network delays sometimes causes a node to be falsely suspected of crashing.

Once a fault is detected, making a system tolerate it is not easy due to no global variables, shared memory, or common knowledge.

🤖 Dan Huynh

Recent Notes

Dan Huynh

Linearity

CAP Theorem

Causality

Quorum Reads and Writes

Explorer

Fault Tolerance

Graph View

Recent Notes

Dan Huynh

Linearity

CAP Theorem

Backlinks