Scaling to a distributed data system introduces substantial overhead. Don’t over-engineer: make sure you are designing the system you actually need.
Distributed systems differ from programs running on a single computer in several ways. Namely:
- No shared memory
- Messages pass via an unreliable network with variable delays
- The system may suffer from partial failures, unreliable clocks, and processing pauses
- Packets can be lost, reordered or duplicated
- Clocks are approximate at best
- Nodes can pause or crash altogether
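To make the network bullets concrete, here is a minimal simulation sketch. Nothing in it comes from a real networking library: `unreliable_send`, `reliable_transfer`, and the loss and duplication probabilities are hypothetical names and values chosen for illustration. It shows one common way to cope with lost, duplicated, and reordered packets: tag each message with a sequence number, resend until it arrives, and deduplicate on the receiving side. Acknowledgements are assumed to be reliable here, which a real network would not guarantee.

```python
import random


def unreliable_send(messages, rng, loss=0.3, dup=0.3):
    """Simulate a lossy network: drop, duplicate, and reorder messages."""
    delivered = []
    for msg in messages:
        if rng.random() < loss:
            continue                 # packet lost
        delivered.append(msg)
        if rng.random() < dup:
            delivered.append(msg)    # packet duplicated
    rng.shuffle(delivered)           # variable delays reorder packets
    return delivered


def reliable_transfer(payloads, rng, max_rounds=1000):
    """Resend unacknowledged messages until the receiver holds all of them."""
    received = {}                        # seq -> payload; keys deduplicate
    pending = dict(enumerate(payloads))  # seq -> payload not yet acknowledged
    rounds = 0
    while pending:
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("gave up: too many retransmission rounds")
        for seq, payload in unreliable_send(list(pending.items()), rng):
            received[seq] = payload      # a duplicate overwrites itself
        # Assumption for brevity: acknowledgements are never lost.
        for seq in received:
            pending.pop(seq, None)
    # Sorting by sequence number undoes any reordering.
    return [received[seq] for seq in sorted(received)], rounds


if __name__ == "__main__":
    result, rounds = reliable_transfer(["a", "b", "c", "d"], random.Random(0))
    print(result, "delivered after", rounds, "round(s)")
```

The sequence numbers do two jobs: they make duplicates harmless and they let the receiver restore the original order. The retry loop is what compensates for loss.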
The thing that makes distributed systems difficult (and interesting) is the ambiguity. Discussions of these systems are philosophical:
- What do we know about our system for sure, and how can we be certain?
- How can we be certain of our knowledge if our mechanisms for measurement and perception are unreliable?
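A small sketch of where this ambiguity comes from, assuming a single remote counter and a caller that can only observe a reply or a timeout. The `Fault` enum and the `remote_increment` and `observe` helpers are hypothetical names invented for this illustration. The point is that different failures look identical from the caller's side, even though the remote state differs.

```python
from enum import Enum


class Fault(Enum):
    NONE = "no fault"
    REQUEST_LOST = "request lost in the network"
    NODE_CRASHED = "node crashed before processing"
    REPLY_LOST = "reply lost after processing"


def remote_increment(state, fault):
    """Return the reply the caller sees, or None if its timeout fires."""
    if fault in (Fault.REQUEST_LOST, Fault.NODE_CRASHED):
        return None             # the increment was never applied
    state["counter"] += 1       # the increment was applied
    if fault is Fault.REPLY_LOST:
        return None             # applied, but the caller never hears about it
    return state["counter"]


def observe(fault):
    """Return (what the caller sees, what actually happened remotely)."""
    state = {"counter": 0}
    reply = remote_increment(state, fault)
    return reply, state["counter"]


if __name__ == "__main__":
    for fault in Fault:
        reply, actual = observe(fault)
        seen = "timeout" if reply is None else f"reply={reply}"
        print(f"{fault.value:32} caller sees {seen:9} actual counter={actual}")
```

With a lost request the counter stays at 0, and with a lost reply it becomes 1, yet the caller sees only a timeout in both cases. Blindly retrying after a timeout is therefore safe only if the operation is idempotent or the receiver deduplicates requests.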
You can distribute data across nodes in two ways: