Distributed Systems

Scaling to a Distributed Data system introduces substantial overhead. Don’t over-engineer. Make sure that you are designing the system that you need.

These are different from programs running on a single computer in multiple ways. Namely:

No shared memory
Messages pass via an unreliable network with variable delays
The systems may suffer from partial failures, unreliable clocks, processing pauses
Packets can be lost, reordered or duplicated
Clocks are approximate at best
Nodes can pause or crash all together

The thing that makes distributed systems difficult (and interesting) is the ambiguity. Discussions of these systems are philosophical:

What do we know about our system for sure, and how can we be certain?
How can we be certain of knowledge if our mechanisms for measurement and perceptions and unreliable?

You can distribute data 2 ways across a node:

Solutions to Distributed Issues