Distributed Snapshots: Determining Global States of Distributed Systems

K.M. Chandy (UT Austin) and Leslie Lamport (SRI)

One-line summary: Elegant formal characterization of distributed computation, which supports a proof that the proposed snapshot algorithm captures something that can meaningfully be called "global state" of a DS. Also shows how this algorithm can be used to detect stable properties, e.g. deadlock.

Overview/Main Points

Important definition/assumption (my paraphrase): the only ordering that can be imposed on a distributed computation is that resulting from treating send/receive of messages as events tht modify the state of a single process or comm. channel. Any permutation of the events that respects this partial ordering results in the system performing "the same" distributed computation. Read the paper for the formal defs. Global state detection algorithm (any process may initiate at will):

p sends marker m along each channel c that it communicates on (or multicast, etc.)
When I receive m on channel c:
- If I have not recorded my state, then record state, mark the state of c as the empty sequence, and forward marker m.
- Otherwise, record state of c as sequence of messages received on c since I recorded my state.
Simple to prove that algorithm terminates in finite time, if communication graph is strongly connected and channels have finite latency.

Properties of the recorded global state:

It does not necessarily correspond to any "real" global state of the system (i.e. one that ever happened at a given global time)
However, it does correspond to a possible global state that is consistent with the ordering imposed by message sends and receives.
Formally: If the state recorded is S*, the sequence of distributed computations done by the system is seq, and the "true" initial and final states of the system are Si and Sf, then:
- S* is reachable from Si, and Sf is reachable from S*;
- There exists a computation seq* which is a permutation of seq,
- Either S*=Si or Si<S* in seq* (< means "occurs earlier than"),
- Either Sf=S* or S*<Sf in seq*.
Proof proceeds by distinguishing prerecording from postrecording events in each process, and showing that seq* can be constructed from seq without changing the "ordering-ness" of subsequences of seq.

Stable property detection:

Let y(S) be a function that returns true iff global state S exhibits some stable property.
Note that just because y(S) is false at the beginning of the state-detection algorithm, it may not be false at the termination of the algorithm.
Algorithm: Record some global state S*; then y(S*) will return true iff the stable property holds, since
- S* is reachable from Si (shown in proof),
- Sf is reachable from S* (shown in proof),
- y(S) implies y(S') for all S' reachable from S (definition of a stable property of a DS).

Relevance

A particular formal characterizaton of distributed computation, an algorithm for recording global state, proof that the recorded state is meaningful, and proof that stable property detection can be done this way. Formal characterizations always make me feel better.

Flaws

Ordering criteria used as assumptions (i.e. based only on message sends/receives) may not be appropriate for all DS's.

Back to index