Back to index

Distributed Snapshots: Determining Global States of Distributed Systems

K.M. Chandy (UT Austin) and Leslie Lamport (SRI)

One-line summary: Elegant formal characterization of distributed computation, which supports a proof that the proposed snapshot algorithm captures something that can meaningfully be called "global state" of a DS. Also shows how this algorithm can be used to detect stable properties, e.g. deadlock.

Overview/Main Points

Important definition/assumption (my paraphrase): the only ordering that can be imposed on a distributed computation is that resulting from treating send/receive of messages as events tht modify the state of a single process or comm. channel. Any permutation of the events that respects this partial ordering results in the system performing "the same" distributed computation. Read the paper for the formal defs. Global state detection algorithm (any process may initiate at will): Properties of the recorded global state: Stable property detection:

Relevance

A particular formal characterizaton of distributed computation, an algorithm for recording global state, proof that the recorded state is meaningful, and proof that stable property detection can be done this way. Formal characterizations always make me feel better.

Flaws

Ordering criteria used as assumptions (i.e. based only on message sends/receives) may not be appropriate for all DS's.
Back to index