Time, clocks and the ordering of events in a distributed system (not on the reading list) Lamport 1978 Summary by Ed Swierk This seminal paper introduces several important concepts in distributed systems: - partial ordering of events based on the notion of potential causality - an algorithm for extending the partial ordering into a consistent total ordering (called causal consistency in subsequent literature) - the "anomalous behavior" caused by communication external to the system, and the solution based on physical (real-time) clocks - the characteristics of a sufficiently consistent set of real-time clocks The "happened before" relation (->) between two events is based on the notion that it is possible for one event to causally affect the other: (1) If A and B are events in the same process, and A comes before B, then A -> B. (2) If A is the sending of a message by one process and B is the receipt of the same message by another process, then A -> B. (In a shared memory system, A would be a write operation and B would be a read operation.) (3) If A -> B and B -> C then A -> C. Two distinct events A and B are said to be concurrent if A !-> B and B !-> A. The -> relation defines a partial order on the events in a distributed system. A logical clock is just a way of assigning a number C(A) to an event A. For any events A and B, A -> B implies that C(A) < C(B). (The converse condition does not hold, since that would imply that any two concurrent events must occur at the same time.) This "clock condition" is satisfied if the following hold: (C1) If A and B are events in process P[i], and A comes before B, then C[i](A) < C[i](B). (C2) If A is the sending of a message by process P[i] and B is the receipt of that message by process P[j], then C[i](A) < C[j](B). To guarantee the clock condition, the following implementation rules can be followed: (IR1) Each process P[i] increments C[i] between any two successive events. (IR2) - If event A is the sending of a message by a process P[i], then the message contains a timestamp C[i](A). - Upon receiving the message, process P[j] sets C[j] greater than or equal to its present value and greater than the timestamp. A system obeying these implementation rules imposes a total ordering on the events, which is simply the times at which the events occur. Ties can be broken arbitrarily. The paper identifies one kind of "anomalous behavior," caused by messages sent external to the system (for example, by Alice reading a value and telling it to Bob, who then writes a value). The -> relation does not hold on the read event A and the write event B, because the system doesn't know that one caused the other. But the application's semantics may still demand that A precede B. To satisfy the clock condition, we can turn the logical clocks into physical clocks by synchronizing them to real time. Of course clocks can't be kept perfectly synchronized, but a clock is close enough if E / K <= M, where: E = the maximum offset between any two clocks in the system K = the maximum drift of any clock in the system (where drift is 1 - the ratio of clock time to real time) M = the shortest transmission time for interprocess messages Also, we assume that a clock's value never decreases. The paper gives an algorithm for implementing the clock synchronization protocol, which satisfies the clock condition and thus guarantees that the anomalous behavior is impossible.