A Probabilistic Approach to Distributed Clock Synchronization

Flavia Cristian

One-line summary: Provable, low bounds on clock synchronization precisions are easily obtained with a relatively simply protocol and trivial computation.

Overview/Main Points

Fundamentals: The fundamental protocol is for a process P to send a query message to process Q; process Q replies to P with a message containing Q's clock time at the instant Q received P's message. We assume hardware clocks have drift rate , implying the measurement of a time interval t'-t has an error of at most (t-t'). Assume further that the minimum message transmission delay between P and Q is min.
Clock synch and max error: It is proven in the paper that if P measures the round-trip time of the request/reply pair as 2D, the value displayed by Q's clock when it sends its reply is T, and the value displayed by Q's clock when P receives this reply is , then the best guess by P for is:
with maximum error
.
Obtaining a desired precision: The precision of the clock estimate is limited by the measured round-trip time of the reply/request pair. By timing out any requests that don't receive replies within 2U seconds, one can bound the imprecision of a given clock (at the risk of requiring one to resend a clock resynchronization request) to be e by by choosing a U of: . Because the minimum U one can pick is (because of the minimum delivery latency plus clock drift), the best precision achievable by this method is .
Maintaining a desired precision: Because the local clock drifts, it must be resynchonized periodically to maintain a desired precision. Because delivery latency is potentially unbounded, one must set a reply/request timeout, which implies a resynchronization request could fail. One can calculate the probability of N successive request failures; given a current precision, and a specifiable probability of success and a specifiable desired precision, one can calculate the delay between the periodic resynchronization attempts.
Detecting failures: Hard bounds on possible imprecisions are provable with this scheme. If a calculated clock time falls outside of those bounds, a clock failure (at the master or at the slave) has occured. Because a master has a reliable clock source, the master can detect if its own clock as failed, and since masters update themselves more frequently than slaves, a slave can deduce its own clock has failed if such an erroneous condition occurs two times in a row without the master declaring itself invalid.

Relevance

This paper rigorously shows the design and implementation of a probabilistic clock synchronization protocol. Such a protocol is extremely useful when building distributed systems, although the list of assumptions given in this paper should be read very carefully. Most of them are reasonable in a friendly environment...

Flaws

The only flaw of this paper is that the extremely rigorous presentation belies the fact that if any of the assumptions are violated, then all certainty flies out the window.

Back to index