Characterizing End-to-End Packet Delay and Loss in the Internet
Jean-Chrysostome Bolot, INRIA
One-line summary:
UDP round-trip measurements and some clever analysis based on queueing
theory show that end-to-end delays and packet losses are correlated at
short time scales and essentially random at long time scales.
Overview/Main Points
- UDP probes ping-ponged between INRIA and UMD; each packet carries a
  unique timestamp and sequence number. Only round-trip times (RTTs)
  were measured. (A probe-tool sketch appears after this list.)
- Analysis strategy: much is known about the system under study, so
  don't fit plain autoregressive or moving-average models to the RTTs
  in isolation; instead, use a model of two streams entering a FIFO
  server queue, where one stream with a fixed delay models the fixed
  part of the probe RTT, the other stream is randomly distributed
  Internet traffic, and queueing at the server models the variable
  part of the probe RTT. (A simulation sketch of this model appears
  after this list.)
- For low traffic, W_{n+1} = W_n + e_n, where W_n is the waiting time
  of packet n and e_n is a zero-mean random variable with small
  variance.
  - Points will then lie along the diagonal rtt_{n+1} = rtt_n in the
    phase plot (a scatter plot of successive RTT pairs
    (rtt_n, rtt_{n+1})).
- When d (the interarrival time of probes) is small and a very heavy
  burst of other traffic, B bits, arrives, then W_{n+1} = W_n + B/m,
  where m is the queue service rate, i.e. rtt_{n+1} - rtt_n = B/m. If
  no Internet traffic arrives between probe packets n+1 and n+k, then
  probe packets n+1 through n+k leave the queue P/m seconds apart
  (P is the probe packet size), so that
  rtt_{n+2} - rtt_{n+1} = P/m - d.
  - So points will lie along that "lower" diagonal,
    rtt_{n+1} = rtt_n + P/m - d, in the phase plot.
- When d is very large, queueing delays are small compared to the
  interarrival time, successive probes see essentially independent
  queue states, and the points are scattered all over the phase plane.
- Using the Lindley recurrence W_{n+1} = max(0, W_n + Y_n - X_n)
  (where Y_n is the service time of packet n and X_n is the
  interarrival time between packets n and n+1; see the "graphical
  proof" in the paper):
  - Assume that Internet traffic contributes b_n bits during slot n
    (i.e. the time interval [nd, (n+1)d]).
  - Apply Lindley's recurrence twice and substitute, to get
    W_{n+1} = W_n + (P + b_n)/m - d, or, rewriting,
    b_n = m(W_{n+1} - W_n + d) - P. That is, we can estimate the
    Internet traffic from the measured round-trip times! (See the
    estimator sketch after this list.)
  - Note that the above property doesn't hold if the queue empties
    during the time slot [nd, (n+1)d], since the max(0, .) then clips
    the recurrence. So we must make d small enough, i.e. make m*d
    smaller than the typical amount of work P + b arriving per slot.
- Applying this analysis to the observed W_n for d = 20 ms, the
  distribution of estimated traffic is multimodal: we can see probes
  that got stuck behind a single FTP packet, probes that got stuck
  behind two FTP packets, etc.
- As d gets bigger, correlations between successive probes weaken and
  the graph loses some structure. (For d = 20 ms it's clearly
  multimodal; for d = 100 ms it's noisier.)
- Packet loss analysis: for large values of d, the unconditional loss
  probability (ulp) and the conditional loss probability (clp, the
  probability that probe n+1 is lost given that probe n was lost)
  approach each other. I.e., losses are random at large time scales
  but reasonably well correlated (ulp and clp are quite distinct) at
  shorter scales. (See the loss-statistics sketch after this list.)
- Unexplained result: even for enormous d, ulp is observed to be as
high as 10%.
- The loss gap (mean number of consecutive losses) stays close to 1
  even for large values of d, i.e. most losses are isolated; so, e.g.,
  audio apps should be able to rely on FEC to repair most lost-packet
  errors.
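For concreteness, here is a minimal sketch (in Python) of the kind of
probe sender described above. It assumes a cooperating UDP echo
server; the host name, port, packet size, and pacing are made-up
illustration values, not the paper's actual tool.

    import socket
    import struct
    import time

    # All names and values are illustrative, not from the paper's tool.
    ECHO_HOST, ECHO_PORT = "echo.example.net", 9999  # hypothetical UDP echo server
    D = 0.020       # probe interarrival time d (seconds)
    P = 32          # probe packet size (bytes)
    N_PROBES = 1000

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)

    rtts = {}  # sequence number -> measured round-trip time (seconds)
    for seq in range(N_PROBES):
        # Each probe carries a sequence number and a send timestamp.
        payload = struct.pack("!Id", seq, time.monotonic())
        payload += b"\x00" * (P - len(payload))  # pad probe to P bytes
        sock.sendto(payload, (ECHO_HOST, ECHO_PORT))
        try:
            data, _ = sock.recvfrom(4096)
            echoed_seq, sent_at = struct.unpack("!Id", data[:12])
            rtts[echoed_seq] = time.monotonic() - sent_at
        except socket.timeout:
            pass  # no echo came back: count the probe as lost
        time.sleep(D)  # crude pacing; a real tool sends and receives asynchronously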
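The two-stream FIFO model is also easy to simulate. The sketch below
drives Lindley's recurrence with exponentially distributed cross
traffic per slot (all parameter values invented for illustration);
plotting the successive pairs (rtt_n, rtt_{n+1}) reproduces the
phase-plot structure described above, with points hugging the diagonal
under light load and heavy bursts pushing them off it.

    import random

    # Illustrative parameters, not taken from the paper.
    M = 1e6 / 8    # bottleneck service rate m (bytes/second, i.e. 1 Mb/s)
    P = 32         # probe packet size (bytes)
    D = 0.020      # probe interarrival time d (seconds)
    FIXED = 0.070  # fixed component of the RTT (seconds)

    def simulate(n_probes, mean_cross_bytes):
        """Probe RTTs from Lindley's recurrence with exponential cross traffic."""
        w, rtts = 0.0, []
        for _ in range(n_probes):
            rtts.append(FIXED + w + P / M)                  # RTT = fixed + wait + service
            b = random.expovariate(1.0 / mean_cross_bytes)  # cross-traffic bytes in slot n
            w = max(0.0, w + (P + b) / M - D)               # the queue may empty between probes
        return rtts

    rtts = simulate(5000, mean_cross_bytes=500.0)
    phase_points = list(zip(rtts, rtts[1:]))  # (rtt_n, rtt_{n+1}) pairs for the phase plot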
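Inverting the recurrence gives the traffic estimator. A sketch using
the note's symbols (m is the service rate, P the probe size); the
fixed RTT component cancels in the difference, so
rtt_{n+1} - rtt_n = W_{n+1} - W_n:

    def estimate_cross_traffic(rtts, d, m, p):
        """Estimate cross traffic b_n per slot from successive probe RTTs.

        b_n = m * (rtt_{n+1} - rtt_n + d) - p. Valid only while the bottleneck
        queue stays busy between probes (otherwise the max(0, .) in Lindley's
        recurrence clips the relation), hence the need for a small d.
        """
        return [m * (rtts[n + 1] - rtts[n] + d) - p for n in range(len(rtts) - 1)]

    # For d = 20 ms, a histogram of these estimates is multimodal: one peak per
    # number of FTP packets the probe got stuck behind.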
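Finally, the loss statistics fall out of which sequence numbers came
back. A sketch, where ulp = P(probe lost), clp = P(probe n+1 lost |
probe n lost), and the loss gap is taken as the mean length of a run
of consecutive losses (a reasonable gloss; the paper's exact
definition may differ):

    def loss_stats(received_seqs, n_sent):
        """Unconditional/conditional loss probabilities and mean loss gap."""
        lost = [seq not in received_seqs for seq in range(n_sent)]
        ulp = sum(lost) / n_sent  # P(probe n lost)
        after_loss = [lost[i + 1] for i in range(n_sent - 1) if lost[i]]
        clp = sum(after_loss) / len(after_loss) if after_loss else 0.0
        # Mean loss gap: average length of a run of consecutive losses.
        bursts, run = [], 0
        for is_lost in lost + [False]:  # trailing sentinel flushes the last run
            if is_lost:
                run += 1
            elif run:
                bursts.append(run)
                run = 0
        gap = sum(bursts) / len(bursts) if bursts else 0.0
        return ulp, clp, gap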
Relevance
Another neat trick: indirect observation plus a clever model to deduce
properties of the end-to-end behavior of a large system.
Flaws
- Internet traffic offered during each queue slot is modeled as an
  exponential random variable. Is this reasonable?