Packet Loss Correlation in the MBone Multicast Network

Maya Yajnik, Jim Kurose, and Don Towsley, UMass Amherst

One-line summary: Losses on backbone or between NI and receiving process are rare; the average loss event is a single packet, but the average lost byte lives in a burst packet loss; most packets are dropped close to the source.

Overview/Main Points

MBone is a two-level "mesh of stars". Main backbone routers operated by long-haul service providers are connected by redundant unicast tunnel routes; each backbone router is the hub of a star of unicast tunnels to local mcast-aware routers.
Data collected by counting dropped packets for specific MBone sessions at multiple receiver sites, then comparing the loss traces.
Expected packet loss along a link:
- Suppose A is a backbone router downstream of B, and a packet is seen at all of B's regional routers, but not at any of A's. Then the packet either got lost between B and A, or else every copy independently got lost between A and its regional routers.
- If Na and Nb are the number of packets lost by all receivers downstream of A and B respectively, then Na-Nb is the number of packets seen at B's regional routers but not at A's, so the expected probability of loss along link AB is (Na-Nb)/(N-Nb) where N is total number of packets sent by source.
- Applying this to the top-level MBone hierarchy graph based on the data collected, the inter-backbone-router links have expected loss probabilities <<1% (except for a bottleneck link leaving UCB, which has ELP about 5%, and the transatlantic link from U. Maryland to France, about 6.5%).
End-host loss (i.e. loss in the network stack) was measured by correlating losses from hosts connected to same local mcast router, and found to be negligible.
Spatial correlation among receivers that simultaneously lose a given packet:
- 47% of all packets sent were dropped by at least one receiver!
- Temporal independence of loss is assumed.
- Metric: if m is the number of receivers simultaneously experiencing loss, and M is the number of receivers that simultaneously lose a given packet, compute probability P(M=m).
- Expected probability was computed for star (packet source is hub of star), modified star (packet source is spoke of star), and "full" (spatially correlated, as in backbone example given above), in each case with 11 receivers (the number of receivers instrumented for data collection).
- The distribution of M for "full" and modified-star topologies were closest to the measured data. These both have the property that if a loss occurs close to the source, all receivers will experience that loss, suggesting that this is a dominant MBone lossage mode.
Spatial loss correlation between receiver pairs:
- Average correlation coefficient varies between .271 and .666.
- But, when the loss that is common to all receivers is removed from dataset, correlation practically vanishes!
- Again, conclusion is that spatial pairwise correlation is rare, except for the effects of loss close the source.
Temporal loss correlation at a single receiver:
- Solitary (single-packet) losses predominate, but
- packets lost due to long burst losses constitute a significant fraction of total bytes lost.
- Periodic losses seen every 30 seconds; a paper by Floyd and Jacobson is cited (The Synchronization of Periodic Routing Messages) to explain it.
- Compare: "The average file is small, but the average byte lives in a big file."

Relevance

Many proposed mcast error control protocols recover by interacting with nearby receivers rather than requesting source retransmission, so it's worth profiling whether loss patterns make it reasonable to expect that a nearby receiver will have gotten the packet if you didn't get it. (Paper concludes that it is reasonable to expect this.)

Back to index