End-to-End Routing Behavior in the Internet

Vern Paxson, LBL

One-line summary: Wide-area routing behavior in the Internet has become less predictable with respect to short- and long-scale route changes, routing asymmetry, and routing anomalies (loops, bad routes, etc.).

Overview/Main Points

Two sets of end-to-end traceroute measurements taken between pairs of 37 hosts, analyzed for:
- routing pathologies
- routing changes ("fluttering")
- routing asymmetry
D1 measurements: traceroutes between a pair of sites with a mean inter-measurement interval of 1-2 days. D2 measurements: 60% of measurements with a mean inter-measurement time of 2 hours, 40% with a mean inter-measurement time of 2.75 days.
Arguments supporting the methodology and goals:
- Even though only a few sites were measured, the routes are representative because they include a non-negligible fraction of the AS's (autonomous systems) comprising the Internet.
- Stable inter-AS routing doesn't imply stable e2e routing, since AS's are large entities and can have internal inconsistencies.
Properties of the measurements:
- Additive random sampling
- Poisson process; therefore PASTA (Poisson Arrivals See Time Averages) holds: the percentage of measurements that observe a given state is asymptotically equal to the percentage of time the Internet spends in that state, even if the process is non-Markovian and the Poisson arrivals are non-homogeneous (Wolff 82). This property is used extensively in the analysis.
- Caveat: PASTA requires that the observed process not be able to anticipate observation arrivals. But in the event of (temporary) lost connectivity, the network can in effect anticipate that no measurement will occur during the next traceroute. The effect is a tendency to underestimate the prevalence of connectivity problems.
- Caveat: an end-to-end measurement can agglomerate many intermediate effects into a single observation, with no information about where or why the problem occurred. Sometimes the author called AS administrators to track a problem down.
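A minimal sketch of why the sampling argument works, assuming a toy two-state (up/down) connectivity process with exponentially distributed holding times. The process, the function name simulate_sampling, and all parameter values are illustrative assumptions, not taken from the paper. Measurements arrive with exponentially distributed gaps (additive random sampling), and the fraction of measurements that see "down" should land close to the fraction of time actually spent down.

```python
import random


def simulate_sampling(mean_up, mean_down, mean_gap, horizon, seed=0):
    """Sample a toy up/down process at Poisson instants.

    Returns (fraction of time spent down, fraction of samples that saw down).
    """
    rng = random.Random(seed)
    t = 0.0
    up = True
    down_time = 0.0
    samples = 0
    samples_down = 0
    # Exponential gaps between measurements give Poisson sampling instants.
    next_sample = rng.expovariate(1.0 / mean_gap)
    while t < horizon:
        mean = mean_up if up else mean_down
        end = min(t + rng.expovariate(1.0 / mean), horizon)
        if not up:
            down_time += end - t
        # Every measurement landing in [t, end) observes the current state.
        while next_sample < end:
            samples += 1
            if not up:
                samples_down += 1
            next_sample += rng.expovariate(1.0 / mean_gap)
        t = end
        up = not up
    return down_time / horizon, samples_down / samples


if __name__ == "__main__":
    time_frac, seen_frac = simulate_sampling(10.0, 1.0, 2.0, 200_000.0, seed=1)
    print(f"time down: {time_frac:.4f}  samples down: {seen_frac:.4f}")
```

With a mean up time of 10 and a mean down time of 1, the long-run down fraction is 1/11, and both returned values should sit near it. The sketch does not model the anticipation caveat above: here a measurement always happens, even when the path is down.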
All measurements are reported with 95% confidence intervals, including measurements for which it was necessary to establish that differences between two sets of data are not due to chance.
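These notes do not say which interval construction the paper uses. As a hedged illustration only, the sketch below assumes a Wilson score interval for a single proportion and a pooled two-proportion z-test for comparing two data sets. Both are standard stand-ins rather than the author's stated method, and the counts in the usage lines are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


def proportions_differ(s1, n1, s2, n2, z=1.96):
    """Pooled two-proportion z-test: True if the difference is significant."""
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return False  # both samples are all-success or all-failure
    return abs(s1 / n1 - s2 / n2) / se > z


if __name__ == "__main__":
    # Hypothetical counts: 15 of 1000 vs. 34 of 1000 pathological traces.
    print(wilson_interval(15, 1000))
    print(proportions_differ(15, 1000, 34, 1000))
```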
Routing loops and anomalies. Loops are apt to form when network connectivity changes are not immediately propagated to routers.
- "Persistent loop" == unresolved by the end of traceroute. More prevalent in the Washington, DC area, where many long-haul providers interchange packets.
- "Temporary loop" == resolved during traceroute; usually transient effect of a single change rippling through the network.
- Observation: All routing loops were confined to a single AS, so the Border Gateway Protocol's loop-suppression mechanism is working well in practice.
- Erroneous routing: one instance seen. Security implication: can't assume where your packets are going, but that's not news.
- Unreachable due to low TTL: operational diameter of the Internet is now >30 hops, but variance in hop count is high and doesn't seem to correlate with geographical distance.
- Observation: time-of-day effects, with time of day computed as the mean time of day between the measured sites. Clear correlation between better connectivity and less loaded hours.
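The persistent/temporary distinction above can be sketched as a check over the hop list of a single traceroute. This is a rough heuristic under stated assumptions, not the paper's classification procedure: a hop counts as a revisit if the same router appeared earlier with at least one other responsive router in between, unresponsive hops are represented as None and ignored, and the name classify_loop is hypothetical.

```python
def classify_loop(hops):
    """Return 'none', 'temporary', or 'persistent' for one traceroute.

    hops: router identifiers in TTL order; None marks an unresponsive hop.
    """
    responsive = [h for h in hops if h is not None]
    last_index = {}      # router -> index of its most recent appearance
    revisit = []         # revisit[i] is True if hop i re-enters a loop
    for i, hop in enumerate(responsive):
        prev = last_index.get(hop)
        # Consecutive repeats of one router are not treated as a loop.
        revisit.append(prev is not None and i - prev > 1)
        last_index[hop] = i
    if not any(revisit):
        return "none"
    # Still cycling when the trace ended: unresolved, hence persistent.
    return "persistent" if revisit[-1] else "temporary"


if __name__ == "__main__":
    print(classify_loop(["a", "b", "c", "b", "c", "b", "c"]))
    print(classify_loop(["a", "b", "c", "b", "c", "d", "e"]))
```

A real analysis would also need to handle routers that answer from different interface addresses, which this sketch does not attempt.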
Routing changes ("fluttering").
- Most egregious: wustl load balancing shunts alternate packets to either coast.
- Inter-AS fluttering creates stability and symmetry problems. Intra-AS load balancing does not.
- Observation: "Deflection routing" schemes, which shunt load rather than drop packets, have virtually the same properties as inter-AS fluttering. Don't use them.
Argument: pathology measurements are representative; the top 3 AS's accounted for nearly half of them.
Observation: None of the pathologies improved between D1 and D2 (several months apart), and some got worse; so wide-area Internet predictability is getting worse.
Route prevalence and persistence. Three granularities: hostname, city, sequence-of-AS's.
- Prevalence: how likely is this same route to be observed in the future? vs. Persistence: how long will this route stick around before it's replaced by a different route?
- Analysis focuses on dominant route, the one observed most often between site pair.
- Assumption: routing changes are semi-Markov, so that the steady-state probability of observing a state is asymptotically equal to the fraction of time spent in that state. Because of PASTA, measurements provide this time average.
- Observation: Internet paths dominated by a single route, but a fairly wide spread is observed. ("heavy headed and heavy tailed" :-))
- To measure route persistence, must rule out short-time-scale route fluttering. 2-minute route changes are negligible; 10-minute route changes are rare but non-negligible. On larger scales, route changes seem to happen at 12-48 hour granularity.
- Observation: Routing persistence at least bimodal; routes that do not change on short, medium or long scales as defined above, tend to be stable (90% chance) for at least a week.
- Argument: routing changes on short time scales (< days) happen inside the network, not at stub networks. Therefore our measurements are likely to be similar to those observed by other (unmeasured) sites.
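Prevalence of the dominant route can be sketched as a simple count over the routes observed for one site pair, after collapsing each route to the chosen granularity. The helper names, the router-to-label mapping, and the sample routes are all hypothetical. Under the PASTA argument above, the observed fraction serves as an estimate of the fraction of time the route is in use.

```python
from collections import Counter


def at_granularity(route, label_of):
    """Map each router to a coarser label (e.g. city or AS) and collapse
    consecutive duplicates, so one AS spanning several hops counts once."""
    collapsed = []
    for router in route:
        label = label_of[router]
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return tuple(collapsed)


def dominant_route(observations):
    """Return (most frequently observed route, its observed prevalence)."""
    counts = Counter(tuple(route) for route in observations)
    route, n = counts.most_common(1)[0]
    return route, n / len(observations)


if __name__ == "__main__":
    # Hypothetical router-to-AS mapping and observed routes.
    as_of = {"r1": "AS1", "r2": "AS1", "r3": "AS2", "r4": "AS3"}
    observed = [["r1", "r2", "r3"], ["r1", "r3"], ["r1", "r4"]]
    print(dominant_route([at_granularity(r, as_of) for r in observed]))
```

In this example the first two hostname-level routes differ but collapse to the same AS-level route, which is why prevalence depends on the granularity chosen.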
Route symmetry. Important since some protocols estimate one-way latency as 1/2 of the RTT (e.g. NTP), or infer network conditions from packet interarrivals.
- In D2, measurements were paired, so asymmetries could be unambiguously determined. 49% of measurements observed asymmetry at the level of cities.
- D1 analyzed to conservatively (i.e. under-)estimate asymmetry. Observation: things have gotten worse (30% to 49%).
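A small sketch of both points, under assumptions: city-level asymmetry is taken here to mean that the two directions of a paired measurement visit at least one different city, and the RTT/2 error is computed for hypothetical one-way delays. The function names, city names, and delay values are illustrative only.

```python
def cities_differing(forward_cities, reverse_cities):
    """Cities visited in exactly one direction of a paired measurement.

    An empty result means the pair looks symmetric at city granularity.
    """
    return set(forward_cities) ^ set(reverse_cities)


def half_rtt_error(forward_ms, reverse_ms):
    """Error in the forward one-way delay when it is estimated as RTT/2."""
    rtt = forward_ms + reverse_ms
    return abs(forward_ms - rtt / 2.0)


if __name__ == "__main__":
    fwd = ["Berkeley", "Denver", "Boston"]       # hypothetical forward path
    rev = ["Boston", "Chicago", "Berkeley"]      # hypothetical reverse path
    print(cities_differing(fwd, rev))
    print(half_rtt_error(30.0, 70.0))
```

The error equals half the difference between the two one-way delays, so it vanishes only when the delays in the two directions match.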
Overall observations.
- Chance of seeing some routing pathology more than doubled, from 1.5% to 3.4%, from the end of 1994 to the end of 1995.
- Most paths dominated by single route, but time periods over which routes persist show wide variation.
- Getting harder to provide consistent topological view of the wide-area Internet. There is no "typical" Internet path.

Relevance

A tour de force of data reduction and analysis; a surprising amount was deduced using only sparse end-to-end measurements.
The routing asymmetries and route change behavior can affect the performance of various network time and anti-congestion algorithms.

Flaws

Needed to make a few assumptions, e.g. that route changes are semi-Markovian and that the PASTA principle (Poisson Arrivals See Time Averages) applies.
