On the Self-Similar Nature of Ethernet Traffic
----------------------------------------------
Leland et al., SIGCOMM '93

Self-similar phenomena (first defined by Mandelbrot in '65) display statistically identical structure across all time scales. In the case of Ethernet, this manifests as the lack of a "typical" burst length.

The authors wrote a custom monitoring system that collects every Ethernet packet it sees, along with accurate timestamps. This means no a priori decisions about the experiment are needed, other than how much data to save per packet. They collected 4 data sets totaling on the order of 10^8 packets. Set #1 was collected on an isolated LAN with diskless workstations; set #2 came 2 months later, after a large upgrade. In both these sets, 95% of the traffic was intra-LAN. 3 months later they monitored a LAN whose traffic also included external (inter-LAN) connections. After 2 years, they got set #4 by monitoring a backbone segment (thus, mostly inter-router traffic).

They found that real Ethernet traffic at Bellcore is very different from what is predicted by formal models for packet traffic, packet train models, and fluid flow models. Obviously, it's also very different from the isochronous traffic typical of the telephone system. Ethernet traffic was highly self-similar, whereas all the models predict white noise-like behavior for aggregated traffic.

The Hurst parameter (H) is a measure of self-similarity. H = 0.5 corresponds to no long-range dependence (aggregated traffic that looks like white noise, as the conventional models assume), but for Ethertraffic H is 0.75-0.85, and H goes up as utilization of the network increases (i.e., traffic becomes increasingly self-similar). An important consequence is that, as the number of Ethernet users increases, the traffic does not become smoother (as predicted by formal models), but rather becomes even burstier.

Mandelbrot showed how self-similar processes can be viewed as an aggregation of many simple renewal reward processes. The reward for each such process turns out to be the amount of traffic generated by a single user during successive time intervals, described by a heavy-tailed distribution (i.e., no characteristic length of busy period or packet train).

Common measures of burstiness include:
- index of dispersion (variance of the number of arrivals / expected number of arrivals over a given time interval); this index increases monotonically for Ethertraffic, but stays constant or converges rapidly in the formal models.
- peak-to-mean ratio: an inappropriate measure for self-similar traffic, because it could literally have any value, depending on the time interval you're looking at.
- coefficient of variation (standard deviation of interarrival times / expected value of interarrival times): again, any value is possible, depending on the number of samples.

Switched Multi-megabit Data Service (SMDS) buffers and caps the rate at which traffic is delivered into a LAN, to reduce burstiness. As it turns out, increasing the buffer size reduces overall packet loss only very slowly (in contrast to the exponential decrease predicted by Poisson models) and invariably increases packet delay (in spite of the limit predicted by formal models). This is all explained by self-similarity, which also explains the observation that buffering doesn't help manage congestion (losses come in concentrated bursts that greatly exceed the background loss rate).

The only tractable models that are reasonably accurate: self-similar stochastic models and deterministic chaotic maps.
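As a quick illustration of how the H mentioned above is typically measured (a minimal sketch, not the authors' analysis code; it assumes a series of per-interval packet or byte counts, and the name `hurst_variance_time` is my own):

```python
import numpy as np

def hurst_variance_time(counts, n_levels=20):
    """Estimate the Hurst parameter H with the aggregated-variance
    (variance-time plot) method: for a self-similar count series,
    Var(X^(m)) ~ m^(2H - 2), so the slope of log Var(X^(m)) vs. log m
    gives H = 1 + slope / 2."""
    counts = np.asarray(counts, dtype=float)
    n = len(counts)
    # Logarithmically spaced aggregation levels m (block sizes).
    block_sizes = np.unique(np.logspace(0, np.log10(n // 10), n_levels).astype(int))

    log_m, log_var = [], []
    for m in block_sizes:
        k = n // m                                  # number of non-overlapping blocks
        if k < 2:
            continue
        blocks = counts[: k * m].reshape(k, m).mean(axis=1)  # aggregated series X^(m)
        log_m.append(np.log10(m))
        log_var.append(np.log10(blocks.var()))

    slope, _ = np.polyfit(log_m, log_var, 1)        # slope of the variance-time plot
    return 1.0 + slope / 2.0

# Poisson-like (short-range dependent) traffic should come out near H = 0.5;
# the Bellcore traces gave H in the 0.75-0.85 range.
rng = np.random.default_rng(0)
print(hurst_variance_time(rng.poisson(lam=10, size=100_000)))  # roughly 0.5
```

On the corresponding log-log plot, short-range-dependent traffic gives a slope near -1, while long-range-dependent traffic gives a distinctly shallower slope.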
Discussion issues:
1. CSMA/CD <--> nature (remember the classroom example); log-log plots... 1/f noise: cities by rank, fjords by size, Etherbursts, fractals, English language by rank, WWW traffic, company revenue as a function of rank.
2. Why is this relevant/irrelevant to Van Jacobson's paper?
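To connect back to the renewal-reward explanation in the summary, here is a small simulation sketch (my own, not from the paper) of superposing ON/OFF sources whose period lengths are heavy-tailed (Pareto with shape alpha between 1 and 2); the name `onoff_aggregate` and the parameter values are illustrative assumptions:

```python
import numpy as np

def onoff_aggregate(n_sources=100, n_slots=50_000, alpha=1.5, seed=0):
    """Superpose ON/OFF sources whose ON and OFF period lengths are
    Pareto-distributed (tail index alpha in (1, 2)).  In the limit the
    aggregate is asymptotically self-similar with H = (3 - alpha) / 2,
    rather than smoothing out toward white noise."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n_slots)
    for _ in range(n_sources):
        t, on = 0, bool(rng.integers(2))           # random initial phase
        while t < n_slots:
            # Pareto period length via inverse transform (minimum 1 slot).
            length = int(np.ceil((1.0 - rng.random()) ** (-1.0 / alpha)))
            if on:
                total[t : t + length] += 1         # 1 packet per slot while ON
            t += length
            on = not on
    return total

counts = onoff_aggregate()
# Feeding `counts` into the variance-time estimator sketched earlier should
# give H near (3 - alpha) / 2 = 0.75, well above the 0.5 of Poisson traffic:
# the burstiness survives aggregation over sources and over time scales.
```

The point of the sketch is the mechanism from the summary: heavy-tailed per-source busy periods are what keep the aggregate bursty no matter how many users are multiplexed together.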