
# How Useful is Old Information?

Michael Mitzenmacher, UCB/DEC SRC

One-line summary: For systems that rely on imperfect/stale/periodic load information to make scheduling decisions, "deterministic" heuristics that choose the lightest-loaded server do poorly (even with continuously updated but stale load info), but a small amount of randomness (e.g. choosing the lighter-loaded of two servers picked uniformly at random) suffices to break the symmetry and give performance comparable to what is possible with perfect information.

## Overview/Main Points

• Assumptions:
  • Task service time is exponential with mean 1.
  • n servers are available; the "infinite system" (the limit as n --> infinity) is studied.
  • There is a bulletin board of (possibly stale) server load info, which is the only source of load info.
  • The goal is to minimize the time a task spends in the system.
• Three strategies considered (the names are my own):
  • "Random": choose a server uniformly at random.
  • "d-random": choose d servers uniformly at random and go to the one with the smallest load.
  • "Lightest": go to the server with the lightest load.
• Development of the model with periodic updates at times kT ("phases") for integers k:
  • Differential equations yield a fixed trajectory of the system, given initial conditions; in fact, they describe the behavior of the system in the limit of an infinite number of servers. "Can be proven with standard (albeit complex) methods."
  • Fixed cycle: because of discontinuities in state at each time kT (new server load info arrives), the system can't converge to a fixed point, but it can converge to a fixed cycle, in which the fraction of servers with load i at the beginning of a phase equals the fraction at the end of the phase.
  • In the author's numerical solutions of the differential equations, the system converged to a fixed cycle in every case, but there is no proof of why.
  • Experiments also suggest that the infinite-system model is very accurate even for simulations of small (n < 10) systems.
• Simulations:
  • Surprise! The "Lightest" strategy is bad unless T (the update interval) is very small, due to "herd behavior": all tasks arriving during an interval go to the same small subset of servers. This is what Yatin discovered experimentally.
  • Continuous-update system: load info is continuously updated, but is X seconds stale for some constant X. Behavior is similar to the periodic-update system.
  • Surprise! If X is instead an exponentially distributed (or even uniformly distributed) random variable, the randomness breaks the symmetry and results in much nicer behavior for all three scheduling strategies. (In practice, d = 2 is sufficient for the d-random strategy.)
  • In particular, a large uniform interval, or an exponential distribution, works much better than a smaller uniform interval, suggesting that it is useful to have some tasks get extremely accurate info.
  • A general lesson: some queueing-theory arguments prove a result for an exponentially distributed random variable and assume the result also holds for a constant; in systems like this one involving imperfect information, that is not so.
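The herd-behavior result can be illustrated with a small simulation. This is a minimal sketch, not the paper's differential-equation model: it time-steps n exponential-service queues fed by Poisson arrivals, with a bulletin board that is refreshed only every T time units, and the parameters (n = 50, arrival rate 0.9 per server, T = 5) are my own illustrative choices.

```python
import random
from statistics import mean

def simulate(strategy, n=50, lam=0.9, T=5.0, dt=0.05, horizon=500.0, seed=1):
    """Time-stepped simulation: n servers with exponential service (mean 1),
    Poisson arrivals at rate lam per server. Arriving tasks pick a queue via
    `strategy`, which sees only the bulletin board: a snapshot of the true
    queue lengths refreshed every T time units. Returns the time-averaged
    mean queue length over the second half of the run (warm-up discarded)."""
    rng = random.Random(seed)
    queues = [0] * n          # true queue lengths (including task in service)
    board = list(queues)      # possibly stale snapshot seen by arrivals
    next_update = 0.0
    samples = []
    t = 0.0
    while t < horizon:
        if t >= next_update:              # periodic bulletin-board refresh
            board = list(queues)
            next_update += T
        for i in range(n):                # departures: prob ~ dt per busy server
            if queues[i] > 0 and rng.random() < dt:
                queues[i] -= 1
        for i in range(n):                # arrivals: prob ~ lam*dt per server slot
            if rng.random() < lam * dt:
                queues[strategy(board, rng)] += 1
        samples.append(mean(queues))
        t += dt
    return mean(samples[len(samples) // 2:])

def random_choice(board, rng):            # "Random"
    return rng.randrange(len(board))

def two_random(board, rng):               # "d-random" with d = 2
    i, j = rng.randrange(len(board)), rng.randrange(len(board))
    return i if board[i] <= board[j] else j

def lightest(board, rng):                 # "Lightest" (random tie-break)
    lo = min(board)
    return rng.choice([i for i, q in enumerate(board) if q == lo])
```

With T = 5 the board is badly stale, so under "Lightest" every arrival in a phase herds onto the servers that looked emptiest at the last refresh, and the time-averaged queue length comes out markedly worse than under "d-random" with d = 2, matching the paper's observation.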

## Relevance

Direct relevance to TranSend and a variety of similar systems that rely on soft/stale (BASE) state for load balancing.

## Flaws

• Arrivals aren't Poisson...what's the impact of this?
