How Useful is Old Information?
Michael Mitzenmacher, UCB/DEC SRC
One-line summary:
For systems that rely on imperfect/stale/periodic load information to
make scheduling decisions, "deterministic" heuristics that choose the
lightest-loaded server do poorly (even with "continuously updated but
stale" load info), but a small amount of randomness (e.g. choosing the
lighter-loaded of 2 servers picked uniformly at random) suffices to
break the symmetry and give performance comparable to that possible
with perfect information.
Overview/Main Points
- Assumptions:
- task arrivals are Poisson
- task service time is exponential with mean 1
- n servers are available; the infinite system (limit as
n --> infinity) is studied
- there is a bulletin board of (possibly stale) server load
info, which is the only source of load info.
- Goal is to minimize the time a task spends in the system.
- Three strategies considered (the names are my own):
- "Random": choose a server uniformly at random
- "d-random": choose d servers uniformly at random and go
to the one with the smallest load
- "Lightest": go to the server with the lightest load
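The three strategies can be sketched as selection functions over the
bulletin-board loads. A minimal sketch (names and the `loads` list of
per-server queue lengths are mine, matching the labels above):

```python
import random

def choose_random(loads):
    # "Random": pick a server index uniformly at random.
    return random.randrange(len(loads))

def choose_d_random(loads, d=2):
    # "d-random": sample d distinct servers uniformly at random,
    # then go to the one whose (possibly stale) load is smallest.
    candidates = random.sample(range(len(loads)), d)
    return min(candidates, key=lambda i: loads[i])

def choose_lightest(loads):
    # "Lightest": go to the server whose reported load is smallest.
    return min(range(len(loads)), key=lambda i: loads[i])
```

Note that `loads` here is the bulletin-board copy, not the true server
state; the paper's question is how each rule behaves when that copy is
stale.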
- Development of the model with periodic updates at times kT
("phases") for integers k
- Differential equations yield a fixed trajectory of the
system, given initial conditions; in fact, they describe
the behavior of the system in the limit of an infinite
number of servers. "Can be proven with standard (albeit
complex) methods."
- Fixed cycle: because of discontinuities in state at each
kT (new server load info arrives), the system can't
converge to a fixed point, but it can converge to a
fixed cycle, in which the fraction of servers with
load i at the beginning of a phase is the same as
the fraction at the end of the phase.
- In the author's simulations of the differential equations,
the system converged to a fixed cycle in all cases, but
there is no proof of why.
- Experiments also suggest that the "infinite system" model is
very accurate even for simulations of small (n < 10) systems.
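For flavor, here is what such differential equations look like in the
simplest, zero-staleness case: the well-known "supermarket model" for
d-random with fresh load info. With s_i the fraction of servers holding
at least i tasks, ds_i/dt = lambda*(s_{i-1}^d - s_i^d) - (s_i - s_{i+1}).
The sketch below (mine, not the paper's phase equations) Euler-integrates
these to their fixed point:

```python
# Euler integration of the mean-field equations for "d-random"
# with fresh info (zero staleness): s[i] is the fraction of
# servers holding at least i tasks; arrival rate LAM per server,
# service exponential with mean 1.
LAM, D, K, DT = 0.9, 2, 30, 0.01

s = [1.0] + [0.0] * K          # s[0] = 1 by definition; empty start
for _ in range(int(200 / DT)): # integrate to (approximate) steady state
    prev = s[:]
    for i in range(1, K):
        # ds_i/dt = LAM*(s_{i-1}^D - s_i^D) - (s_i - s_{i+1})
        s[i] += DT * (LAM * (prev[i-1]**D - prev[i]**D)
                      - (prev[i] - prev[i+1]))

# Known fixed point for this case: s_i = LAM ** ((D**i - 1) / (D - 1)),
# i.e. doubly-exponential decay in i, versus LAM**i for plain "Random".
```

The doubly-exponential tail is the classic power-of-d-choices payoff;
the paper's contribution is what happens to these dynamics when the
load info driving the choice is stale.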
- Simulations
- Surprise! The "Lightest" strategy is bad unless T (the
update interval) is very small, due to "herd behavior"
(all tasks arriving during an interval go to the same
small subset of servers). This is what Yatin discovered
experimentally.
- Continuous-update system: load info is continuously
updated, but is X seconds stale for some constant X.
Behavior is similar to periodic-update system.
- Surprise! If X is instead an exponentially-distributed
(or even uniformly-distributed) random variable, the
randomness breaks the symmetry and results in much nicer
behavior for all 3 scheduling strategies. (In practice,
d=2 is sufficient for the d-random strategy.)
- In particular, a large uniform interval, or an exponential
distribution, works much better than a smaller uniform
interval, suggesting that it's useful to have some tasks
get extremely accurate info.
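A toy time-stepped simulation of the periodic-update model (my own
sketch, not the author's code; all parameters are illustrative) makes
the herd effect easy to reproduce: within a phase the board is frozen,
so "Lightest" sends every arrival to the same server, while "2-random"
spreads them out.

```python
import math
import random

def poisson(rng, mean):
    # Knuth's method; fine for the small per-step means used here.
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def lightest(board, rng):
    # "Lightest": the board is frozen within a phase, so every
    # arrival herds to the same (apparently) least-loaded server.
    return min(range(len(board)), key=board.__getitem__)

def two_random(board, rng):
    # "d-random" with d = 2 (sampled with replacement for simplicity).
    i, j = rng.randrange(len(board)), rng.randrange(len(board))
    return i if board[i] <= board[j] else j

def simulate(strategy, n=50, lam=0.9, T=5.0, dt=0.05, t_end=300.0, seed=1):
    # n servers, Poisson arrivals at rate lam*n, exponential mean-1
    # service, bulletin board refreshed only at times kT.
    rng = random.Random(seed)
    loads = [0] * n                   # true queue lengths
    board = loads[:]                  # stale copy that arrivals see
    next_update, t, area = 0.0, 0.0, 0.0
    p_done = 1.0 - math.exp(-dt)      # P(a busy server finishes this step)
    while t < t_end:
        if t >= next_update:          # periodic refresh ("phase" boundary)
            board = loads[:]
            next_update += T
        for _ in range(poisson(rng, lam * n * dt)):
            loads[strategy(board, rng)] += 1
        for i in range(n):
            if loads[i] > 0 and rng.random() < p_done:
                loads[i] -= 1
        area += sum(loads) * dt
        t += dt
    return area / (t_end * n)         # time-averaged load per server
```

In runs of this sketch with a large T, "Lightest" tends to build far
longer queues than "2-random"; shrinking T toward 0 closes the gap,
consistent with the observations above.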
- A general lesson: some queueing-theory arguments prove a result
for an exponentially distributed random variable and assume the
result also holds when the variable is constant; in systems like
this one, which involve imperfect information, that assumption
fails.
Relevance
Direct relevance to TranSend and a variety of similar systems that rely
on soft/stale (BASE) state for load balancing.
Flaws
- Real arrivals aren't Poisson... what's the impact of this
assumption?