Back to index
The New Routing Algorithm for the ARPANET
John M. McQuillan, Ira Richer, Eric C. Rosen
One-line summary:
Shortest-path-first (based on delay) routing trees are constructed and
updated to direct routing, with various precautions taken to insure that
all nodes' databases are consistent.
Overview/Main Points
- Problems with older algorithm:
- Entire routing tables were transmitted; long packets
screwed up other traffic
- Each node used local info only to make routing decisions,
so inconsistencies in routing tables might develop
- Lack of stability: sometimes too slow to respond to
connectivity changes, sometimes overreacts for minor
changes
- Queue length is a poor indicator of line delay and poor
predictor of packet latency
- New algorithm: build tree of the network using shortest path
first; updates affect subtrees and are propagated only to nodes
in those subtrees.
- Takes 10 sec to take a measurement (by averaging many packet
delays), so this is an upper bound
on how frequently updates can happen
- Transient loops possible, but in practice appear to be short
lived (only for packets already in transit when an update is
received)
- New node additions or node removals are implicitly recognized as
connectivity changes
- Updates are only observed if the newly measured average delay
differs from the previous value by some threshold T; if they
don't, T is reduced by d. Initially they set T=64ms, d=12.8ms.
Goal was to allow system to react quickly to large changes and
slowly to small ones.
- Updates sent by "flooding", i.e. transmit on all lines except the
one on which it was received.
- Consistency/stability: goal was to insure all nodes have the same
routing info database. The collection of safeguards seems to
suggest a remarkably brittle system:
- Each node must generate at least 1 update per minute.
- New node cannot come up till it has heard updates from all
other nodes.
- Serial numbers on updates are used to prevent "stale"
updates from being recognized when they are finally
delivered to a node that was temporarily down.
- Network partition conditions cannot be lifted until
updates from each partition have flowed into the other.
- Performance: SPF running time is proportional to size of affected
subtree; but average subtree size is equal to average hop length
from root to all nodes. Therefore, running time grows with
netork hop length, which generally grows slowly ((log N)/(log c)
for N-node network with uniform connectivity c>2).
- Testing: stress the algorithm, running it on the actual ARPANET!
Can't test algorithms that way these days! (Authors write that
their experiments can't be called "scientific" since other users
were using the network at the same time!)
- Deficiency of old algorithm remedied by new one: used to take
several seconds to resolve a routing loop or pathology, now it
takes msecs. Compare with Paxson's
measurements of modern Internet!
- Old algorithm took less memory, but authors assert that's not
very important. Perhaps so, in the days when the dedicated
routers were the IMP monstrosities.
Relevance
The humble beginnings of modern routing-update protocols. Like other
papers from the early days (except for the Cerf and Kahn paper), it's
somewhat jumbled and hard to read.
Flaws
- Scalability: each node maintains full network map, and updates
are broadcast.
Back to index