The New Routing Algorithm for the ARPANET

John M. McQuillan, Ira Richer, Eric C. Rosen

One-line summary: Shortest-path-first (based on delay) routing trees are constructed and updated to direct routing, with various precautions taken to insure that all nodes' databases are consistent.

Overview/Main Points

Problems with older algorithm:
- Entire routing tables were transmitted; long packets screwed up other traffic
- Each node used local info only to make routing decisions, so inconsistencies in routing tables might develop
- Lack of stability: sometimes too slow to respond to connectivity changes, sometimes overreacts for minor changes
- Queue length is a poor indicator of line delay and poor predictor of packet latency
New algorithm: build tree of the network using shortest path first; updates affect subtrees and are propagated only to nodes in those subtrees.
Takes 10 sec to take a measurement (by averaging many packet delays), so this is an upper bound on how frequently updates can happen
Transient loops possible, but in practice appear to be short lived (only for packets already in transit when an update is received)
New node additions or node removals are implicitly recognized as connectivity changes
Updates are only observed if the newly measured average delay differs from the previous value by some threshold T; if they don't, T is reduced by d. Initially they set T=64ms, d=12.8ms. Goal was to allow system to react quickly to large changes and slowly to small ones.
Updates sent by "flooding", i.e. transmit on all lines except the one on which it was received.
Consistency/stability: goal was to insure all nodes have the same routing info database. The collection of safeguards seems to suggest a remarkably brittle system:
- Each node must generate at least 1 update per minute.
- New node cannot come up till it has heard updates from all other nodes.
- Serial numbers on updates are used to prevent "stale" updates from being recognized when they are finally delivered to a node that was temporarily down.
- Network partition conditions cannot be lifted until updates from each partition have flowed into the other.
Performance: SPF running time is proportional to size of affected subtree; but average subtree size is equal to average hop length from root to all nodes. Therefore, running time grows with netork hop length, which generally grows slowly ((log N)/(log c) for N-node network with uniform connectivity c>2).
Testing: stress the algorithm, running it on the actual ARPANET! Can't test algorithms that way these days! (Authors write that their experiments can't be called "scientific" since other users were using the network at the same time!)
Deficiency of old algorithm remedied by new one: used to take several seconds to resolve a routing loop or pathology, now it takes msecs. Compare with Paxson's measurements of modern Internet!
Old algorithm took less memory, but authors assert that's not very important. Perhaps so, in the days when the dedicated routers were the IMP monstrosities.

Relevance

The humble beginnings of modern routing-update protocols. Like other papers from the early days (except for the Cerf and Kahn paper), it's somewhat jumbled and hard to read.

Flaws

Scalability: each node maintains full network map, and updates are broadcast.

Back to index