Back to index
The New Routing Algorithm for the ARPANET
John M. McQuillan, Ira Richer, Eric C. Rosen
One-line summary:
Shortest-path-first (based on delay) routing trees are constructed and
updated to direct routing, with various precautions taken to insure that
all nodes' databases are consistent.
Overview/Main Points
  -  Problems with older algorithm:
       
         -  Entire routing tables were transmitted; long packets
              screwed up other traffic
         
 -  Each node used local info only to make routing decisions,
              so inconsistencies in routing tables might develop
         
 -  Lack of stability: sometimes too slow to respond to
              connectivity changes, sometimes overreacts for minor
              changes
         
 -  Queue length is a poor indicator of line delay and poor
              predictor of packet latency
       
 
   -  New algorithm: build tree of the network using shortest path
       first; updates affect subtrees and are propagated only to nodes
       in those subtrees.
  
 -  Takes 10 sec to take a measurement (by averaging many packet
       delays), so this is an upper bound
       on how frequently updates can happen
  
 -  Transient loops possible, but in practice appear to be short
       lived (only for packets already in transit when an update is
       received)
  
 -  New node additions or node removals are implicitly recognized as
       connectivity changes
  
 -  Updates are only observed if the newly measured average delay
       differs from the previous value by some threshold T; if they
       don't, T is reduced by d.  Initially they set T=64ms, d=12.8ms.
       Goal was to allow system to react quickly to large changes and
       slowly to small ones.
  
 -  Updates sent by "flooding", i.e. transmit on all lines except the
       one on which it was received.
  
 -  Consistency/stability: goal was to insure all nodes have the same
       routing info database.  The collection of safeguards seems to
       suggest a remarkably brittle system:
       
         -  Each node must generate at least 1 update per minute.
         
 -  New node cannot come up till it has heard updates from all
              other nodes.
         
 -  Serial numbers on updates are used to prevent "stale"
              updates from being recognized when they are finally
              delivered to a node that was temporarily down.
         
 -  Network partition conditions cannot be lifted until
              updates from each partition have flowed into the other.
       
 
   -  Performance: SPF running time is proportional to size of affected
       subtree; but average subtree size is equal to average hop length
       from root to all nodes.  Therefore, running time grows with
       netork hop length, which generally grows slowly ((log N)/(log c)
       for N-node network with uniform connectivity c>2).
  
 -  Testing: stress the algorithm, running it on the actual ARPANET!
       Can't test algorithms that way these days!  (Authors write that
       their experiments can't be called "scientific" since other users
       were using the network at the same time!)
  
 -  Deficiency of old algorithm remedied by new one: used to take
       several seconds to resolve a routing loop or pathology, now it
       takes msecs.  Compare with Paxson's
       measurements  of modern Internet!
  
 -  Old algorithm took less memory, but authors assert that's not
       very important.  Perhaps so, in the days when the dedicated
       routers were the IMP monstrosities.
 
Relevance
The humble beginnings of modern routing-update protocols.  Like other
papers from the early days (except for the Cerf and Kahn paper), it's
somewhat jumbled and hard to read.
Flaws
  -  Scalability: each node maintains full network map, and updates
       are broadcast.
 
Back to index