Back to index
Multicast Routing in Datagram Internetworks and Extended LANs
Steve Deering and David R. Cheriton, Stanford Univ.
One-line summary:
Extensions and refinements to distance-vector and link-state/OSPF
routing to support
efficient multicast tree creation and maintenance in wide area
internetworks. Main tradeoff is low delay (for joining or delivery)
vs. increased cost in bandwidth or router state. Distance-vector
routing seems to scale better.
Overview/Main Points
- Design of service interface: sender need not know membership of
destination group, nor be a member of it.
- Compare Cheriton & Skeen V-System group notions
vs. ISIS!
- Rough classification of multicast groups:
- Pervasive: members all over the place. This type of group
motivates efficient shortest-path delivery tree over
sending many unicast messages. For some groups, eg
directory servers, packet delivery should be constrained
to subset.
- Sparse: few members per local group.
- Local: scope-constrained to this admin. domain or subnet;
can probably exploit NI-level broadcast.
- Main tradeoff is cost of broadcast bandwidth vs. cost of keeping router
state and control programs to do selective multicast.
- Scope control: Boundary routers can refuse to
forward mcast packets outside their domain unless the packets have a
minimum remaining TTL, allowing TTL to be used to scope intra-
vs. inter-domain mcast.
- Basic algorithm as used with LAN bridges:
- If pkt arrives with mcast source addr, record which branch
it arrived on and set its age to zero in mcast address
table, which maps mcast addresses to links along which
there are members of that mcast group.
- If pkt arrives with mcast dest addr, forward it along
appropriate links.
- Periodically increment ages of mcast table records, and
expire them.
- Main costs are bandwidth used for periodic membership
reporting, which is used to construct the tables, and the
space required for tables (1 entry per active mcast group).
- Optimization to reduce this cost: when issuing a
membership report for group G, sender addresses it to G
rather than using a broadcast address. Then other members
of G will overhear it and suppress additional membership
reports containing duplicate info.
- Distance-vector mcast routing: Like basic algorithm, but
each router sends distance-vector packet on
each outgoing link indicating distances to nearest
neighbors. But the Internet is a not a tree, so computing a
single spanning tree over all routers won't result in most
efficient (shortest-path) mcast delivery for all mcast groups.
- Refinement 1: reverse path flooding (RPF)
- Router forwards broadcast packet from S iff it arrives via
shortest known path from S.
- Problem: if multiple routers on a single link, they may
all rebroadcast it.
- Example: routers x and y are on the same LAN, L, but may
be reached by different routes from source S. If they
both hear broadcast packet from source S, they both
rebroadcast the packet.
- Refinement 2: reverse path broadcasting (RPB) to solve
above problem.
- Identify a single closest "parent" router for each link relative
to source S. Use address numbers to break ties.
- In example above, x and y periodically broadcast (locally)
their distance to S. So x and y eventually learn that x
(say) is closer to S. From now on, y will not rebroadcast
packets from S.
- Requires an additional field, children, be added to
each routing table entry. Each bit indicates whether the
corresponding outgoing link from this router is a child
for mcast routing purposes.
- Refinement 3: truncated RPB
- Hosts at leaves periodically report their memberships in
various mcast groups. If no membership report is heard
from some leaf after some timeout, that leaf is pruned.
- Cost: an additional leaves bitmap per table entry.
Each bit in children bitmap indicates whether that
link leads to a
topological child (generally long-lived stable property);
corresponding bit in leaves bitmap indicates
whether that child has any currently active memberships
(short-lived property).
- Also bandwidth cost of membership reports, but they can be
on order of every few minutes.
- Refinement 4: reverse path multicasting (RPM) by on-demand
pruning
- After forwarding an mcast packet, routers may receive
non-membership report (NMR) from one or more children.
- If NMR's from all children are received before some
timeout, router reports NMR to its parent, pruning that
subtree. NMR's are positively acked; once the ack is
seen, any additional mcast packets received at the
NMR-sender will not trigger additional NMR's.
- NMR reports eventually expire, so pruned paths will rejoin
the mcast tree after the expiration time.
- Optimization: if new node joins on a particular link that
has already been pruned, routers can send an NMR
cancellation up the tree. Requires routers to remember which
NMR's they've sent and when.
- Physical topology changes must trigger NMR cancellations.
- Cost: as TRPB, plus cost of transmitting, storing and
processing NMR's.
- Link-state mcast routing:
- as in OSPF (like the Arpanet Routing
paper), each router independently computes global
shortest-path spanning
tree rooted at itself using Dijkstra's algorithm.
- Routers should consider mcast membership as part of the
link "state".
- Since there is a potentially a separate such tree from
every sender to every group, each router with L outgoing
links maintains a cache
of tuples <sender, dest-group, min-hops>,
where min-hops is an L-vector of distances to the
nearest descendant member of dest-group via that
outgoing link (with infinity meaning no group members live
along that link). Upon cache miss, the router computes
the vector on demand using Dijkstra's algorithm. (Like routers really have spare time to do
this.) Cache is invalidated when topology changes.
- Costs: hard to estimate due to the computing-on-demand
nature and variation in cache behavior, but some
observations:
most memberships likely to be long-lived;
volatile memberships likely to be local in scope or
sparsely distributed; non-membership reports ("I am
leaving the group") can be "batched" or piggybacked on
other updates; membership reports ("I am joining the
group") can be rate-limited at expense of greater join
latency.
Hierarchical mcast routing: if a host or superdomain router
attached to a subdomain sends mcast packet addressed to G into
the subdomain, packet is delivered to all hosts in G plus
all other superdomain routers attached to the subdomain.
- Optimization: "leaf subdomains" -- a single superdomain
router can be responsible for forwarding mcasts to rest of
network.
Relevance
Extends existing routing algorithms, by successive optimizations, to
handle multicast routing reasonably efficiently. Delay vs. cost
tradeoffs are well illustrated, as is separation of logical (mcast
group) from physical addressing.
Flaws
- No way routers have the resources to do the computations for
link-state extensions.
- Design goal: "Mcast packet should be delivered to each
member...with probability and delay very close to that of
unicast...this property gives higher-layer protocols a basis to
handle packet loss by retransmission". It's not obvious that
this is the right starting goal, but we have already seen this
"end-to-end vs. middleware" argument in the Cheriton/Skeen vs. ISIS
papers.
Back to index