John Byers, Michael Luby, Michael Mitzenmacher, Ashutosh Rege (DEC SRC)
A Digital Fountain Approach to Reliable Distribution of Bulk Data
(folder #2)
One-line summary: Combine Tornado codes (a computationally inexpensive
and relatively space-efficient FEC code) with layered multicast to enable a
"digital fountain" from which you can start drinking at any time (adapting
your drinking rate to your bandwidth, as with other layered mcast apps)
and stop after receiving any k distinct of the (k+l) encoded
packets, which suffice to reconstruct the original data. Packets are scheduled
across layers to minimize the # of dup pkts you receive before getting k
distinct ones.
Summary/main points
Tornado code
-
an FEC code (erasure code) that works by constructing a cascade
of bipartite graphs, where each node on the RHS of a graph is the
XOR of selected elements on the LHS of that graph. (The LHS of the
leftmost bipartite graph is the source data.)
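The decoding idea behind such XOR graphs is iterative "peeling": any check node
all but one of whose LHS neighbors are known reveals the missing one by XOR. A
minimal single-graph sketch (the neighbor sets here are arbitrary illustrations,
not the paper's carefully chosen degree distributions):

```python
def make_checks(source, neighbor_sets):
    """Each check symbol is the XOR of the source symbols in its neighbor set."""
    checks = []
    for nbrs in neighbor_sets:
        c = 0
        for i in nbrs:
            c ^= source[i]
        checks.append(c)
    return checks

def peel_decode(k, received_src, checks, neighbor_sets):
    """Iteratively recover missing sources from checks with one unknown neighbor."""
    known = dict(received_src)  # index -> value for source packets that arrived
    progress = True
    while progress and len(known) < k:
        progress = False
        for c, nbrs in zip(checks, neighbor_sets):
            missing = [i for i in nbrs if i not in known]
            if len(missing) == 1:
                v = c
                for i in nbrs:
                    if i in known:
                        v ^= known[i]
                known[missing[0]] = v  # XOR cancels the known neighbors
                progress = True
    return known
```

Real Tornado codes cascade several such graphs and draw node degrees from
distributions chosen so that peeling succeeds with high probability.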
-
Stretch factor c = n/k measures the total number of packets n
used to redundantly encode k source packets. This paper
chooses l=k redundancy pkts, hence n = k+l = 2k and c = 2.
-
Encoding/decoding time is O((l+k)P) for a P-sized packet. Compare
to traditional erasure codes (such as Reed-Solomon), which are typically
O(lkP) and rely on operations more complex than XOR; those codes are only
practical for small k and l, whereas Tornado codes are fine with k, l on
the order of 10^4.
-
One alternative is interleaving combined with traditional codes, with interleaving
factor K. Problem: you want small K to keep decoding fast,
but smaller K means the receiver may have to receive a lot more packets
to reconstruct the data, since the contribution of each arriving packet to an
interleaving "bin" has a nonuniform distribution. Also, in general, the # of
packets required to reconstruct interleaved data grows superlinearly with the
original data size; for Tornado codes, it grows only linearly.
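The nonuniform-bin problem is easy to see in a toy simulation (my own
illustration, not the paper's model): split the k source packets into bins of K,
encode each bin separately with the same stretch, and count how many randomly
ordered packets arrive before every bin can decode.

```python
import random

def packets_needed(k, n, K, rng):
    """Toy interleaving model: k source packets split into k//K bins of K
    packets each, each bin encoded separately with overall stretch n/k.
    Encoded packets arrive in random order; a packet for an already-complete
    bin is wasted.  Returns packets received before all bins can decode."""
    bins = k // K
    got = [0] * bins
    order = list(range(n))
    rng.shuffle(order)
    for t, pkt in enumerate(order, 1):
        b = pkt % bins            # which bin this encoded packet belongs to
        if got[b] < K:            # packets beyond a bin's quota are wasted
            got[b] += 1
        if all(g == K for g in got):
            return t
    return n
```

With one big bin (K = k) every arriving packet counts; with small bins, the
last bins to complete force the receiver to sit through wasted duplicates.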
-
Reception overhead is e if (1+e)k packets must be
received to reconstruct the orig data. For n=2k, e is at most 1
(the case where all n packets are needed).
Exploiting layered multicast
-
Goal: using mcast, arrange for a receiver to be able to receive
any k distinct of the n packets and reconstruct the
source data. Then the sender can just "carousel" the encoded packets forever;
receivers can tune in whenever they want, and drop out after they receive
k (distinct) packets.
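The receiver side of the carousel can be sketched as follows (assuming an
idealized code where any k distinct packets suffice, i.e. e = 0):

```python
from itertools import cycle

def carousel_receive(encoded_ids, k, lost=lambda pkt: False):
    """Tune into an endless carousel of encoded packets at any point and
    listen until k distinct ones have arrived; 'lost' models packet drops."""
    seen = set()
    pulled = 0
    for pkt in cycle(encoded_ids):   # the sender loops over packets forever
        if lost(pkt):
            continue
        pulled += 1
        seen.add(pkt)
        if len(seen) >= k:
            return pulled  # packets taken off the wire, duplicates included
```

Even if 'lost' drops half the packets, the receiver still finishes; losses only
stretch the listening time and never force a retransmission request.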
-
Idea: use mcast layers to distribute the encoded packets. Each layer
has twice the bandwidth of the layer below it; subscription is cumulative.
(Exception: layers 0 and 1 have the same bandwidth.) Problem: how to
schedule packets across layers?
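With that layout, a receiver subscribed to layers 0..i gets a cumulative rate
of 2^i times the base rate; a tiny sketch of the arithmetic:

```python
def layer_rate(i, base=1):
    """Layers 0 and 1 both send at the base rate; each higher layer doubles."""
    return base if i == 0 else base * 2 ** (i - 1)

def cumulative_rate(i, base=1):
    """Total rate for a cumulative subscription to layers 0..i."""
    return sum(layer_rate(j, base) for j in range(i + 1))
```

The cumulative rate doubles with each added layer (1, 2, 4, 8, ...), so a
receiver can halve or double its receive rate with a single leave or join.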
-
"One-level property": If receiver stays at same layer throughout, and packet
loss rate < (c-1-e)/c, then receiver can reconstruct
source data before receiving any duplicate packets. Authors show
a simple scheme based on binary numbering that satisfies the OLP.
Satisfying the OLP is good because it lets you keep your stretch factor
c pretty small.
-
Binary-numbering scheme also explains how to schedule packets across layers
to preserve OLP.
-
Receivers can only subscribe to a higher layer after seeing a synchronization
point (SP) in their own layer. SP frequency is inversely proportional
to layer BW, so lower-layer receivers get more frequent opportunities to
move up.
-
Instead of explicit joins, sender "induces" congestion by periodically
sending a burst at double the normal rate on each layer (this simulates
what clients would experience following an explicit join). If a receiver
does not experience congestion during the burst, it can safely subscribe
to the next higher layer at the next SP.
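One plausible receiver rule at an SP (my paraphrase of the mechanism, not code
from the paper): move up if the double-rate burst passed cleanly, back off if
it caused loss.

```python
def next_layer(current, saw_loss_during_burst, max_layer):
    """Decision at a synchronization point (SP): the sender's double-rate
    burst simulated the load of subscribing one layer higher, so a loss-free
    burst means moving up is safe; loss during the burst means backing off
    is the conservative choice."""
    if saw_loss_during_burst:
        return max(current - 1, 0)      # congestion: shed a layer
    return min(current + 1, max_layer)  # clean burst: safe to subscribe up
```

This keeps all rate changes aligned to SPs, so receivers move in lockstep with
the sender's schedule rather than probing the network at arbitrary times.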
Flaws
-
When should the server insert bursts and SPs? This didn't seem to be addressed
much in their simulations (which mostly kept the receiver at a constant level
throughout), nor do they propose heuristics for it (unless I missed something).
-
What's the effect on "good Internet citizenship" of inserting such bursts?
Is it at least no worse than explicit joins in practice?