**One-line summary:**
Per-VC credit-based flow control in which credit and bandwidth
allocations adapt dynamically to offered load. The use of
adaptive flow control for statistical multiplexing of traffic through a
switch keeps switch buffer requirements modest (RTT x O(1), as
opposed to RTT x peak BW) and cell loss low.

- Why use credit-based flow control per VC (virtual circuit):
  - "fill in" traffic can use surplus capacity after guaranteed traffic has been scheduled
  - data to be used for "fill in" must be "drawn" to switches with excess capacity early enough to fill in the excess, but we don't want to draw more traffic than can be absorbed (which would result in drops)
  - congestion control via backpressure per VC doesn't affect throughput of noncongested VC's

- The "N23" credit-based flow control scheme:
- Roughly, N2 is amount of "steady state" buffering available per VC at receiving switch; N3 is "overflow" buffering for excess traffic. N2 generally stays fixed, N3 may be varied over life of VC. The larger N2 is, the more resilient the VC is to short-term load fluctuations.
- N3 chosen as B
_{vc}x RTT, where B_{vc}is targeted average bandwidth of this VC. - When sender receives "credit update" C from receiver, new credit is computed as C-E, where E is number of cells sent in last RTT (i.e. number of cells in flight when update received)
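
A minimal sketch of that sender-side bookkeeping, assuming updates arrive about once per RTT so that "cells sent since the last update" approximates E (the class and method names are mine, not the paper's):

```python
# Hypothetical N23 sender-side bookkeeping; only the arithmetic
# (initial credit N2+N3, new credit C - E) comes from the paper.

class N23Sender:
    def __init__(self, n2: int, n3: int):
        self.n2 = n2                 # steady-state per-VC buffering at receiver
        self.n3 = n3                 # overflow buffering, ~ B_vc x RTT
        self.credit = n2 + n3        # cells we may still send
        self.sent_since_update = 0   # approximates E, assuming ~1 update per RTT

    def can_send(self) -> bool:
        return self.credit > 0

    def on_cell_sent(self) -> None:
        self.credit -= 1
        self.sent_since_update += 1

    def on_credit_update(self, c: int) -> None:
        # New credit is C - E: the receiver granted C cells of space,
        # but E cells were already in flight toward it.
        self.credit = c - self.sent_since_update
        self.sent_since_update = 0
```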

- Properties of the N23 scheme:
  - No data overflow, as long as corrupted credit updates can be detected
  - Self-repairing, as long as corrupted credit updates are eventually retransmitted
  - Average BW achievable over one RTT is at most (N2+N3)/(RTT+N2), with RTT measured in cell times, so the bound is a fraction of link rate (worked numbers below)
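
Plugging in round numbers makes the bound concrete (N2 and the N3 values here are illustrative; only RTT=3200 comes from the simulations below):

```python
# Max average BW over one RTT as a fraction of link rate, per the
# (N2+N3)/(RTT+N2) bound. All quantities in cells / cell times.
rtt, n2 = 3200, 10                    # N2 is an illustrative guess
for n3 in (100, 800, 3200):
    print(f"N3={n3:4d}: max avg BW = {(n2 + n3) / (rtt + n2):.3f}")
# N3=100 caps the VC near 3% of link rate; only N3 ~ RTT lets it
# approach full rate, matching N3 = B_vc x RTT for B_vc = link rate.
```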

- Credit update protocol (CUP): a simple hardware implementation of N23.
  - Receiver sends a credit update; sender computes its new credit as N2+N3-(Vs-Vr), where Vs is the number of cells the sender has forwarded or dropped, and Vr is the number of cells the receiver has forwarded (enclosed in the update).
  - Note Vs-Vr=B+E, where B is the number of cells in the receiver's buffer when the update leaves the receiver, and E is as above.
  - Each receiver also keeps a running total U of all data cells it has received. "Periodically" (an engineering choice) each sender transmits its current Vs, and the receiver computes the number of lost cells as Vs-U, then increments U and Vr by this value. (Sketch below.)
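
A sketch of the CUP counters on both ends (the arithmetic is the paper's; the structure, names, and the assumption that the Vs sync cell travels in-band behind the data are mine):

```python
# Hypothetical CUP endpoint state; arithmetic per the notes above.

class CupSender:
    def __init__(self, n2: int, n3: int):
        self.n2, self.n3 = n2, n3
        self.vs = 0                  # Vs: cells forwarded or dropped
        self.credit = n2 + n3

    def on_cell_forwarded(self) -> None:
        self.vs += 1
        self.credit -= 1

    def on_cell_dropped(self) -> None:
        self.vs += 1                 # drops count toward Vs too

    def on_credit_update(self, vr: int) -> None:
        # Vs - Vr = B + E (cells buffered at receiver + cells in flight),
        # so this grants exactly the buffer space still safe to fill.
        self.credit = self.n2 + self.n3 - (self.vs - vr)

class CupReceiver:
    def __init__(self):
        self.u = 0                   # U: running total of cells received
        self.vr = 0                  # Vr: cells forwarded downstream

    def on_cell_received(self) -> None:
        self.u += 1

    def on_cell_forwarded(self) -> None:
        self.vr += 1                 # Vr is sent back in periodic credit updates

    def on_sender_sync(self, vs: int) -> None:
        # The sync cell follows the data in-band, so every cell counted
        # in Vs has by now either arrived or been lost.
        lost = vs - self.u
        self.u += lost               # resynchronize U with the sender
        self.vr += lost              # reclaim credit consumed by lost cells
```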

- Adaptive credit allocation:
  - Size of the shared buffer pool is r x RTT cells, where r>1.
  - Each VC wants credit C = B x RTT, where B is its operating bandwidth over some measurement interval M (the paper assumes M=RTT for simplicity).
  - Each VC's credit allocation is always strictly larger than its operating credit, by a factor of r' >= r > 1 (to allow "headroom" for fast credit growth).
  - Suppose the shared buffers are not completely full when credit reallocation occurs. VC's experiencing ramp-up in bandwidth can grow their share of the allocation by a factor of up to r per reallocation, i.e. exponential ramp-up (like TCP slow start).
  - Suppose the shared buffers *are* completely full when reallocation occurs: then only the shared buffer size minus the in-flight cells should be considered available.
  - Parameter r can be reduced dynamically as each VC's operating credit approaches its target bandwidth. There is also a parameter a that defines the cutoff of a low-pass filter, which insulates N3 from rapid fluctuations in bandwidth. The choice of both of these is "an engineering choice" -- no details are given.
  - Adaptation can be sender- or receiver-based. Receiver-based allows quick adjustment of N3: since N3 is a receiver-only parameter, it can just be changed locally as desired. But receiver-oriented adaptation implies that ramp-up will be delayed by about 1 RTT. (A receiver-based sketch follows.)
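
A receiver-based sketch of one reallocation step, under my own assumptions about how r and a enter (the paper leaves both as "engineering choices", so the update rule and defaults below are guesses):

```python
# Hypothetical per-VC reallocation of N3, run once per measurement
# interval M (= RTT in the paper). Not the paper's actual rule.

def adapt_n3(n3: float, observed_bw: float, rtt: float,
             r: float = 2.0, a: float = 0.25,
             pool_free: float | None = None) -> float:
    """observed_bw: cells/cell-time this VC forwarded over the interval.
    r: headroom factor (> 1) -> up to exponential ramp-up per interval.
    a: low-pass filter gain insulating N3 from bandwidth fluctuations.
    pool_free: uncommitted cells in the shared pool, when it is full.
    """
    target = r * observed_bw * rtt            # operating credit plus headroom
    new_n3 = (1 - a) * n3 + a * target        # low-pass-filtered adjustment
    if pool_free is not None:
        # Pool full: only the shared buffer space not already committed
        # to in-flight cells is available for growth.
        new_n3 = min(new_n3, n3 + pool_free)
    return new_n3
```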

- Per-VC flow control enables statistical multiplexing of the shared buffer, helping to minimize its size. Under low to moderate congestion, very few cells are lost; under heavy congestion, cells do get dropped.
- Simulations:
  - Assumption: all N VC's have identical loads, consisting of bursts of B cells with exponentially distributed inter-burst times (a generator sketch follows this list).
  - Assumption: shared buffer pool size is M = b x N x B cells, where b<1 is the "memory reduction factor" due to statistical multiplexing.
  - Simulation 1: link utilization, cell delay, and number of dropped cells as a function of N3, with B=172, N=100, M=4096, offered load=0.95, RTT=3200 cell times. Result: link utilization quickly reaches 1.0 and stays there; at about N3=100, the number of dropped cells starts to climb, since statistical multiplexing starts to break down.
  - Simulation 2: same, but with unlimited memory size, also plotting max memory usage as a function of N3. Result: same, but memory usage grows slowly up to about N3=80, then the slope takes off, again as statistical multiplexing begins to fail.
  - Simulation 3: goal is to show that the same throughput and rate of cell loss can be achieved with less memory (i.e. lower b) if per-VC flow control is used than if it is not. Duh. Result is that adaptive FC can pin b to about 0.3.
  - Simulation 4: shows fast adaptation. N3 value at receiving switch tracks N3 at sending switch with a delay of about 4000-5000 cell times, i.e. about 1.5 RTT, as expected.
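
For concreteness, a sketch of the assumed per-VC load generator (my reconstruction of the stated assumptions; the mean-gap formula is just one way to hit the stated 0.95 offered load):

```python
import random

def burst_arrivals(b: int, mean_gap: float, horizon: float):
    """Yield (start_time, b) burst events out to `horizon` cell times,
    with exponentially distributed inter-burst times."""
    t = 0.0
    while True:
        t += random.expovariate(1.0 / mean_gap)  # mean inter-burst gap
        if t >= horizon:
            return
        yield (t, b)

# B=172-cell bursts; gap chosen so N=100 such VCs offer load ~0.95:
# per-VC rate = B / mean_gap = 0.95 / N.
for start, size in burst_arrivals(172, mean_gap=172 * 100 / 0.95, horizon=1e6):
    pass  # feed each burst into the switch/queue model here
```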

- The parameters r and a, and the manner and frequency with which they can be varied over time, seem critical to the stability and responsiveness of the algorithm, yet all the authors say is that they are "engineering choices". Bah.
- "Burstiness at all timescales" calls simulated load assumptions into question.
- Are the counters required by their implementation soft or hard state? If soft state, how is the credit computed at time t=0, and what effect on the VC (and other VC's) is seen if the state is lost during the connection?