Active Connection Management for Internet Services
Mike Chen, Stanford CS548 seminar, 5/22/02
- Problem: failover strategies and conn. mgt. in L4-L7 switches are
hardwired, a mismatch for the dynamism of large services (service
growth, machines getting replaced, etc.). Soln: extend the API of LB
switches to allow apps/infrastructure to dynamically control the
mapping of client conns to physical machines. Hypothesis: improves
availability, load conditioning, and admin/mgt.
- Goals include: dynamic resource alloc to handle peaks; automated
switch config to eliminate operator error; graceful starting/stopping
of services to enable rolling reboots and online upgrades; help in
detecting/recovering from svr failures.
Existing systems:
- Cisco Dynamic Feedback protocol: can set weighting vector for LB,
keepalive interval, server added/removed.
- IBM Director: relies on the failover mech. of the underlying sys.
E.g., to reboot a node behind an LB switch, Director has to shut off
the node and rely on the switch to detect this first.
Existing L4 connection mgt primitives (sketched as a Java interface
below):
- Add: bind physical to virtual IP/port (after verifying health of resource)
- Remove: stop forwarding new conns (force remove: breaks existing conns
too)
- Not supported: Drop some new conns (according to some criterion)
for admission ctrl
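A minimal sketch of these primitives as a Java interface; the names
and signatures are illustrative, not any vendor's actual API:

    import java.net.InetSocketAddress;

    // Hypothetical wrapper around an L4 switch's config channel.
    interface L4Switch {
        // Bind a physical server to a virtual IP/port; the caller is
        // expected to verify the server's health first.
        void add(InetSocketAddress virt, InetSocketAddress phys);

        // Stop forwarding new conns; existing conns are left to drain.
        void remove(InetSocketAddress virt, InetSocketAddress phys);

        // Remove the binding and break existing conns too.
        void forceRemove(InetSocketAddress virt, InetSocketAddress phys);

        // Note: no primitive for "drop some new conns by policy",
        // which is what the drop-server hack below works around.
    }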
Implementation of Mike's soln:
- Each app server generates events ("I want to bind to a virtual
address"); a separate Conn Mgr (CM) takes these and uses a Java
wrapper for the switch to implement the requests (see sketch below).
- CM is centralized with standby peers, maintains soft state (heartbeats
from servers), sends config deltas to switches
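A rough sketch of how the CM might consume bind events and
heartbeats, assuming the L4Switch wrapper above (all names
hypothetical):

    import java.net.InetSocketAddress;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical CM core: soft state rebuilt from heartbeats,
    // config pushed to the switch only as deltas.
    class ConnectionManager {
        private final L4Switch sw;
        // virtual addr -> physical servers bound to it (soft state)
        private final Map<InetSocketAddress, Set<InetSocketAddress>> bindings =
                new HashMap<>();

        ConnectionManager(L4Switch sw) { this.sw = sw; }

        // App-server event: "I want to bind to this virtual address."
        void onBindRequest(InetSocketAddress virt, InetSocketAddress phys) {
            Set<InetSocketAddress> servers =
                    bindings.computeIfAbsent(virt, v -> new HashSet<>());
            if (servers.add(phys)) {
                sw.add(virt, phys);   // push only the delta
            }
        }

        // A heartbeat refreshes the server's soft-state lease; a lease
        // timeout (not shown) would trigger sw.remove() for its bindings.
        void onHeartbeat(InetSocketAddress phys) { /* refresh lease */ }
    }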
- Switch fails: CM adds the known bindings back to the switch after
recovery, since the switch forgets its bindings on a hard reboot!
(See recovery sketch below.)
- Note - it takes as long for the standby to take over as it does to
restart the primary and repopulate its soft state! Mike admitted the
standby is not as useful as he'd expected. A way to achieve seamless
transition would be to run two primaries and have the switch simply
ignore messages from the second, until it starts seeing messages from
the second with no corresponding messages from the first...but then
you have to deal w/the possibility that the two will give different
instructions, and besides, the switches are designed to accept
messages only over a single dedicated TCP connection for now.
- CM failure: since all config of the switch goes via the CM, if the
CM fails and then another server fails before the CM is back up, the
switch will contain a stale binding for that failed server. So after
the CM comes back up, it collects known-good bindings from
heartbeats, then removes unknown bindings on the switches. ("The
heartbeats are the only truth.")
- Supports resource replacement through graceful removal: quiesce the
server, then reboot it. ("100% availability" - except it's really
not, since there is a temporary thruput reduction as the standby
warms up.)
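A sketch of that quiesce-then-reboot sequence, assuming a fixed drain
window (DRAIN_MILLIS and rebootOrUpgrade are made-up placeholders,
not part of the presented system):

    // Hypothetical graceful replacement of one server.
    static final long DRAIN_MILLIS = 30_000;   // assumed drain window

    void replaceServer(InetSocketAddress virt, InetSocketAddress phys)
            throws InterruptedException {
        sw.remove(virt, phys);        // no new conns; existing ones drain
        Thread.sleep(DRAIN_MILLIS);   // quiesce: wait out in-flight work
        rebootOrUpgrade(phys);        // operator/remote action, stubbed
        // when healthy again, the server sends a fresh bind event,
        // which re-adds it to the switch
    }

    void rebootOrUpgrade(InetSocketAddress phys) { /* not shown */ }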
- App-level exceptions can actively remove failed resources, in
addition to the middleware's existing health checks; this is one more
way to detect errors (snippet below). (This implies the exception
leaves the app in good enough state that it can request to be
removed; if it's in that good a shape, perhaps it should proactively
restart itself, since that's what the switch is going to do anyway?)
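The app-server side of this might be as simple as the following
(Request, FatalAppException, handleRequest, and requestRemoval are
all made-up names):

    // Hypothetical request loop on the app server.
    void serve(Request req) {
        try {
            handleRequest(req);
        } catch (FatalAppException e) {
            // One more error detector, beyond the middleware's checks:
            // ask the CM to pull this server out of rotation.
            cm.requestRemoval(myVirtualAddr, myPhysicalAddr);
        }
    }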
- Note rolling reboot vs. "big flip": rolling reboot only works if
the new version is software-compatible w/the old one, since the old
and new versions must coexist in the same system.
- Neat hack: since current switches can't do admission control, add
drop servers, which are like /dev/null - a drop server either
immediately RSTs the TCP connection, or holds the client connection
open as a "bottomless sink" (sketch below). Can use for preferential
treatment, e.g. E-Trade prioritizes paying customers over free
users.
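A minimal drop-server sketch in Java (the port is arbitrary). Setting
SO_LINGER to 0 makes close() send a TCP RST instead of the normal FIN
handshake:

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class DropServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket listener = new ServerSocket(8080)) {
                while (true) {
                    Socket s = listener.accept();
                    s.setSoLinger(true, 0);  // linger=0: close() sends RST
                    s.close();
                    // "bottomless sink" variant: keep s open and never
                    // read or write, instead of closing it
                }
            }
        }
    }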
- Interesting tidbit - COTS switches have a group of procs for data routing,
and a separate proc for SNMP/config. Add and remove take 20-25ms each
on Foundry ServerIron.