5/20/01 Notes from HotOS
- JX: are language-based, type-safe OS's becoming reasonable? Do we
care? (Many OS fault modes, such as resources not being reclaimed b/c
of zombies, unclaimed interrupts, etc., are not any better addressed)
- Sys support for programming routers: Statically proving safety of
functions is hard (deps on code, inputs, dynamic env), so need abstractions
that don't limit expressiveness of extension code despite this. Soln:
expose protection HW of router at lowest level: fixed invocation overhead
(switching uproc protection domain), native execution (vs VM), hard
protection guarantees [had a pointer to last SOSP?]. Performance: hard
to predict CPU needs of router's core tasks, so use priorities (not
proportional sharing) as primitive, and have extensions adapt to transient
unavailability of CPU. Event-driven control flow: localize state in
fns, carry invocations to fns, so fns can share state across flows
(interflow priority scheduling, etc). How do you carry the state
around? Fn global variables? Do fns have to be reentrant?
Fine-grained sched is only possible if functions voluntarily yield...no?
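The event-driven control-flow idea above can be sketched roughly as follows. This is my illustration, not the paper's code: the names (Invocation, CountingHandler) are invented, and it shows state localized in the handler function, invocations carrying per-flow context, and priority-ordered dispatch — which only works if handlers return promptly (the voluntary-yield worry).

```python
# Sketch: handler-local state shared across flows, with invocations
# dispatched in priority order rather than per-flow threads.
import heapq

class Invocation:
    """Carries per-flow context to a handler; no thread owns it."""
    def __init__(self, flow_id, priority, payload):
        self.flow_id = flow_id
        self.priority = priority
        self.payload = payload
    def __lt__(self, other):            # lets heapq order invocations
        return self.priority < other.priority

class CountingHandler:
    """State lives in the function, so it can see across flows
    (the basis for interflow priority scheduling)."""
    def __init__(self):
        self.per_flow_counts = {}
    def __call__(self, inv):
        self.per_flow_counts[inv.flow_id] = \
            self.per_flow_counts.get(inv.flow_id, 0) + 1

def run(handler, invocations):
    # Fine-grained priority scheduling: pick the highest-priority
    # pending invocation each time; depends on handlers not blocking.
    q = list(invocations)
    heapq.heapify(q)
    order = []
    while q:
        inv = heapq.heappop(q)
        handler(inv)
        order.append(inv.flow_id)
    return order
```

Note that the handler's own dict answers the "how do you carry the state around?" question here with function-global state, which in turn means the handler must be careful about reentrancy.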
- Lazy thread switching (Jochen Liedtke) has an interesting use of soft
state as an optimization. idea: for multiple kernel threads inside
same user addr space, keep a twin of TCB info in user space. When t1
IPC's to t2, just switch from t1 to t2 by modifying user-space copy of TCB
(so you save a kernel crossing); when a "real" kernel activation
occurs (exception, timeslice expires, etc), the kernel can notice that the
UTCB and KTCB are inconsistent and force the UTCB to match KTCB.
Threads can destroy their own tasks/hose themselves by stomping on UTCB, but
KTCB is still "the truth". Result: you save 2 kernel
crossings per IPC (you can have several "user-space-only" thread
switches before any kernel event forces a "real" kernel thread
switch). A nice use of soft state! Gives you most(?) of the fault and performance isolation of
kernel threads without the overhead of kernel crossings. (Their
tagline: "Might reunify kernel threads and user threads") Then
they tried it on the new P-IV (which has a non-P6 core!) and it took a lot
longer - and inserting NOP's actually made it faster! (probably RS
skid, or RA stall, or screwing up Load Value Prediction, or something like
that...a good example of uarch working against the OS!)
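A toy model of the UTCB/KTCB soft-state scheme (my reconstruction of the mechanism as described above, not Liedtke's code): user-level IPC switches only touch the user-space twin, and the next real kernel event reconciles the two, with the KTCB as the truth if the UTCB has been stomped on.

```python
# Soft-state TCB twins: UTCB is the cheap user-writable copy, KTCB is
# authoritative. Several user-space-only switches can happen before a
# kernel event forces reconciliation.
class TCBs:
    def __init__(self, current):
        self.utcb_current = current   # soft state: user-writable twin
        self.ktcb_current = current   # hard state: kernel-only

    def user_ipc_switch(self, target):
        # t1 IPCs to t2: modify only the user-space copy, saving the
        # kernel crossings a real thread switch would take.
        self.utcb_current = target

    def kernel_event(self):
        # Exception / timeslice expiry: kernel notices UTCB != KTCB.
        # A sane UTCB commits the lazy switches; a corrupted one is
        # overwritten from the KTCB (threads can only hose themselves).
        if self._is_sane(self.utcb_current):
            self.ktcb_current = self.utcb_current
        else:
            self.utcb_current = self.ktcb_current
        return self.ktcb_current

    def _is_sane(self, tid):
        # Stand-in validity check; the real kernel would validate the
        # thread ID against its own tables.
        return isinstance(tid, int) and tid >= 0
```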
- Fail-stutter FT (Remzi and Andrea): things don't "just fail" or
fail-stop: fault masking, slow death, and geometry-based performance
degradation (disks); fault masking (ECC, etc.) in processors, non-determinism;
deadlock, unfairness, congestion (networks). Remzi argues: even
hardware does fail and fault-mask, so shouldn't trust your life to
it. OK, but can we apply redundancy and isolation/randomness (to
enforce indep failure assumption)? Fail-stutter FT attempts to
capture this "intermediate" between Byzantine and fail-stop.
Esp. a problem in parallel performance assumptions when one component is
fail-stuttering (but therefore looks "normal"). Toward a
model: try to capture performance fault (compared to performance
spec). (Gribble also suggested introspecting based on "known
steady state" behavior. See Richardson et al. on Internet performance
failure detection for ideas...and add to reading list)
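A hypothetical detector in the spirit of the "performance fault vs. spec" idea and Gribble's known-steady-state suggestion: flag a component as fail-stuttering when it still responds but runs far below its learned baseline. The class name, window size, and threshold are all made up for illustration.

```python
# Learn a steady-state baseline from an initial window, then flag
# observations that fall well below it: alive, hence "normal"-looking
# to fail-stop detectors, but stuttering on performance.
from collections import deque

class StutterDetector:
    def __init__(self, window=10, slowdown=0.5):
        self.window = window
        self.slowdown = slowdown            # flag below 50% of baseline
        self.samples = deque(maxlen=window)
        self.baseline = None

    def observe(self, throughput):
        """Returns True when the component looks fail-stuttering."""
        self.samples.append(throughput)
        if self.baseline is None:
            if len(self.samples) == self.window:
                # adopt the first full window as "known steady state"
                self.baseline = sum(self.samples) / self.window
            return False
        return throughput < self.slowdown * self.baseline
```

This is exactly the intermediate regime between fail-stop and Byzantine: the node answers correctly, just pathologically slowly, which is what wrecks parallel-performance assumptions.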
- What's been missing from Remzi and Steve's papers: what precisely is the
model? what precisely are the design guidelines for building a system?
- Jay Lepreau: these techniques (discussed in FT sessions) really only work
when systems are loosely coupled and state-coupling-based dependencies are
eliminated. Also, there's a cost to retrofitting existing systems (or
creating new ones) that use these nice interface boundaries. To
what extent can we use machine virtualization to fix existing systems?
- Marvin: Byzantine FT is bunk, because assumes 3f+1 good nodes; in
practice, successful attacks take down large numbers of nodes at once, so
it's hard to argue that Byzantine assumption holds. (Unless you make
your Byzantine group enormous, but BFT message cost grows as n^2)
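The 3f+1 arithmetic behind Marvin's objection, as a quick sketch: a BFT group of n nodes tolerates only f = floor((n-1)/3) simultaneous faults, so covering a correlated attack that takes down many nodes at once requires an enormous (and, given the n^2 message cost, expensive) group.

```python
# BFT sizing arithmetic: n >= 3f + 1.
def bft_tolerated_faults(n):
    """Max simultaneous Byzantine faults an n-node group survives."""
    return (n - 1) // 3

def nodes_needed(f):
    """Group size required to tolerate f Byzantine faults."""
    return 3 * f + 1
```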
- PAST: p2p storage system with Tapestry-like routing - in fact I'm not sure
how it is different from Plaxton mesh.
- Chord (distributed lookup for p2p): maps DocID to nodeID w/consistent
hashing. Nodes and docs share ID space; docID N is stored on node N if it
exists, otherwise on the nearest successor-numbered node. Each node
knows of logN other nodes, corresponding to those nodes whose ID's are 1/2,
1/4, ..., 1/2^N away in the node ID space
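A minimal sketch of the Chord mapping above, assuming a global view of the live nodes (the real protocol resolves lookups in O(log N) hops using each node's finger table instead). The toy ring size and helper names are mine.

```python
# Chord-style successor mapping and finger table over an m-bit ID ring.
import bisect, hashlib

M = 8  # m-bit identifier space (toy size; Chord uses e.g. 160 bits)

def chord_id(key, m=M):
    """Consistent hashing: map a key into the shared ID space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** m)

def successor(doc_id, node_ids):
    """Doc with ID k lives on the first node whose ID >= k,
    wrapping around the ring."""
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, doc_id)
    return nodes[i % len(nodes)]          # wrap to lowest node ID

def fingers(n, node_ids, m=M):
    """Node n's logN-ish routing state: the successors of
    n + 2^i for i = 0..m-1 (i.e., points 1/2, 1/4, ... of the
    ring away, counted from the far end)."""
    return [successor((n + 2 ** i) % (2 ** m), node_ids)
            for i in range(m)]
```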
- Greg Ganger: for better security, make subsystems have their own security
perimeters (disk, NIC, display, etc). Challenges: how to do
delegation, what should each device do behind security perimeter, etc.
Strongly reminiscent of "orthogonal security" as espoused by
Goldberg, Wagner & Brewer (but was never published).
- Rob Ricci and Jay Lepreau: to make p2p networks more censor-resistant, use
"protocol objects" to deploy and replace transport protocols
hop-by-hop. A PO identifies the protocol it will use by a hash.
Virtualization (Brian Noble, Peter Chen):
- Idea: better to run some services/apps on VM on top of host OS. Can
introspect what the app/guest OS is doing (eg writes to PTBR indicate addr
space changes on VM), apply well known FT techniques (log all VM activity,
then replay log to replay a complete execution). Examples: secure
logging. Current systems vulnerable to hackers that turn logging off;
do logging of VM activities to do intrusion analysis. To figure out
what to log, use methods from Thy & Practice of Failure Tol. (log
nondeterministic events + network messages, etc)
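The log-and-replay idea can be sketched like this (event shapes and names are invented for illustration): treat the VM as a deterministic state machine, record every nondeterministic input (network messages, timer deliveries, etc.) to a log that lives outside the VM where an intruder can't turn it off, then replay the log to reproduce the complete execution for intrusion analysis.

```python
# Deterministic replay: given the same nondeterministic inputs, the
# VM computes the same states, so logging only those inputs suffices.
def run_vm(nondet_source, log=None):
    """A stand-in 'VM': deterministic apart from the events it is fed."""
    state = 0
    for event in nondet_source:
        if log is not None:
            log.append(event)        # secure log kept outside the VM
        kind, value = event
        if kind == "net_msg":
            state = state * 31 + value
        elif kind == "timer":
            state += value
    return state

# Original run: record all nondeterministic events.
log = []
original = run_vm([("net_msg", 7), ("timer", 3), ("net_msg", 2)], log)

# Later analysis: replaying the log alone reproduces the execution.
replayed = run_vm(log)
```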
One.world (Robert Grimm)
- Exterminate complex/unsuitable abstractions: distrib objects (hard to
evolve, formats set by standards bodies, difficult to control from security
standpoint since all object accesses involve remote code execution);
transparent distribution (eg RPC, because they embody single-node-model
assumptions that failure is uncommon case). We have similar
motivations and have come up with somewhat different approaches; should
compare point by point.
- Model they adopt: tuples and event handlers. Environments serve as
containers for those (and can contain other environments).
Environments can be migrated; the idea is that the environment doesn't have
residual dependencies/pointers outside itself that are implicit.
Migration == isolate migration logic in separate environment, then embed
application's environment in that environment.
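A toy rendering of the model (my guess at the shapes, not one.world's actual API): environments contain tuples, event handlers, and nested child environments, and migration moves the whole container, which is legal precisely because the state has no implicit pointers outside it.

```python
# Environments as self-contained containers for tuples, handlers, and
# child environments; migration re-parents the container wholesale.
class Environment:
    def __init__(self, name):
        self.name = name
        self.tuples = []        # data is tuples, not distributed objects
        self.handlers = {}      # event name -> handler
        self.children = []      # nested environments

    def nest(self, child):
        self.children.append(child)
        return child

    def on(self, event, handler):
        self.handlers[event] = handler

    def emit(self, event, tup):
        self.tuples.append(tup)
        if event in self.handlers:
            self.handlers[event](tup)

def migrate(child, src, dst):
    """Move an environment between parents; everything the app needs
    (no residual dependencies) travels inside the container."""
    src.children.remove(child)
    dst.children.append(child)
    return dst
```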
Case for Resilient Overlay Networks (Dave Andersen, Hari B.)
- Make small resilient overlays on the real Internet, to synthesize a
meta-Internet with a small number of nodes. This lets you do things
that would be hard if network scalability was a real issue. The real
Internet isn't many-to-many anyway; it has choke points, and [Labovitz 00]
says Internet routing convergence is an order of magnitude slower than
previously thought - 3 min. recovery time, 15 min max, for simple
failures. Claim: scalability/heterogeneity fundamentally lead to slow
recovery. [No proof offered of this claim] So instead, make
small/homogeneous groups, and do fast recovery in those groups.
- Resilient overlay networks (RON) can be used to make more sophisticated
routing decisions, multiple route tables, packet inspection for
content-policy-based routing, etc.
- "Conduits" emulate sendto() and recvfrom() so existing apps will