05/08/01 Armando's Notes from HDCC

Jim Gray, Internet reliability.

Questions from audience:

Observation from me:

Breakout sessions:

Bill Scherlis: projects for new NASA/CMU/etc. testbed, e.g. open-source dependability tools, etc.

David Garlan: in charge of new SWE curriculum @ CMU.  Tell him to teach restartability-centric, fault-isolated, etc design in ugrad SWE classes.

Lynn Wheeler, CTO, First Data (the actual operators of Visa and MC's network)

Can we actually trade cycles (nearly free) for dependability?  Simple example: in one case with FDC, a string buffer overflow cost billions of dollars in downtime, diagnosis, etc.  Original problem: programmers thought they could save a few machine instructions by making assumptions about string length, proper termination of strings, etc. rather than coding defensively.  To what extent could we integrate defensive checks like these into libraries, the runtime, etc.?  Would this be a good contribution to the open-source project testbed (a "dependable" or "safer" libc that costs more cycles)?
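
A rough sketch of what "spending cycles on checks" could look like.  This is my own illustration, not FDC's code and not a real libc API: checked_copy is a hypothetical name, and Python is used only to model the logic (a real "safer libc" would be C).  The two extra steps, scanning for the terminator and checking the destination size, are exactly the instructions the original programmers tried to save.

```python
def checked_copy(dest: bytearray, src: bytes) -> int:
    """Copy a NUL-terminated string from src into the fixed-size buffer dest.

    Hypothetical helper for illustration only.
    Returns the number of bytes copied, excluding the terminator.
    Raises ValueError instead of overrunning dest or trusting an
    unterminated source.
    """
    end = src.find(b"\x00")          # extra cycles: scan for the terminator
    if end == -1:
        raise ValueError("source is not NUL-terminated")
    if end + 1 > len(dest):          # extra cycles: bounds check before writing
        raise ValueError("source does not fit in destination")
    dest[:end + 1] = src[:end + 1]   # copy the string plus its terminator
    return end
```

Usage: checked_copy(bytearray(8), b"visa\x00") copies 4 bytes; the same call with a 4-byte destination raises ValueError rather than writing past the end.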

Steve Gonzalez, Chief, Ops Rsrch and Strategic Devel, NASA JSC

Bruce Maggs, Akamai/CMU

HDCC brainstorming

How to define a computational model?  One possibility: start relaxing ACID constraints one by one, since Internet systems are query-like.  Another: instead of capturing something absolute, like consistency, capture something like eventual consistency (consistency x latency), or basic availability (consistency x availability), or something like Aaron's availability benchmarking or Amin's conits, or "OK to say no"-ness.

Jim Gray: snapshotting filesystems - rotating RAID-0 on 2 out of 3 disks

Can machine virtualization be used to achieve the fault-isolation for restartability through modularity (FIRM)?

Plans for near term

Possible projects

"Resilient data structures" - internally redundant so they can be fixed/tolerate losses - would this be useful?

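One way the resilient data structure idea might look, as a minimal sketch under my own assumptions: each value is kept as three copies, each with a CRC32 checksum, and a read repairs any damaged copy from the first intact one.  ResilientCell and its layout are hypothetical; the notes do not specify a design.

```python
import zlib


class ResilientCell:
    """Holds one bytes value as three checksummed copies (hypothetical design)."""

    def __init__(self, value: bytes):
        self.copies = [self._seal(value) for _ in range(3)]

    @staticmethod
    def _seal(value: bytes) -> list:
        # Each copy is stored as [value, checksum of value].
        return [value, zlib.crc32(value)]

    def read(self) -> bytes:
        for value, crc in self.copies:
            if zlib.crc32(value) == crc:
                # Found an intact copy: rewrite all three from it (repair).
                self.copies = [self._seal(value) for _ in range(3)]
                return value
        raise ValueError("all copies are corrupted")
```

The cost is 3x memory plus a checksum on every read, which is the same cycles-for-dependability trade raised earlier.  A checksum only detects damage to the stored bytes; it does nothing for a value that was wrong when written.
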
A thought for inexact answers: a utility function that measures answer "quality" vs. resource consumption (memory, time ...).  It may have a cliff, or the user may have a threshold, for answer quality vs. resource.
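
A minimal sketch of the utility-function idea, assuming the system can report a few (resource cost, answer quality) points for a query.  The function names and the sample numbers below are made up for illustration.  cheapest_acceptable applies a user threshold; cliff finds the interval where quality rises fastest per unit of resource.

```python
def cheapest_acceptable(curve, min_quality):
    """Return the lowest-cost (cost, quality) point meeting the threshold, or None."""
    ok = [point for point in curve if point[1] >= min_quality]
    return min(ok, default=None)  # tuples compare by cost first


def cliff(curve):
    """Return the (low_cost, high_cost) interval with the steepest quality gain.

    Assumes at least two points with distinct costs.
    """
    pts = sorted(curve)
    a, b = max(
        zip(pts, pts[1:]),
        key=lambda pair: (pair[1][1] - pair[0][1]) / (pair[1][0] - pair[0][0]),
    )
    return a[0], b[0]


# Made-up sample: cost in seconds, quality as a fraction of the exact answer.
curve = [(1, 0.20), (2, 0.25), (4, 0.90), (8, 0.95)]
print(cheapest_acceptable(curve, 0.90))  # (4, 0.9)
print(cliff(curve))                      # (2, 4)
```

With these numbers the cliff sits between 2 and 4 seconds: below it the answer is nearly useless, above it extra time buys little.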