
CS294-8 project discussion

Using cheap stuff for reliable systems - Randi, Aaron.
- Cheap, commodity components are intrinsically less reliable. How do we deal with this in Millennium?
- "Intelligent components" - greater tolerance to hardware faults. Put a processor directly on the component and use implicit info to detect imminent component failure. E.g. a disk that starts seeing more CRC errors.
- model failure modes of
  - commodity system
  - commodity system with intelligent components
  - system with redundant CPUs and intelligent components
  and predict MTBF for nodes/entire system.
- David: could we explore these issues one level higher up in the design space?
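A rough sketch of how the three configurations above might be compared. Everything here is an assumption made for illustration: failures are exponential (constant rate), a node fails when any one component fails, an "intelligent" component avoids a fixed fraction of its failures by predicting them, redundant CPUs are an active pair with no repair, and the MTBF figures are made-up placeholders, not measurements.

```python
def series_mtbf(component_mtbfs):
    """MTBF of a unit that fails when any one component fails (failure rates add)."""
    return 1.0 / sum(1.0 / m for m in component_mtbfs)

def with_prediction(mtbf, coverage):
    """Assumed model of an intelligent component: a fraction `coverage` of its
    failures is predicted in time to avoid an outage, so the rate drops by that fraction."""
    return mtbf / (1.0 - coverage)

def redundant_pair(mtbf):
    """Mean time until both of two identical units have failed (active pair, no repair)."""
    return 1.5 * mtbf

def first_failure_in_cluster(node_mtbf, n_nodes):
    """Mean time until the first of n identical, independent nodes fails."""
    return node_mtbf / n_nodes

# Hypothetical component MTBFs in hours (placeholders only).
cpu, disk, psu = 500_000.0, 300_000.0, 200_000.0

commodity = series_mtbf([cpu, disk, psu])
intelligent = series_mtbf([cpu, with_prediction(disk, 0.5), with_prediction(psu, 0.5)])
# Treating the redundant pair as a constant-rate unit is itself an approximation.
redundant = series_mtbf([redundant_pair(cpu), with_prediction(disk, 0.5), with_prediction(psu, 0.5)])

for label, mtbf in [("commodity", commodity), ("intelligent", intelligent), ("redundant", redundant)]:
    print(f"{label}: node {mtbf:.0f} h, first failure among 100 nodes {first_failure_in_cluster(mtbf, 100):.0f} h")
```

A real study would replace the constant-rate assumption with measured failure distributions, which is where modeling the failure modes comes in.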
Randy
- LSI logic background: SMP on a chip?
- DRAM + logic (IRAM) helps
- Memory hierarchy - crossbar? single bus?
- at which level of the hierarchy do you do your sharing?
- David: what is specifically new or different about multiple CPUs on one chip as opposed to multiple chips?
Bhaskar, Shankar
- microeconomic model - based on IOUs for a group of clusters
- study dynamics of the model
  - groups of clusters - granularity of trust is a cluster
  - leader in cluster is negotiator, both inter- and intra- cluster
  - money within cluster, IOUs across clusters
- giving money for service is not atomic, must deal with cheating and forging
- management issues, negotiations, arbitrations
- convergent honesty?
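A minimal sketch of the bookkeeping this model implies: money moves between members inside a cluster, while clusters only exchange IOUs that can later be netted. The class and function names are hypothetical, and the sketch deliberately ignores the hard parts raised above (non-atomic payment, cheating, forging, arbitration).

```python
from collections import defaultdict

class Cluster:
    """One trust domain: members hold money; debts to other clusters are IOUs."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)      # member -> money held inside this cluster
        self.iou_owed = defaultdict(int)    # creditor cluster name -> amount this cluster owes

    def pay_internal(self, payer, payee, amount):
        """Move money between two members of the same cluster."""
        if self.balances.get(payer, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[payer] -= amount
        self.balances[payee] = self.balances.get(payee, 0) + amount

    def issue_iou(self, creditor, amount):
        """Record that this cluster owes `amount` to another cluster for service received."""
        self.iou_owed[creditor.name] += amount

def net_position(a, b):
    """Net amount cluster a owes cluster b (negative means b owes a)."""
    return a.iou_owed[b.name] - b.iou_owed[a.name]

a = Cluster("A", {"alice": 10, "bob": 0})
b = Cluster("B", {})
a.pay_internal("alice", "bob", 4)   # money within a cluster
a.issue_iou(b, 5)                   # IOUs across clusters
b.issue_iou(a, 2)
print(a.balances, net_position(a, b))
```

In the proposal the cluster leader would be the one issuing and settling IOUs; here any caller can do so, which is exactly the gap the cheating and forging concerns point at.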
Jimmy - build better computing environment for UCB
- more security, more reliability, more tech support, dial-up support
- MIT Athena and CMU Andrew are both better, even though they are 1980s technology built for 1980s applications
- roadmap:
  1. study existing systems
  2. figure out what apps people want to use
  3. define requirements for UCB environment
  4. design architecture
  5. demonstrate pieces
  6. concoct roadmap for campus deployment
- plan for a plan...
Matt - tiered resource mgmt.
- Millennium: lots o' apps
- application-level solutions/designs are the way to go for issues like security, fault tolerance, etc.
- what programming model do you give people? how do you get them to "code" at higher and higher levels of abstraction? runtime/debugging environment?
- don't want full transparency.
- sounds like PVM
Remzi
- One-sentence summary: build, measure, or simulate something within a month. NOW is a good prototype for Millennium.
- Jim: do stuff like build clusters within NOW with different communications capabilities, etc.
Steve
- front end SPF problem:
  - magic router
  - client intelligence
  - distributed decision at app level
- streams programming model - across nodes?
David
- infrastructure for doing simulations on clusters
- paper, graphs - have simulator, want to evaluate simulator across large parameter space
- give the system a (simulator, parameter space) pair, and have it handle the sweep for you.
- decentralize fault tolerance into agents. Give each agent "possession" of some of the centralized resources. An agent can use its own resources for its own jobs, or broker them out to other agents. Analogy: each agent has its own apartment, and can live there or rent it out.
- want machinery for tracking simulation results, which piece of parameter space you've searched, which simulations you've used, etc.
- might want to try out your simulation on new resources (e.g. chem building computer) to see if it could work. Obviously sandboxing needed. Agent provides requested envelope of simulation, sandbox enforces.
- simulators with checkpoints - can then pick up and move state to another machine if a node is about to fail.
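A small sketch of the bookkeeping described above: take a (simulator, parameter space) pair, run each point once, and remember which points are already done so a sweep can be resumed. The function names and the `budget` parameter are hypothetical; the agents, brokering, and sandboxing from the notes are not modeled, and the "simulator" is a stand-in function.

```python
import itertools
import json

def parameter_points(space):
    """Yield every combination of values in `space` (a dict: name -> list of values)."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))

def run_sweep(simulate, space, results=None, budget=None):
    """Run `simulate` on each point not yet in `results`; stop after `budget` new runs.

    `results` maps a canonical JSON encoding of the point to its output, so it
    records both what has been searched and what each run produced.
    """
    results = {} if results is None else results
    ran = 0
    for point in parameter_points(space):
        key = json.dumps(point, sort_keys=True)
        if key in results:
            continue                      # already searched this piece of the space
        if budget is not None and ran >= budget:
            break                         # e.g. the node is about to be given back
        results[key] = simulate(**point)
        ran += 1
    return results

def toy_simulator(nodes, latency_us):
    """Stand-in for a real simulator; returns a made-up figure of merit."""
    return nodes * latency_us

space = {"nodes": [1, 2], "latency_us": [10, 20]}
partial = run_sweep(toy_simulator, space, budget=3)      # interrupted sweep
complete = run_sweep(toy_simulator, space, results=partial)
print(len(complete), "points done")
```

Because `results` is a plain dict with string keys, it can be written out with `json.dump` and reloaded on another machine, which gives a coarse form of checkpointing at the granularity of whole simulation runs rather than simulator state.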
