CS294-8 project discussion
- Using cheap stuff for reliable systems - Randi, Aaron.
- Cheap, commodity components are intrinsically more
unreliable. How do we deal with this in Millennium?
- "Intelligent components" - greater tolerance
to hardware faults. Use implicit info to detect
imminent component failure. Processor directly on
component. E.g. on disk, see more CRC errors.
- model failure modes of
and predict MTBF for nodes/entire system.
- commodity system
- commodity system with intelligent components
- system with redundant CPUs and intelligent components
- David: could we explore these issues one level
higher up in the design space?
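The MTBF comparison proposed above can be roughed out with a few lines of arithmetic. This is a minimal sketch, assuming independent components with constant (exponential) failure rates and no repair; the component names and hour figures are hypothetical placeholders, not measurements, and the effect of "intelligent components" is not modeled.

```python
def series_mtbf(mtbfs):
    """MTBF of a system that fails as soon as any one component fails.

    Under the constant-failure-rate assumption, failure rates add.
    """
    return 1.0 / sum(1.0 / m for m in mtbfs)


def redundant_pair_mttf(mtbf_a, mtbf_b):
    """Mean time until BOTH of two parallel components have failed (no repair)."""
    rate_a, rate_b = 1.0 / mtbf_a, 1.0 / mtbf_b
    return 1.0 / rate_a + 1.0 / rate_b - 1.0 / (rate_a + rate_b)


# Hypothetical per-component MTBFs in hours (illustrative only).
node_parts = {"cpu": 200_000.0, "disk": 50_000.0, "power": 100_000.0}

# A commodity node fails when any part fails.
node_mtbf = series_mtbf(node_parts.values())

# Mean time to the first node failure in a cluster of 100 identical nodes.
cluster_mtbf = series_mtbf([node_mtbf] * 100)

# Redundant CPUs: mean time until both CPUs are gone.
cpu_pair_mttf = redundant_pair_mttf(node_parts["cpu"], node_parts["cpu"])

print(f"node: {node_mtbf:.0f} h, cluster of 100: {cluster_mtbf:.0f} h, "
      f"CPU pair: {cpu_pair_mttf:.0f} h")
```

The series formula makes the scaling problem concrete: the time to first failure shrinks in proportion to the number of nodes, while a redundant pair only extends the mean lifetime of that one part by half.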
- LSI logic background: SMP on a chip?
- DRAM + logic (IRAM) helps
- Memory hierarchy - crossbar? single bus?
- at which level of the hierarchy do you do your sharing?
- David: what is specifically new or different about
multiple CPUs on one chip as opposed to multiple
- Bhaskar, Shankar
- microeconomic model - based on IOUs for a group of
- study dynamics of the model
- groups of clusters - granularity of trust is
- leader in cluster is negotiator, both inter-
and intra- cluster
- money within cluster, IOUs across clusters
- giving money for service is not atomic, must deal with
cheating and forging
- management issues, negotiations, arbitrations
- convergent honesty?
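The split between money inside a cluster and IOUs between clusters can be sketched as a small ledger. This is only an illustration under stated assumptions: the `Cluster` class, its method names, and the integer currency are hypothetical, and nothing here addresses non-atomic payment, cheating, or forging.

```python
from collections import defaultdict


class Cluster:
    """One trust domain: real money inside, IOUs toward other clusters."""

    def __init__(self, name, members, starting_balance):
        self.name = name
        self.balances = {m: starting_balance for m in members}
        # creditor cluster name -> total amount this cluster owes it
        self.ious_owed = defaultdict(int)

    def pay_within(self, payer, payee, amount):
        """Transfer money between two members of this cluster."""
        if self.balances[payer] < amount:
            raise ValueError("insufficient funds")
        self.balances[payer] -= amount
        self.balances[payee] += amount

    def issue_iou(self, creditor, amount):
        """Record a debt to another cluster (negotiated by the leaders)."""
        self.ious_owed[creditor.name] += amount


def net_position(a, b):
    """Net amount cluster a owes cluster b; negative means b owes a."""
    return a.ious_owed[b.name] - b.ious_owed[a.name]


east = Cluster("east", ["e1", "e2"], starting_balance=10)
west = Cluster("west", ["w1"], starting_balance=10)

east.pay_within("e1", "e2", 4)   # service inside the cluster: money
east.issue_iou(west, 7)          # service from another cluster: IOU
west.issue_iou(east, 2)
print(east.balances, net_position(east, west))
```

Studying the dynamics of the model would then amount to driving many such transactions and watching how the net positions between clusters evolve.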
- Jimmy - build better computing environment for UCB
- more security, more reliability, more tech support,
- MIT Athena, CMU Andrew, are both better, even though
they are 80's technology for 80's applications
- study existing systems
- figure out what apps people want to use
- define requirements for UCB environment
- design architecture
- demonstrate pieces
- concoct roadmap for campus deployment
- plan for a plan...
- Matt - tiered resource mgmt.
- Millennium: lots o' apps
- application-level solutions/designs are the way to go
for issues like security, fault tolerance, etc.
- what programming model do you give people? how do you
get them to "code" at higher and higher
levels of abstractions? runtime/debugging environment?
- don't want full transparency.
- sounds like PVM
- One sentence summary: build, measure, or simulate
something within a month. NOW is good prototype for
- Jim: do stuff like build clusters within NOW with
different communications capabilities, etc.
- front end SPF (single point of failure) problem:
- magic router
- client intelligence
- distributed decision at app level
- streams programming model - across nodes?
- infrastructure for doing simulations on clusters
- paper, graphs - have simulator, want to evaluate
simulator across large parameter space
- give the system a (simulator, parameter space) pair, and have
the system run the evaluation for you.
- decentralize fault tolerance into agent. give each
agent "possession" of some of the centralized
resources. Agent can use own resources for own jobs,
or broker them out to other agents. Analogy: each
agent has own apartment, can live there, or rent it out.
- want machinery for tracking simulation results, which
piece of parameter space you've searched, which
simulations you've used, etc.
- might want to try out your simulation on new resources
(e.g. chem building computer) to see if it could work.
Obviously sandboxing needed. Agent provides requested
envelope of simulation, sandbox enforces.
- simulators with checkpoints - can then pick up and move
state to another machine if a node is going to fail.
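The (simulator, parameter space) idea and the tracking machinery can be sketched together. This is a minimal single-process sketch, assuming the simulator is a plain function of keyword parameters; `SweepLedger`, `run_sweep`, and the toy simulator are hypothetical names. The checkpoint here saves the ledger of finished runs, a coarser grain than checkpointing state inside a running simulator, and agents, brokering, and sandboxing are left out.

```python
import itertools
import json


def parameter_grid(space):
    """Yield every point of a parameter space given as {name: [values]}."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


class SweepLedger:
    """Tracks which points of the parameter space have been simulated."""

    def __init__(self):
        self.results = {}  # canonical JSON of the point -> result

    @staticmethod
    def key(point):
        return json.dumps(point, sort_keys=True)

    def pending(self, space):
        return [p for p in parameter_grid(space)
                if self.key(p) not in self.results]

    def record(self, point, result):
        self.results[self.key(point)] = result

    def checkpoint(self):
        """Serialize the ledger so the sweep can move to another machine."""
        return json.dumps(self.results, sort_keys=True)

    @classmethod
    def restore(cls, blob):
        ledger = cls()
        ledger.results = json.loads(blob)
        return ledger


def run_sweep(simulator, space, ledger, budget=None):
    """Run up to `budget` not-yet-simulated points and record the results."""
    for done, point in enumerate(ledger.pending(space)):
        if budget is not None and done >= budget:
            break
        ledger.record(point, simulator(**point))
    return ledger


def toy_simulator(nodes, bandwidth):
    """Stand-in for a real simulator (hypothetical)."""
    return nodes * bandwidth


space = {"nodes": [1, 2], "bandwidth": [10, 100]}
ledger = run_sweep(toy_simulator, space, SweepLedger(), budget=3)
blob = ledger.checkpoint()            # node about to fail: save progress
resumed = run_sweep(toy_simulator, space, SweepLedger.restore(blob))
print(len(ledger.results), len(resumed.results))
```

Keying results by a canonical encoding of the parameter point is what lets a resumed sweep skip work that was already done elsewhere.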