Condor High Throughput Computing System

Myron Livny et al., Univ. of Wisconsin, Madison

One-line summary: Harvest idle cycles on other people's workstations to run batch jobs. Goals: portability; ease of programming; you control your own workstation. Provides file-based checkpointing (and therefore migration), but current Unix trends are making checkpointing harder and harder to implement.

Overview/Main Points

Philosophy: transparent checkpointing, location-independent behavior.
Facilities the OS should provide: OS-supported checkpointing; detecting "when a user has left or returned to" their workstation (normalized load average, keyboard/mouse idle time, etc.).
Scheduling and load balancing: controlled by a centralized agent; the basic algorithm is priority with aging.
Portability: trap mmap() calls; can optionally redirect file I/O to RPC stubs. Files are reopened and lseek()'d to the right place on restart after a checkpoint.
Checkpointing is based on reconstructing info from core files generated on a signal, so to the user process a checkpoint looks like a signal handler.
Unix trends are making transparent checkpointing difficult and increasingly in conflict with portability. E.g. dynamic libraries have to be checkpointed; corefile formats are not 100% standard; some OSes like Solaris don't really save all the CPU state in the corefile; etc.
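
The reopen-and-lseek() restart step noted above can be illustrated with a small sketch. This is not Condor's code: the function names and the fd -> (path, flags) table are hypothetical stand-ins for whatever bookkeeping a checkpoint library keeps, and only regular files are handled.

```python
import os
import tempfile

def save_open_files(open_files):
    """Record (path, flags, offset) for each open descriptor.

    open_files maps fd -> (path, flags); this table is a hypothetical
    stand-in for the checkpoint library's own bookkeeping.
    """
    return [
        (path, flags, os.lseek(fd, 0, os.SEEK_CUR))  # current file offset
        for fd, (path, flags) in open_files.items()
    ]

def restore_open_files(saved):
    """Reopen each recorded file and seek back to its saved offset."""
    restored = {}
    for path, flags, offset in saved:
        # Drop creation/truncation flags so reopening never clobbers the file.
        reopen_flags = flags & ~(os.O_CREAT | os.O_TRUNC | os.O_EXCL)
        fd = os.open(path, reopen_flags)
        os.lseek(fd, offset, os.SEEK_SET)
        restored[fd] = (path, reopen_flags)
    return restored

def demo():
    """Read part of a file, 'checkpoint', 'restart', and keep reading."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.dat")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        fd = os.open(path, os.O_RDONLY)
        os.read(fd, 4)                      # consume "0123"
        saved = save_open_files({fd: (path, os.O_RDONLY)})
        os.close(fd)                        # simulate the process going away
        restored = restore_open_files(saved)
        new_fd = next(iter(restored))
        data = os.read(new_fd, 3)           # continues at offset 4
        os.close(new_fd)
        return data
```

Calling demo() returns b"456": the restarted reader picks up where the original left off. The sketch also hints at the limits of the approach, since pipes, sockets, and files that were deleted or modified between checkpoint and restart cannot be restored this way.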

Relevance

A different philosophy: mostly support transparent checkpointing and location independence, rather than insisting on both completely. A research question is: "What would you do differently if you were building Condor today?" Some design alternatives might be:

End-to-end argument for "application level" checkpointing - let the app checkpoint its own state.
Rather than complete location independence, express affinities by constraint specification ("must see filesystem X", "OS must support select()", etc.), which can include "I have no location dependence".
Would something like Java give you checkpointing and portability, if you could build a checkpointing version of the JVM? (This seems easier than arbitrary checkpointing in the OS.)
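
As a rough illustration of the constraint-specification alternative, the sketch below matches a job's stated requirements against what each machine offers. The dictionary layout, the attribute kinds ("filesystems", "syscalls"), and the machine names are all assumptions made for this example, not anything Condor defines.

```python
def eligible_machines(job_requires, machines):
    """Return the names of machines that satisfy every job constraint.

    job_requires: dict mapping a constraint kind to the set of values the
                  job needs, e.g. {"filesystems": {"/proj"}}.
    machines:     dict mapping machine name to a dict of offered sets.
    An empty job_requires means "I have no location dependence".
    """
    return [
        name
        for name, offers in machines.items()
        # Each required set must be a subset of what the machine offers.
        if all(needed <= offers.get(kind, set())
               for kind, needed in job_requires.items())
    ]

# Hypothetical pool of two workstations.
POOL = {
    "ws-a": {"filesystems": {"/home", "/proj"}, "syscalls": {"select", "mmap"}},
    "ws-b": {"filesystems": {"/home"}, "syscalls": {"select"}},
}

picky_job = {"filesystems": {"/proj"}, "syscalls": {"select"}}
portable_job = {}  # no constraints at all

picky_matches = eligible_machines(picky_job, POOL)
portable_matches = eligible_machines(portable_job, POOL)
```

Here picky_matches is ["ws-a"] and portable_matches is ["ws-a", "ws-b"]. The design point is that location independence becomes just the empty constraint set, so the scheduler needs only one mechanism for both kinds of job.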

Flaws

The manager seems to have hard state about what jobs run where, etc. What if it fails?
For things like "flock of condors", why not distributed/soft-state/broadcast (i.e. BASE) management, like boundary routers, multicast routers, etc., instead of a centralized manager?
Hard to do IPC cleanly or without breaking Condor mechanisms, so this is really for batch processing, not distributed processing.
What about syscalls that return EINTR?
Multithreaded code (where threads are in the kernel)? Is the thread subsystem state kept in the corefile?
Any other languages supported besides C?
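
To make the soft-state suggestion above concrete, here is a minimal sketch of a manager table that is rebuilt purely from periodic announcements. The class name, the time-to-live value, and the machine names are hypothetical; timestamps are passed in explicitly so the behavior is easy to follow.

```python
class SoftStateTable:
    """Machine status learned from periodic announcements, never persisted."""

    def __init__(self, ttl):
        self.ttl = ttl        # seconds an announcement stays valid
        self.entries = {}     # machine -> (state, expiry time)

    def announce(self, machine, state, now):
        """Record (or refresh) a machine's self-reported state."""
        self.entries[machine] = (state, now + self.ttl)

    def live(self, now):
        """Drop expired entries and return the surviving machine states."""
        self.entries = {
            m: (state, expiry)
            for m, (state, expiry) in self.entries.items()
            if expiry > now
        }
        return {m: state for m, (state, _) in self.entries.items()}

table = SoftStateTable(ttl=30)
table.announce("ws-a", "idle", now=0)
table.announce("ws-b", "busy", now=10)
at_25 = table.live(now=25)                # both announcements still fresh
table.announce("ws-a", "idle", now=28)    # ws-a refreshes; ws-b goes silent
at_45 = table.live(now=45)                # ws-b's entry has expired

# A manager that crashes simply starts with an empty table and
# relearns the pool from the next round of announcements.
recovered = SoftStateTable(ttl=30)
recovered.announce("ws-a", "idle", now=58)
after_restart = recovered.live(now=60)
```

With these inputs at_25 holds both machines, at_45 holds only ws-a, and after_restart shows the replacement manager already knows about ws-a. The cost is that the view is only eventually accurate, which is the usual BASE trade-off.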
