Back to index
Fast Mutual Exclusion for Uniprocessors
Brian Bershad, David D. Redell, John R. Ellis
A Restartable Atomic Sequence, when preempted/interrupted, is restarted
from the beginning when resumed, so that when it finally completes it
will have executed atomically. Provides lightweight optimistic
concurrency control for uniprocessors lacking support for atomic operations.
- Optimistic mechanism, since in the common case of success (RAS is
not preempted), it has very low overhead. Attractive for
uniprocessors that don't support atomic operations. (In 1992,
most uniprocs that did support them did so very inefficiently.)
- Other methods for supporting mutual exclusion on uniprocessors:
Lamport's fast mutual exclusion algorithm (provably fastest using
2 loads and 5 stores, but
requires storage O(numthreads * nummutexes)); Lamport
bundled with test-and-set (O(1) storage, but adds 1 load and 2
stores to critical path); and meta-test-and-set (which
synchronizes access to all synchronization objects through
a single mutex, not very efficient).
- When preemption occurs, need to detect whether it occurred inside
Method 1 (Taos): Designated Sequences containing "landmark"
NOP's at specific points in sequence. (Compiler ensures that
this "fingerprint" doesn't occur in other code.)
Reasonable for Taos since the kernel and apps use the same
threading library, the same single compiler, etc.
Method 2: (Mach): start address (within app's address space) and length
of each RAS is registered at app start time.
- Using in the kernel: potential for deadlock if the PC check is
not properly ordered with respect to other events. Example: user
page fault traps to kernel; preemption occurs while in kernel;
when resume occurs, PC check may cause another page fault, etc.
- Performance: thread suspensions occur less frequently than atomic
operations in threaded programs, so the extra work during thread
switch (PC check) is justified.
- RAS's perform several times better than kernel emulation (turning
off interrupts to execute atomic user code) or software
reservation using Lamport's algorithm. Even better performance
if sequences can be inlined.
- Most overhead for RAS occurs outside critical section, so
short critical sections remain short.
A way to have mutual exclusion without hardware support, or where
hardware support is inefficient, by exploiting the common case. Another
little gem of OS structure.
How to apply to other OS's, since in each case described an
OS-specific technique was used? To what extent can compiler support be
enlisted even for a "generic" OS? (e.g. use something like designated
sequences but place a long "magic constant" landmark within the sequence
that is overwhelmingly unlikely to appear elsewhere.)
Back to index