Fast Mutual Exclusion for Uniprocessors

Brian Bershad, David D. Redell, John R. Ellis

One-line summary: A Restartable Atomic Sequence, when preempted/interrupted, is restarted from the beginning when resumed, so that when it finally completes it will have executed atomically. Provides lightweight optimistic concurrency control for uniprocessors lacking support for atomic operations.

Overview/Main Points

Optimistic mechanism, since in the common case of success (RAS is not preempted), it has very low overhead. Attractive for uniprocessors that don't support atomic operations. (In 1992, most uniprocs that did support them did so very inefficiently.)
Other methods for supporting mutual exclusion on uniprocessors: Lamport's fast mutual exclusion algorithm (provably fastest using 2 loads and 5 stores, but requires storage O(numthreads * nummutexes)); Lamport bundled with test-and-set (O(1) storage, but adds 1 load and 2 stores to critical path); and meta-test-and-set (which synchronizes access to all synchronization objects through a single mutex, not very efficient).
When preemption occurs, need to detect whether it occurred inside a RAS.
Method 1 (Taos): Designated Sequences containing "landmark" NOP's at specific points in sequence. (Compiler ensures that this "fingerprint" doesn't occur in other code.) Reasonable for Taos since the kernel and apps use the same threading library, the same single compiler, etc.
Method 2: (Mach): start address (within app's address space) and length of each RAS is registered at app start time.
Using in the kernel: potential for deadlock if the PC check is not properly ordered with respect to other events. Example: user page fault traps to kernel; preemption occurs while in kernel; when resume occurs, PC check may cause another page fault, etc.
Performance: thread suspensions occur less frequently than atomic operations in threaded programs, so the extra work during thread switch (PC check) is justified.
RAS's perform several times better than kernel emulation (turning off interrupts to execute atomic user code) or software reservation using Lamport's algorithm. Even better performance if sequences can be inlined.
Most overhead for RAS occurs outside critical section, so short critical sections remain short.

Relevance

A way to have mutual exclusion without hardware support, or where hardware support is inefficient, by exploiting the common case. Another little gem of OS structure.

Flaws

How to apply to other OS's, since in each case described an OS-specific technique was used? To what extent can compiler support be enlisted even for a "generic" OS? (e.g. use something like designated sequences but place a long "magic constant" landmark within the sequence that is overwhelmingly unlikely to appear elsewhere.)

Back to index