Implementation and Performance of Munin ======================================= Carter, Bennet and Zwanenepoel, SOSP 1991. Big Picture ----------- Improve the performance of DSM sytems by allowing multiple consistency protocols on a per-object basis. Overview -------- - No single DSM consistency protocol is suited for all programs. Moreover, within a single program, different shared variables are accessed in different ways. Furthermore, the pattern may change during run time. - Different "patterns" of shared variable access can be classified into a small number of broad categories (listed later). An earlier paper identifies these categories by examining the run-time behaviour of a bunch of benchmark programs. - Munin requires a programmer to annotate each shared object (or groups of objects) with a label corresponding to each category. This allows the DSM runtime system to employ per-object consistency protocols. - What do we gain? Reduction in #messages for maintaining consistency. - All but one protocol provides sequential consistency (defined by Ivy). The exception is a protocol for "release consistency". The paper does not formally define this term. It is borrowed from earlier work on the DASH multiprocessor. Release consistency is useful for objects protected by locks. The updates made to a protected object need not be made visible to others immediately when they occur. They can be batched together and made visible to other threads when the lock changes ownership. Implementing release consistency protocol requires Munin run-time to be aware of events that correspond to locks. As a consequence, Munin forces programmers to use its own library for locks and barriers. Details: What are the different consistency protocols? ------------------------------------------------------ Each protocol is parameterized by eight bits, corresponding to: - Invalidate/Update? - Replicas allowed? - Delayed operations allowed? - Fixed owner? - Multiple owners allowed? - Stable sharing patterns? - Flush changes to owner? - Writable? See paper for detailed description of each. Not all bit-combinations make sense. The authors have identified a few combinations do: they correspond to access patterns observed in shared variables in benchmarks. Details: What are the different patterns for accessing shared objects? ---------------------------------------------------------------------- Here are three examples. See paper for a complete list. - Migratory: A single thread performs multiple accesses (one or more writes) before handing it over to another thread. Typical of objects in critical sections. - Write-shared: An object is concurrently written by multiple threads without the writes being synchronized because the programmer knows that updates modify different parts of the object. e.g. False sharing of variables on a page. Munin requires the programmer to annotate each shared variable with a label: "Migratory" or "Write-shared" or ... Each such label corresponds to a bit-combination of protocol parameters listed above. Performance? ------------ - The authors chose two popular numerical programs. They compared two implementations: (a) hand-optimized message-passing version and (b) Munin version. The Munin version was worse by no more than roughly 10% in terms of speed. However, it required significantly less time to write. Discussion ---------- - What exactly is "release consistency"? What are some other consistency protocols used by DSM's or SMP/MPP's? - This paper reminds me of structured comments and vendor-specific "extensions" to SQL that guide a DB query plan generator to generate specific plans for better execution speed. The vendor-specific flags for "gcc" to win SPEC benchmark wars also come to mind.