Information and control in gray-box systems
Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau, Univ. of Wisconsin-Madison
Summary by AF
One-line summary: To improve the performance of the OS services an
application depends on (file caching, memory management/paging, etc.), use
indirect methods to monitor what a subsystem is doing, combine those
observations with what is known about how the subsystem works to infer its
internal state, then use direct or indirect control methods to induce the
desired behavior.
Overview/Main Points
- A "gray box" system is one whose internal workings are not known
in detail but about which some information is available or can be
obtained. Example: TCP congestion control draws conclusions about what is
happening at the other end and/or in the network simply by monitoring
dropped packets; various approaches to TCP measurement (e.g. the well-known
paper by J.-C. Bolot et al.) infer where network bottlenecks are by comparing
the spacing of packet pairs as they are sent with their spacing on arrival.
Another example: Microsoft's MS Manners infers when a low-priority background
process should be suspended by assuming that when it competes with other
active processes, all the contending processes experience roughly symmetric
performance degradation, so a slowdown in the background process's own
progress signals that it is also slowing down the foreground work.
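The packet-pair idea behind such bottleneck measurements fits in a few lines. This is a minimal sketch under stated assumptions, not Bolot's actual estimator: it assumes the sender emits each pair back to back and the receiver records the gap between the two arrivals, and the name packet_pair_bandwidth is hypothetical. The slowest link spreads a pair apart by roughly packet size divided by its bandwidth, so dividing the size by the observed gap recovers the bottleneck rate.

```python
from statistics import median


def packet_pair_bandwidth(packet_size_bytes, arrival_gaps_s):
    """Estimate bottleneck bandwidth in bytes/s from packet-pair arrival gaps.

    Assumes each pair left the sender back to back, so the gap measured at the
    receiver is set by the slowest link: gap ~= packet_size / bandwidth.
    """
    gaps = [g for g in arrival_gaps_s if g > 0]  # drop unusable samples
    if not gaps:
        raise ValueError("need at least one positive arrival gap")
    # Cross traffic can stretch (or compress) individual gaps, so the median
    # is a more robust summary than the mean.
    return packet_size_bytes / median(gaps)
```

For example, 1500-byte packets arriving about 1.2 ms apart suggest a bottleneck of about 1.25 MB/s (10 Mbit/s); one pair stretched to 3 ms by cross traffic does not move the median.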
- How can information about the gray box be obtained?
  - Algorithmic knowledge: e.g. you "know" the file cache does LRU
    replacement (and you may know the cache size).
  - Monitor observable outputs, e.g. TCP congestion control.
  - Use statistical methods: observe correlations between desired
    behaviors and measured outputs.
  - Use microbenchmarks to parameterize/normalize observations.
  - Issue active probes if the "normal" request rate is too low to
    yield enough observations.
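Microbenchmarks and active probes combine naturally when inferring cache contents from timing: first calibrate what a hit and a miss cost, then classify probe latencies against a cutoff. A minimal sketch, assuming a POSIX system (for os.pread) and that a cached page is much faster to read than an uncached one; the function names and the halfway-between-medians cutoff are illustrative choices, not the authors' method.

```python
import os
import time
from statistics import median


def probe_read_latency(fd, offset):
    """Time a one-byte read at `offset` (POSIX-only: uses os.pread).

    Caveat: the read itself pulls the page into the cache, so a second probe
    of the same page will almost always look like a hit.
    """
    start = time.perf_counter()
    os.pread(fd, 1, offset)
    return time.perf_counter() - start


def calibrate_threshold(hit_samples_s, miss_samples_s):
    """Return a latency cutoff halfway between the median hit and miss times.

    The samples are assumed to come from a microbenchmark on data known to be
    cached (just read) and data known not to be (e.g. after a flush).
    """
    hit, miss = median(hit_samples_s), median(miss_samples_s)
    if hit >= miss:
        raise ValueError("expected cache hits to be faster than misses")
    return (hit + miss) / 2


def classify_probes(latencies_s, threshold_s):
    """Label each probe True (probably cached) or False (probably not)."""
    return [lat < threshold_s for lat in latencies_s]
```

Using medians rather than single samples is one simple way to normalize away timer noise and scheduling jitter in the calibration runs.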
- How can the system be controlled once the information has been turned
into a plan?
  - Move the system to a known state, e.g. periodically flush the page
    cache so that you know what it contains.
  - Reinforce the desired behavior via feedback, e.g. if the cache is LRU,
    deliberately keep accessing the elements you want to keep from being
    evicted.
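The feedback-reinforcement idea can be checked against a toy model. The sketch below assumes a strict LRU cache, simulated with OrderedDict (real page caches often only approximate LRU, e.g. with a clock algorithm); LRUCache and run_workload are hypothetical names. Re-touching the hot keys often enough, relative to the cache size, keeps them resident while other data streams through.

```python
from collections import OrderedDict


class LRUCache:
    """Toy strict-LRU cache standing in for a gray-box page cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used first

    def access(self, key):
        """Touch `key`; return True on a hit, False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: now most recently used
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


def run_workload(cache, stream, hot_keys, refresh_every):
    """Replay `stream`, re-touching `hot_keys` every `refresh_every` accesses.

    refresh_every=0 disables reinforcement.
    """
    for i, key in enumerate(stream, start=1):
        cache.access(key)
        if refresh_every and i % refresh_every == 0:
            for hot in hot_keys:
                cache.access(hot)
```

With a capacity of 4, touching "hot" once and then streaming ten cold keys through evicts it when refresh_every=0, but it stays resident with refresh_every=2. The price is the extra accesses themselves, which on a real system are real reads.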
- The authors implemented several gray-box applications: controlling the
file cache; monitoring available memory to do admission control on new
processes (by returning NULL from malloc() or blocking if the allocation
would start paging); and exploiting file layout on disk (a command-line
utility that reorders accesses to the files matched by a glob pattern to
match their known on-disk layout).
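The layout-reordering utility can be approximated in a few lines. This sketch assumes the inode number is a usable proxy for on-disk position, which roughly holds for FFS-style allocators that place related inodes and data close together but is not guaranteed by any file system; layout_ordered is a hypothetical name, not the authors' tool.

```python
import glob
import os


def layout_ordered(pattern):
    """Return regular files matching `pattern`, sorted by inode number.

    Assumption: ascending inode order approximates ascending disk position,
    so visiting files in this order reduces seeking.
    """
    paths = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    return sorted(paths, key=lambda p: os.stat(p).st_ino)


if __name__ == "__main__":
    import sys

    # Usage: python layout_ordered.py 'data/*.log'  (quote the glob)
    for path in layout_ordered(sys.argv[1]):
        print(path)
```

Printing the reordered list keeps the tool composable: another command can consume the paths in layout order instead of in the shell's alphabetical glob order.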
- Toward a gray toolbox: various techniques commonly needed when building
ICLs (information and control layers) for gray-box systems, including
microbenchmarking tools, output measurement, and interpretation of
measurements to infer the state of the underlying system and decide what to
do to move it toward the desired state.
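The toolbox pieces compose into a single observe-infer-act loop. The ICL class below is a hypothetical skeleton, not the authors' toolbox: probe stands for any measurement of an observable output, infer for the algorithmic knowledge that maps a reading to a state, and control for the direct or indirect action taken. Repeating each probe and taking the median is one simple way to interpret noisy measurements.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # type of the inferred system state


@dataclass
class ICL(Generic[S]):
    probe: Callable[[], float]      # one measurement of an observable output
    infer: Callable[[float], S]     # reading -> inferred internal state
    control: Callable[[S], None]    # act to push the system toward the goal
    samples: int = 5                # probes per step; median damps noise

    def step(self) -> S:
        """Measure, infer the state, act on it, and return the state."""
        reading = median(self.probe() for _ in range(self.samples))
        state = self.infer(reading)
        self.control(state)
        return state
```

An MS Manners-style regulator, for instance, would fit this shape with probe measuring the background task's progress rate, infer mapping a drop in that rate to "contended", and control suspending the task.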
Relevance
Although this paper is written from the standpoint of optimizing whole-system
performance without access to OS source code or detailed knowledge of OS
services, its approach seems readily applicable to external (orthogonal)
monitoring for robustness; all of the concepts above transfer directly.