---------------------------------------------------------------------------
Harty & Cheriton, "Application-Controlled Physical Memory Using External
Page-Cache Management", ASPLOS V, 1992
---------------------------------------------------------------------------

==========
Why care ?
==========

Status quo consists of 2 imbalances (written in '92, but still true):

1. Memory systems are key bottlenecks for many applications (in spite of
   increasing memory sizes, growth in secondary and network-accessible
   storage maintains secondary/primary ratio).
2. CPUs much faster than I/O.

Three major issues with current VMs (still true!):

1. Apps have little info about physical memory availability. This could
   be very useful for garbage collection, parallel queries, etc.
2. Apps cannot control the contents of their physical pages. Such control
   would enable them to avoid page faulting during critical operations
   (e.g., while holding a lock in a DBMS). Caches, mpin(), mmap(),
   madvise() are powerful, but represent cumbersome guesswork.
3. Apps cannot control read-ahead, writeback, and discarding of memory
   pages. This would allow progs to minimize the effect of I/O on their
   execution (e.g., could swap in pages needed in the future, while doing
   CPU-intensive processing).

Good anti "single-level store" argument: boundaries are transparent except
for performance.

So V++ provides an external page frame cache, managed at user level by
apps. For apps that don't want to bother with this, there is a default
process-level manager.

Another good reason: small kernel, modularity.

============
Architecture
============

VA space = 1 "VA space segment", made up of regions bound to a (virtual)
memory segment, which consists of several (virtual) pages, that map to
physical pages. Each (virtual) memory segment is associated with a segment
manager. Segment managers can MigratePages() between segments,
ModifyPageFlags(), and GetPageAttributes().

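To make the segment model concrete, here is a minimal Python sketch of the
three operations named above. Only the operation names come from the notes;
the signatures, the flag names ("dirty", etc.), and the PageFrame/Segment
classes are assumptions made for illustration, not the V++ kernel interface.

```python
from dataclasses import dataclass, field


@dataclass
class PageFrame:
    frame_no: int                              # physical frame number
    flags: set = field(default_factory=set)    # hypothetical, e.g. {"dirty"}


class Segment:
    """A (virtual) memory segment: virtual page number -> page frame."""

    def __init__(self, name):
        self.name = name
        self.pages = {}


def migrate_pages(src, dst, src_page, dst_page, count=1):
    """Move `count` frames from src to dst. Frames change owner; no copy.

    Assumes src and dst are distinct segments (hypothetical restriction).
    """
    # Validate the whole range first so a failure leaves both segments intact.
    for i in range(count):
        if src_page + i not in src.pages:
            raise KeyError(f"{src.name}: page {src_page + i} has no frame")
        if dst_page + i in dst.pages:
            raise ValueError(f"{dst.name}: page {dst_page + i} already backed")
    for i in range(count):
        dst.pages[dst_page + i] = src.pages.pop(src_page + i)


def modify_page_flags(seg, page, set_flags=(), clear_flags=()):
    """Set and then clear flags on the frame backing `page`."""
    frame = seg.pages[page]
    frame.flags |= set(set_flags)
    frame.flags -= set(clear_flags)


def get_page_attributes(seg, page):
    """Return a read-only snapshot of the frame backing `page`."""
    frame = seg.pages[page]
    return {"frame_no": frame.frame_no, "flags": frozenset(frame.flags)}


if __name__ == "__main__":
    free = Segment("free")
    free.pages = {i: PageFrame(i) for i in range(4)}
    app = Segment("app")
    migrate_pages(free, app, src_page=0, dst_page=10, count=2)
    print(sorted(app.pages), sorted(free.pages))
```

The point of the sketch is that a frame belongs to exactly one segment at a
time, so handing memory to an app (or taking it back) is a change of
ownership rather than a copy.
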
General path: A page fault suspends the app and causes a trap to the
kernel, which forwards the fault to the segment mgr. The mgr allocates a
page (from its "free-page segment"), reads data from disk, writes it into
the new page, and asks the kernel to place the page at the right VA in the
app's address space. The app is then resumed. Filling page frames dominates
the cost of a page fault.

At init time, all page frames are made part of a segment with a well-known
ID, accessible only to the system page cache manager, which can then
MigratePages() to user-level segment managers on demand.

A segment manager can either be part of the app (fewer context switches) or
in a separate process. To avoid recursive faults in the former case, V++
uses a separate, pinned stack when handling page faults.

Page frame reclamation strategies (e.g., "clock" alg, "soft state" =
discard & regenerate).

Seg mgr can be specialized via OO inheritance from a default seg mgr.

To page a seg mgr, either have its code and data be managed by another seg
mgr (unpredictable performance) or by itself. In the latter case, when the
seg mgr starts up, it will page fault and cause pages to become resident,
then take ownership of the segments, and reverify residence (may repeat
several times). When paging out the entire app, seg mgr swaps out
everything but its own code and data, then relinquishes ownership and
allows itself to be swapped out. When resumed, it goes through the same
init sequence again.

How should the system page cache manager (SPCM) allocate pages to the other
managers? Free market economy, reminiscent of leaky bucket regulator for
network QoS. Each process earns I drams/sec (I determined by global
conditions and admin policies) and is charged D drams for holding 1 MB of
memory for 1 second. Bankruptcy causes reclamation of memory. To save, apps
can page memory out to disk, but they can't save excessive amounts of drams
and they can't do excessive I/O either.

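The dram accounting can be sketched in a few lines of Python. The income I
and the price D are from the notes; the Account class, the savings_cap
field, and the bankruptcy policy (shrink the holding to what income alone
can pay for, then reset the balance to zero) are assumptions chosen for
illustration, since the notes do not say how much memory is reclaimed.

```python
from dataclasses import dataclass


@dataclass
class Account:
    income: float                        # I: drams earned per second
    held_mb: float                       # memory currently held, in MB
    balance: float = 0.0                 # saved drams
    savings_cap: float = float("inf")    # hypothetical limit on savings


def tick(acct, price, dt=1.0):
    """Advance `dt` seconds at `price` (D) drams per MB-second.

    Returns the number of MB reclaimed from the account (0.0 if solvent).
    """
    acct.balance += (acct.income - price * acct.held_mb) * dt
    acct.balance = min(acct.balance, acct.savings_cap)
    if acct.balance >= 0:
        return 0.0
    # Bankrupt. Assumed policy: keep only what income can sustain.
    sustainable_mb = acct.income / price
    reclaimed = max(0.0, acct.held_mb - sustainable_mb)
    acct.held_mb -= reclaimed
    acct.balance = 0.0
    return reclaimed


if __name__ == "__main__":
    acct = Account(income=10.0, held_mb=4.0, balance=15.0)
    for second in range(3):
        lost = tick(acct, price=5.0)
        print(second, acct.balance, acct.held_mb, lost)
```

An account that holds more than I/D MB burns through its savings and is
eventually cut back; one that holds less accumulates drams up to the cap,
which it can later spend on a temporary burst of memory.
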
Cool, because: (1) can control resource consumption, and (2) enables apps
to understand how resources are allocated.

Open issue: not clear that this is useful when memory resources are
considerably smaller than the working set size of the apps. "An application
can only trade space for time if the space is real, not virtual."