---------------------------------------------------------------------------
Harty & Cheriton, "Application-Controlled Physical Memory Using
External Page-Cache Management", ASPLOS V, 1992
---------------------------------------------------------------------------

=========
Why care?
=========

The status quo consists of two imbalances (written in '92, but still true):

1. memory systems are key bottlenecks for many applications (in spite
   of increasing memory sizes, growth in secondary and
   network-accessible storage maintains secondary/primary ratio).

2. CPUs are much faster than I/O.

Three major issues with current virtual memory (VM) systems (still true!):

1. Apps have little info about physical memory availability. This
   could be very useful for garbage collection, parallel queries, etc.

2. Apps cannot control the contents of their physical pages. Such
   control would enable them to avoid page faulting during critical
   operations (e.g., while holding a lock in a DBMS). Caches, mpin(),
   mmap(), madvise() are powerful, but represent cumbersome guesswork.

3. Apps cannot control read-ahead, writeback, and discarding of memory
   pages. This would allow progs to minimize the effect of I/O on
   their execution (e.g., could swap in pages needed in the future,
   while doing CPU-intensive processing).

Good anti "single-level store" argument: boundaries are transparent
except for performance. So V++ provides an external page frame cache,
managed at user level by apps. For apps that don't want to bother with
this, there is a default process-level manager.

Another good reason: small kernel, modularity

============
Architecture
============

VA space = 1 "VA space segment", made up of regions, each bound to a
(virtual) memory segment. A memory segment consists of several
(virtual) pages that map to physical pages. Each (virtual) memory
segment is associated with a segment manager. Segment managers can
MigratePages() between segments, ModifyPageFlags(), and
GetPageAttributes().

General path: A page fault suspends the app and causes a trap to the
kernel, which forwards the fault to the segment mgr. The mgr allocates
a page (from its "free-page segment"), reads data from disk, writes it
into the new page, and asks the kernel to place the page at the right
VA in the app's address space. The app is then resumed. Filling page
frames dominates the cost of a page fault.
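
A toy model of the fault path above, as a sketch only. Everything here
is hypothetical illustration rather than the V++ interface: the Kernel
and SegmentManager classes, the dictionary used as a page table, and
the dictionary standing in for the disk. Only the name MigratePages()
comes from the notes, and migrate_page() is merely a stand-in for it.

```python
class Kernel:
    """Hypothetical kernel: only maps pages and forwards faults."""

    def __init__(self):
        self.page_tables = {}  # (app, virtual page number) -> frame number

    def access(self, app, vpn, manager):
        """Return the frame backing vpn, faulting to the manager if needed."""
        if (app, vpn) not in self.page_tables:
            # Trap: the app is suspended and the fault is forwarded.
            manager.handle_fault(self, app, vpn)
        return self.page_tables[(app, vpn)]  # the app resumes here

    def migrate_page(self, app, vpn, frame):
        """Stand-in for MigratePages(): place the frame at the faulting VA."""
        self.page_tables[(app, vpn)] = frame


class SegmentManager:
    """Hypothetical user-level manager that owns frames and fills them."""

    def __init__(self, free_frames, backing_store):
        self.free_frames = list(free_frames)  # the "free-page segment"
        self.backing_store = backing_store    # vpn -> bytes, stands in for disk
        self.frames = {}                      # frame number -> contents
        self.faults = 0

    def handle_fault(self, kernel, app, vpn):
        self.faults += 1
        frame = self.free_frames.pop()                # 1. allocate a frame
        self.frames[frame] = self.backing_store[vpn]  # 2. fill it from "disk"
        kernel.migrate_page(app, vpn, frame)          # 3. map it at the VA


kernel = Kernel()
mgr = SegmentManager(free_frames=[7, 8, 9],
                     backing_store={0: b"code", 1: b"data"})
first = kernel.access("app", 1, mgr)   # not mapped yet: faults
second = kernel.access("app", 1, mgr)  # already resident: no fault
```

The point of the split is that the kernel side never decides which
frame to use or where the data comes from; both choices live in the
manager, which is what makes the policy replaceable per application.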

At init time, all page frames are made part of a segment with a
well-known ID, accessible only to the system page cache manager, which
can then MigratePages() to user-level segment managers on demand.  A
segment manager can either be part of the app (fewer context switches)
or in a separate process. To avoid recursive faults in the former
case, V++ uses a separate, pinned stack when handling page faults.

Each seg mgr picks its own page frame reclamation strategy (e.g.,
"clock" alg, or "soft state" = discard & regenerate).
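
For reference, a minimal sketch of the generic "clock" (second-chance)
victim selection mentioned above. This is the textbook algorithm, not
code from the paper. The ClockReclaimer class is hypothetical, and the
reference bits are a plain list here; how a real manager would obtain
them is outside this sketch.

```python
class ClockReclaimer:
    """Second-chance ("clock") victim selection over a fixed set of frames."""

    def __init__(self, num_frames):
        self.referenced = [False] * num_frames  # one reference bit per frame
        self.hand = 0                           # next frame to examine

    def touch(self, frame):
        """Record that a frame was used since the hand last passed it."""
        self.referenced[frame] = True

    def pick_victim(self):
        """Advance the hand until a frame with a clear reference bit is found."""
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.referenced)
            if self.referenced[frame]:
                self.referenced[frame] = False  # give it a second chance
            else:
                return frame


clock = ClockReclaimer(4)
clock.touch(0)
clock.touch(1)
victim = clock.pick_victim()  # frames 0 and 1 are spared once; frame 2 is chosen
```

The loop always terminates: each pass clears the bits it skips, so
after at most one full sweep some frame has a clear bit.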

Seg mgr can be specialized via OO inheritance from a default seg mgr.

To page a seg mgr, either have its code and data be managed by another
seg mgr (unpredictable performance) or by itself. In the latter case,
when the seg mgr starts up, it page-faults on its own code and data
to make those pages resident, then takes ownership of the segments and
reverifies residence (may repeat several times). When paging out the
entire app,
seg mgr swaps out everything but its own code and data, then
relinquishes ownership and allows itself to be swapped out. When
resumed, it goes through the same init sequence again.

How should the system page cache manager (SPCM) allocate pages to the
other managers? Via a free market economy, reminiscent of a leaky
bucket regulator for network QoS. Each process earns I drams/sec (I
determined by global conditions and admin policies) and is charged D
drams for holding 1 MB of memory for 1 second. Bankruptcy causes
reclamation of memory. To save drams, apps can page memory out to
disk, but they can't hoard excessive amounts of drams and they can't
do excessive I/O either. Cool, because: (1) the system can control
resource consumption, and (2) apps can understand how resources are
allocated.
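
A small simulation of the dram accounting above, as a sketch under
stated assumptions. I and D come from the notes; the DramAccount class,
the savings_cap parameter, one-second ticks, and reclaiming all of a
bankrupt process's memory at once are assumptions made for
illustration. Charges for I/O are not modeled.

```python
class DramAccount:
    """Hypothetical per-process account held by the SPCM."""

    def __init__(self, income, savings_cap):
        self.income = income            # I: drams earned per second
        self.savings_cap = savings_cap  # assumed limit on hoarded drams
        self.balance = 0
        self.held_mb = 0

    def tick(self, price):
        """Advance one second; price is D, drams per MB per second."""
        self.balance += self.income
        self.balance -= price * self.held_mb
        if self.balance < 0:
            # Bankruptcy: memory is reclaimed (all of it, in this sketch).
            self.held_mb = 0
            self.balance = 0
        self.balance = min(self.balance, self.savings_cap)


D = 2
acct = DramAccount(income=10, savings_cap=50)
for _ in range(3):          # hold nothing for 3 seconds and save up
    acct.tick(D)
saved = acct.balance

acct.held_mb = 8            # rent is now 16 drams/sec against 10 earned
history = []
for _ in range(6):
    acct.tick(D)
    history.append((acct.balance, acct.held_mb))

sustainable_mb = acct.income // D  # holding that income alone can pay for
```

With these numbers the process can hold I/D = 5 MB indefinitely.
Holding 8 MB drains the savings by 6 drams/sec, so the saved drams buy
a temporary burst of memory before bankruptcy, much like tokens in a
leaky bucket.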

Open issue: it is not clear that this is useful when memory resources
are considerably smaller than the working set size of the apps.

"An application can only trade space for time if the space is real, not
virtual."