Working Sets: Past and Present -- Peter J Denning
-------------------------------------------------

Saltzer's Problem: Ultimate objective of a multiprogrammed memory
manager is to be an adaptive control that would allocate memory and
schedule the CPU to maximize performance.

In 1967 the coupling between memory and CPU scheduling was not
considered essential and Denning's working-set based memory manager
was considered "risky". Since programs during that time had little
locality, studies seemed to show that there was no clear winner among
page replacement algorithms.

A program's working set is the collection of the recently references
pages of a program's virtual address space. It provides an intrinsic
measurement of the program's memory demand.

By 1976, it was shown through research on program behaviour of various
memory policies and queuing theory, that working set principle is a
cost-effective basis for near-optimal management of multiprogrammed
memory.

Reference string: sequence of T references, r(1)...r(T), where r(i) is
the segment containing the ith virtual address generated by the
program.

Resident set: Set of all the pages of a program present in main memory

Cold start: empty initial resident set; warm start: initially
non-empty resident set due to preloading of some segments

Memory policy's control parameter theta is used to trade paging load
against resident set size. Larger theta means larger resident set with
longer mean interfault times. Multiple control parameters do not seem
to have any advantage over a single parameter.

Swapping curve: A function f relating the rate of segment faults to
the size of the resident set. fixed-space memory policy interprets
theta as the size of resident set. variable-space memory policy
interprets theta as a bound on the residence times of segments.

Lifetime curve: g(x) = 1/f(x) where x=mean resident set size. This
gives the mean number of references between segment faults (i.e,
paging rate). knee and primary knee.

Resident set size at virtual time t is denoted by R(t, theta)

Inclusion property: R(t, theta) is subset of R(t, theta+a) for a>0

Empirical models of g(x): Belady model, Chamberlain model; both of
them do not correspond exactly to real data. A lifetime curve model
should have knees which are close to the real data, since optimal
performance is associated with knees.