* A New Page Table for 64-bit Address Spaces
* Authors: M. Talluri, M. D. Hill, Y. A. Khalidi
* Field: OS/Virtual Memory
* Reference: 15th SOSP (Operating Systems Review, Vol. 29, No. 5)

Summary of traditional page tables:
Basic function: given a virtual page number (VPN), find the page table entry (PTE), which contains the physical page number (PPN), a valid bit, and attributes.
1. Linear: the VPN is an index into an array of PTEs. The page table itself lives in virtual memory, and a separate data structure keeps the mapping for the table.
2. Forward-mapped: a tree of tables; successive bit fields of the VPN index each level. Each level holds pointers to the tables at the next level.
3. Hashed: the VPN is run through a hash function; chained hash buckets hold the VPN tag and a next pointer as well as the PTE.

Problems for 64-bit address spaces:
Forward-mapped requires a walk of a roughly 6-level tree on every TLB miss.
Linear is poor for sparse address spaces: at least one page-table page must be allocated for every sparsely used region that is touched.
Hashed tables have high per-entry overhead (next pointer + virtual page tag = 128 bits, on top of a 64-bit PTE).
Subtle: why doesn't linear have the expensive lookup of forward-mapped tables? Because the VPN indexes the virtual array directly, a TLB miss normally costs a single memory reference; the nested lookup is only paid when the mapping for the page-table page itself also misses.

Primary contribution: the clustered page table. Similar to a hashed table, except each bucket contains the PTEs of many contiguous pages. Essentially, the low bits of the VPN are taken out of the hash function and used instead to index into the retrieved bucket, which is now a small linear page table. The size of this linear table is the "subblocking factor" (I'll use the symbol F). In effect, the hash lookup now works on page blocks of size F * pagesize, and each page block is mapped by the small linear table in its hash bucket.

Second contribution: integrating TLB support for superpages and (partial) subblocking into all types of page tables.

Definitions:
TLB = translation lookaside buffer, an associative cache for fast lookup of a PTE given a VPN.
Superpage = a page whose size is a power-of-two multiple of the base page size. Superpages are contiguous (within a superpage, consecutive virtual pages map to consecutive physical pages) and aligned. The TLB entry has a single VPN tag, a single valid bit, and a single PPN for the whole superpage.
Complete subblocking = single VPN tag, multiple valid bits and PPNs (one of each per base page). The pages of the block need not be contiguous physical pages (a PPN is kept for each page) and need not all be in memory at once (multiple valid bits).
Partial subblocking = single VPN tag, single PPN, multiple valid bits. Since there is only one PPN, the pages of the block must be contiguous physical pages, but they need not all be in memory at once.
(Illustrative struct sketches of these three entry formats appear after the lookup notes below.)

Support for the TLB extensions in regular page tables: replicate PTEs (feasible), or keep multiple page tables (less feasible).
Support for the TLB extensions in clustered tables: buckets for partial subblocking and superpages contain only the needed information, as described above.
Lookup for partial subblocking: the PTE gets a new "S" (subblock) bit. If it is set and the VPN tag matches, the bucket offset is ignored and the single PPN is returned (Q: where do the bucket-offset bits get used?).
Lookup for superpages <= page block size: always look up as if the superpage size equals the page block size, then proceed as for partial subblocking, except the lookup must continue down the hash chain even if invalid entries are found.
Lookup for superpages > page block size: replicate the clustered PTE, or use multiple page tables.
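To make the clustered lookup concrete, here is a rough C sketch of the hash-and-index path described above. Everything in it is assumed rather than taken from the paper: the 4 KB page size, F = 16, the field names, the hash function, and folding the S/superpage bits into pte[0] are my own layout choices, and superpages larger than a page block plus the chain-continuation subtlety for small superpages are omitted.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT      12                      /* 4 KB base pages (assumed)        */
#define SUBBLOCK_SHIFT  4                       /* F = 16 PTEs per bucket (assumed) */
#define SUBBLOCK_FACTOR (1u << SUBBLOCK_SHIFT)
#define TABLE_BUCKETS   1024                    /* hash table size (assumed)        */

typedef struct {
    uint64_t ppn;                    /* physical page number                        */
    unsigned valid     : 1;
    unsigned subblock  : 1;          /* "S" bit: partial-subblock mapping           */
    unsigned superpage : 1;          /* superpage of exactly page-block size        */
    unsigned attrs     : 8;          /* protection / cacheability, etc.             */
} pte_t;

typedef struct cluster {
    uint64_t        block_tag;       /* VPN with the low SUBBLOCK_SHIFT bits dropped */
    struct cluster *next;            /* hash-chain link                              */
    pte_t           pte[SUBBLOCK_FACTOR]; /* small linear table for this page block  */
} cluster_t;

static cluster_t *buckets[TABLE_BUCKETS];

/* Any reasonable hash of the block tag works; this multiplicative hash is
 * purely illustrative. */
static size_t hash_block(uint64_t block_tag)
{
    return (size_t)((block_tag * 0x9E3779B97F4A7C15ull) >> 54) & (TABLE_BUCKETS - 1);
}

/* Translate a virtual address to a PPN; returns 1 on success, 0 on a miss. */
static int clustered_lookup(uint64_t vaddr, uint64_t *ppn_out)
{
    uint64_t vpn       = vaddr >> PAGE_SHIFT;
    uint64_t block_tag = vpn >> SUBBLOCK_SHIFT;        /* bits fed to the hash      */
    uint64_t offset    = vpn & (SUBBLOCK_FACTOR - 1);  /* bits kept out of the hash */

    for (cluster_t *c = buckets[hash_block(block_tag)]; c != NULL; c = c->next) {
        if (c->block_tag != block_tag)
            continue;                                  /* collision: walk the chain */

        pte_t *head = &c->pte[0];
        if (head->superpage) {
            /* Block-sized superpage: single valid bit, single PPN; the offset
             * selects the page within the contiguous physical range. */
            if (!head->valid)
                return 0;
            *ppn_out = head->ppn + offset;
            return 1;
        }
        if (head->subblock) {
            /* Partial subblock: single PPN but per-page valid bits.  One plausible
             * answer to the bucket-offset question above: the offset selects the
             * page within the contiguous physical block. */
            if (!c->pte[offset].valid)
                return 0;
            *ppn_out = head->ppn + offset;
            return 1;
        }
        /* Ordinary clustered entry: index the per-block linear table. */
        if (!c->pte[offset].valid)
            return 0;
        *ppn_out = c->pte[offset].ppn;
        return 1;
    }
    return 0;                                          /* no matching block found   */
}
```

The key structural point the sketch shows: the low SUBBLOCK_SHIFT bits of the VPN never reach the hash; they only index the small per-bucket linear table.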
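For reference against the Definitions above, here is a hypothetical illustration of what each TLB-entry variant has to store. The field names, F = 16, and widths are my guesses at a layout, not the paper's hardware format; the point is to make the storage trade-off visible.

```c
#include <stdint.h>

#define F 16   /* subblocking factor / pages per block; an assumed value */

/* Superpage entry: one tag, one valid bit, one PPN for the whole (aligned,
 * physically contiguous) superpage. */
struct tlb_superpage {
    uint64_t vpn_tag;     /* high bits of the VPN                       */
    uint64_t ppn;         /* base PPN of the superpage                  */
    uint8_t  valid;       /* single valid bit                           */
    uint8_t  size;        /* encodes the power-of-two superpage size    */
};

/* Complete subblocking: one tag, but F valid bits and F PPNs, so the pages
 * of the block may be scattered in physical memory and partially resident. */
struct tlb_complete_subblock {
    uint64_t vpn_tag;
    uint64_t ppn[F];      /* one PPN per base page (the expensive part) */
    uint16_t valid_bits;  /* one valid bit per base page                */
};

/* Partial subblocking: one tag, F valid bits, but only ONE PPN, so the pages
 * must be physically contiguous and aligned, though not all resident. */
struct tlb_partial_subblock {
    uint64_t vpn_tag;
    uint64_t ppn;         /* base PPN of the contiguous block           */
    uint16_t valid_bits;  /* one valid bit per base page                */
};
```

Complete subblocking multiplies the PPN storage by F, while partial subblocking and superpages cost the TLB little more than an ordinary entry.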
Interesting points:
1. The boundary cases of the clustered page table reduce to the hashed and linear tables. With a subblocking factor of 1, it is exactly a hashed page table: one PTE per bucket and no bits taken out of the hash function. With a subblocking factor equal to the total number of virtual pages (2^52 with 4 KB base pages in a 64-bit address space), it is a linear table: a single giant bucket holding one giant table, with all the VPN bits used to index into it.
2. Choosing the subblocking factor: interaction with caching and with partial subblocking. (A back-of-the-envelope sketch follows.)
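On point 2, a small calculation makes the trade-off concrete. All sizes below are assumptions (8-byte PTEs, 8-byte tag and chain pointer, 64-byte cache lines), not the paper's measurements: F = 1 reproduces the hashed table's 24 bytes per mapping, larger F amortizes the tag and pointer across the block but wastes the unused slots when a block is sparsely populated, and the bucket size relative to a cache line limits how much of a bucket one memory access brings in.

```c
#include <stdio.h>

/* Assumed sizes, not the paper's numbers. */
#define PTE_BYTES   8      /* one PTE                  */
#define TAG_BYTES   8      /* per-bucket VPN/block tag */
#define NEXT_BYTES  8      /* per-bucket chain pointer */
#define LINE_BYTES  64     /* cache line size          */

/* Page-table bytes consumed per mapped page when, on average, `used`
 * of the F slots in each allocated bucket hold live mappings. */
static double bytes_per_mapping(int F, double used)
{
    double bucket = TAG_BYTES + NEXT_BYTES + (double)F * PTE_BYTES;
    return bucket / used;
}

int main(void)
{
    int factors[] = { 1, 4, 8, 16, 32 };
    for (size_t i = 0; i < sizeof factors / sizeof factors[0]; i++) {
        int F = factors[i];
        double bucket_bytes = TAG_BYTES + NEXT_BYTES + (double)F * PTE_BYTES;
        printf("F=%2d  bucket=%4.0f B (%4.1f cache lines)  "
               "dense=%6.1f B/page  sparse=%6.1f B/page\n",
               F, bucket_bytes, bucket_bytes / LINE_BYTES,
               bytes_per_mapping(F, F),     /* dense: all F slots mapped     */
               bytes_per_mapping(F, 1.0));  /* sparse: one mapping per block */
    }
    return 0;
}
```

Partial subblocking pushes in the same direction as the notes suggest: when a block is physically contiguous, its bucket can carry a single PPN plus valid bits instead of F full PTEs, shrinking the bucket further.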