* A New Page Table for 64-bit Address Spaces
* Authors: M. Talluri, M. D. Hill, Y. A. Khalidi
* Field: OS/Virtual Memory
* Reference: 15th SOSP (Operating Systems Review, Vol. 29, No. 5)

Summary of traditional page tables:
Basic function: given a virtual page number (VPN), find the page table entry (PTE), which contains the physical page number (PPN), a valid bit, and attributes.
1. Linear: the VPN is an index into an array of PTEs. The page table itself lives in virtual memory, and a separate data structure keeps the mapping for the table.
2. Forward-mapped: a tree of tables; successive bit fields of the VPN index each level. Each level holds pointers to the tables at the next level.
3. Hashed: the VPN is run through a hash function; chained hash buckets hold the VPN tag and a next pointer as well as the PTE.

Problems for 64-bit address spaces:
Forward-mapped requires a walk of a roughly 6-level tree on every TLB miss.
Linear is poor for sparse address spaces: at least one page-table page must be allocated for every sparsely used region that is touched.
Hashed tables have high per-entry overhead (next pointer + virtual page tag = 128 bits, on top of a 64-bit PTE).
Subtle: why doesn't linear have the expensive lookup of forward-mapped tables? Because the VPN indexes the virtual array directly, a TLB miss normally costs a single memory reference; the nested lookup is only paid when the mapping for the page-table page itself also misses.

Primary contribution: the clustered page table. Similar to a hashed table, except each bucket contains the PTEs of many contiguous pages. Essentially, the low bits of the VPN are taken out of the hash function and used instead to index into the retrieved bucket, which is now a small linear page table. The size of this linear table is the "subblocking factor" (I'll use the symbol F). In effect, the hash lookup now works on page blocks of size F * pagesize, and each page block is mapped by the small linear table in its hash bucket.

Second contribution: integrating TLB support for superpages and (partial) subblocking into all types of page tables.

Definitions:
TLB = translation lookaside buffer, an associative cache for fast lookup of a PTE given a VPN.
Superpage = a page whose size is a power-of-two multiple of the base page size. Superpages are contiguous (within a superpage, consecutive virtual pages map to consecutive physical pages) and aligned. The TLB entry has a single VPN tag, a single valid bit, and a single PPN for the whole superpage.
Complete subblocking = single VPN tag, multiple valid bits and PPNs (one of each per base page). The pages of the block need not be contiguous physical pages (a PPN is kept for each page) and need not all be in memory at once (multiple valid bits).
Partial subblocking = single VPN tag, single PPN, multiple valid bits. Since there is only one PPN, the pages of the block must be contiguous physical pages, but they need not all be in memory at once.
(Illustrative struct sketches of these three entry formats appear after the lookup notes below.)

Support for the TLB extensions in regular page tables: replicate PTEs (feasible), or keep multiple page tables (less feasible).
Support for the TLB extensions in clustered tables: buckets for partial subblocking and superpages contain only the needed information, as described above.
Lookup for partial subblocking: the PTE gets a new "S" (subblock) bit. If it is set and the VPN tag matches, the bucket offset is ignored and the single PPN is returned (Q: where do the bucket-offset bits get used?).
Lookup for superpages <= page block size: always look up as if the superpage size equals the page block size, then proceed as for partial subblocking, except the lookup must continue down the hash chain even if invalid entries are found.
Lookup for superpages > page block size: replicate the clustered PTE, or use multiple page tables.
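To make the clustered lookup concrete, here is a rough C sketch of the hash-and-index path described above. Everything in it is assumed rather than taken from the paper: the 4 KB page size, F = 16, the field names, the hash function, and folding the S/superpage bits into pte[0] are my own layout choices, and superpages larger than a page block plus the chain-continuation subtlety for small superpages are omitted.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT      12                      /* 4 KB base pages (assumed)        */
#define SUBBLOCK_SHIFT  4                       /* F = 16 PTEs per bucket (assumed) */
#define SUBBLOCK_FACTOR (1u << SUBBLOCK_SHIFT)
#define TABLE_BUCKETS   1024                    /* hash table size (assumed)        */

typedef struct {
    uint64_t ppn;                    /* physical page number                        */
    unsigned valid     : 1;
    unsigned subblock  : 1;          /* "S" bit: partial-subblock mapping           */
    unsigned superpage : 1;          /* superpage of exactly page-block size        */
    unsigned attrs     : 8;          /* protection / cacheability, etc.             */
} pte_t;

typedef struct cluster {
    uint64_t        block_tag;       /* VPN with the low SUBBLOCK_SHIFT bits dropped */
    struct cluster *next;            /* hash-chain link                              */
    pte_t           pte[SUBBLOCK_FACTOR]; /* small linear table for this page block  */
} cluster_t;

static cluster_t *buckets[TABLE_BUCKETS];

/* Any reasonable hash of the block tag works; this multiplicative hash is
 * purely illustrative. */
static size_t hash_block(uint64_t block_tag)
{
    return (size_t)((block_tag * 0x9E3779B97F4A7C15ull) >> 54) & (TABLE_BUCKETS - 1);
}

/* Translate a virtual address to a PPN; returns 1 on success, 0 on a miss. */
static int clustered_lookup(uint64_t vaddr, uint64_t *ppn_out)
{
    uint64_t vpn       = vaddr >> PAGE_SHIFT;
    uint64_t block_tag = vpn >> SUBBLOCK_SHIFT;        /* bits fed to the hash      */
    uint64_t offset    = vpn & (SUBBLOCK_FACTOR - 1);  /* bits kept out of the hash */

    for (cluster_t *c = buckets[hash_block(block_tag)]; c != NULL; c = c->next) {
        if (c->block_tag != block_tag)
            continue;                                  /* collision: walk the chain */

        pte_t *head = &c->pte[0];
        if (head->superpage) {
            /* Block-sized superpage: single valid bit, single PPN; the offset
             * selects the page within the contiguous physical range. */
            if (!head->valid)
                return 0;
            *ppn_out = head->ppn + offset;
            return 1;
        }
        if (head->subblock) {
            /* Partial subblock: single PPN but per-page valid bits.  One plausible
             * answer to the bucket-offset question above: the offset selects the
             * page within the contiguous physical block. */
            if (!c->pte[offset].valid)
                return 0;
            *ppn_out = head->ppn + offset;
            return 1;
        }
        /* Ordinary clustered entry: index the per-block linear table. */
        if (!c->pte[offset].valid)
            return 0;
        *ppn_out = c->pte[offset].ppn;
        return 1;
    }
    return 0;                                          /* no matching block found   */
}
```

The key structural point the sketch shows: the low SUBBLOCK_SHIFT bits of the VPN never reach the hash; they only index the small per-bucket linear table.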
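For reference against the Definitions above, here is a hypothetical illustration of what each TLB-entry variant has to store. The field names, F = 16, and widths are my guesses at a layout, not the paper's hardware format; the point is to make the storage trade-off visible.

```c
#include <stdint.h>

#define F 16   /* subblocking factor / pages per block; an assumed value */

/* Superpage entry: one tag, one valid bit, one PPN for the whole (aligned,
 * physically contiguous) superpage. */
struct tlb_superpage {
    uint64_t vpn_tag;     /* high bits of the VPN                       */
    uint64_t ppn;         /* base PPN of the superpage                  */
    uint8_t  valid;       /* single valid bit                           */
    uint8_t  size;        /* encodes the power-of-two superpage size    */
};

/* Complete subblocking: one tag, but F valid bits and F PPNs, so the pages
 * of the block may be scattered in physical memory and partially resident. */
struct tlb_complete_subblock {
    uint64_t vpn_tag;
    uint64_t ppn[F];      /* one PPN per base page (the expensive part) */
    uint16_t valid_bits;  /* one valid bit per base page                */
};

/* Partial subblocking: one tag, F valid bits, but only ONE PPN, so the pages
 * must be physically contiguous and aligned, though not all resident. */
struct tlb_partial_subblock {
    uint64_t vpn_tag;
    uint64_t ppn;         /* base PPN of the contiguous block           */
    uint16_t valid_bits;  /* one valid bit per base page                */
};
```

Complete subblocking multiplies the PPN storage by F, while partial subblocking and superpages cost the TLB little more than an ordinary entry.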
Interesting points:
1. The boundary cases of the clustered page table reduce to the hashed and linear tables. With a subblocking factor of 1, it is exactly a hashed page table: one PTE per bucket and no bits taken out of the hash function. With a subblocking factor equal to the total number of virtual pages (2^52 with 4 KB base pages in a 64-bit address space), it is a linear table: a single giant bucket holding one giant table, with all the VPN bits used to index into it.
2. Choosing the subblocking factor: interaction with caching and with partial subblocking. (A back-of-the-envelope sketch follows.)
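On point 2, a small calculation makes the trade-off concrete. All sizes below are assumptions (8-byte PTEs, 8-byte tag and chain pointer, 64-byte cache lines), not the paper's measurements: F = 1 reproduces the hashed table's 24 bytes per mapping, larger F amortizes the tag and pointer across the block but wastes the unused slots when a block is sparsely populated, and the bucket size relative to a cache line limits how much of a bucket one memory access brings in.

```c
#include <stdio.h>

/* Assumed sizes, not the paper's numbers. */
#define PTE_BYTES   8      /* one PTE                  */
#define TAG_BYTES   8      /* per-bucket VPN/block tag */
#define NEXT_BYTES  8      /* per-bucket chain pointer */
#define LINE_BYTES  64     /* cache line size          */

/* Page-table bytes consumed per mapped page when, on average, `used`
 * of the F slots in each allocated bucket hold live mappings. */
static double bytes_per_mapping(int F, double used)
{
    double bucket = TAG_BYTES + NEXT_BYTES + (double)F * PTE_BYTES;
    return bucket / used;
}

int main(void)
{
    int factors[] = { 1, 4, 8, 16, 32 };
    for (size_t i = 0; i < sizeof factors / sizeof factors[0]; i++) {
        int F = factors[i];
        double bucket_bytes = TAG_BYTES + NEXT_BYTES + (double)F * PTE_BYTES;
        printf("F=%2d  bucket=%4.0f B (%4.1f cache lines)  "
               "dense=%6.1f B/page  sparse=%6.1f B/page\n",
               F, bucket_bytes, bucket_bytes / LINE_BYTES,
               bytes_per_mapping(F, F),     /* dense: all F slots mapped     */
               bytes_per_mapping(F, 1.0));  /* sparse: one mapping per block */
    }
    return 0;
}
```

Partial subblocking pushes in the same direction as the notes suggest: when a block is physically contiguous, its bucket can carry a single PPN plus valid bits instead of F full PTEs, shrinking the bucket further.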