An Evaluation of Buffer Management Strategies for Relational Database
Systems.
Hong-Tai Chou and David J. DeWitt
Summary by Steve Gribble and Armando Fox
One-line summary:
A model of relational query behaviour (the query locality set model,
or QLSM) is used to motivate a buffer management strategy called DBMIN;
QLSM predicts future reference behaviour but separates this prediction from
any particular buffer management algorithm.
Overview/Main Points
- "Bad" buffer mgmt algorithms:
- Domain separation: pages are classified into types,
each of which is separately managed in its associated
domain of buffers by LRU discipline. Example type
assignment: one domain to each non-leaf level of B-tree
structure. Limitation is that concept of domain is
static, and relative importance of types is not
considered.
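The scheme above can be sketched as one LRU chain per page type; the
class name and structure here are illustrative, not from the paper:

```python
from collections import OrderedDict

class DomainSeparationPool:
    """Sketch of domain separation: each page type has its own
    statically sized domain of buffers, managed by LRU."""
    def __init__(self, domain_sizes):
        # domain_sizes: {domain_name: buffer_count}; static, which is
        # exactly the limitation the paper points to.
        self.domains = {d: OrderedDict() for d in domain_sizes}
        self.sizes = dict(domain_sizes)

    def access(self, page_id, page_type):
        lru = self.domains[page_type]
        hit = page_id in lru
        if hit:
            lru.move_to_end(page_id)     # most-recently-used at the end
        else:
            if len(lru) >= self.sizes[page_type]:
                lru.popitem(last=False)  # evict LRU victim in this domain only
            lru[page_id] = True
        return hit
```

For example, one domain per B-tree level: a flood of leaf-page accesses
can never steal buffers from the root's domain, but a domain's budget
also never adapts to how important its page type actually is.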
- "New" algorithm: buffer pool is
subdivided and allocated on a per-relation basis. Each
active relation assigned a resident set that is
initially empty, resident sets of relations are
linked in a priority list. MRU within each relation.
Limitations: why MRU all the time? Why order
rleations on priority list? Searching through list can
be expensive.
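A minimal sketch of one relation's resident set under MRU replacement
(class and names are illustrative, not from the paper):

```python
class RelationResidentSet:
    """One relation's resident set in the "New" algorithm, with
    MRU replacement within the relation."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []          # order of use; last element = most recent

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.remove(page_id)
            self.pages.append(page_id)
            return True          # hit
        if len(self.pages) >= self.capacity:
            self.pages.pop()     # MRU: evict the most recently used page
        self.pages.append(page_id)
        return False             # miss
```

MRU keeps the oldest pages resident, which is the right choice for a
sequential loop over a relation but wrong for many other access
patterns - hence the "why MRU all the time?" objection.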
- Hot Set: a set of pages over which there is
looping behaviour is called a hot set. By plotting the
number of page faults vs. buffer size, one observes
discontinuities at "hot points". Each query
is provided a separate list of buffers managed by an LRU
discipline; the number of buffers a query is entitled to is
predicted by the hot set model. Limitations: why LRU?
MRU is better in some situations; LRU will over-allocate
memory.
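The hot-point discontinuity is easy to reproduce with a toy LRU
simulation (a sketch, not the paper's methodology): a query that loops
over N pages faults on every reference until the buffer holds all N,
at which point the fault count collapses.

```python
def lru_faults(refs, buf_size):
    """Count page faults for an LRU buffer of the given size."""
    buf = []
    faults = 0
    for p in refs:
        if p in buf:
            buf.remove(p)
            buf.append(p)        # refresh: most recently used at the end
        else:
            faults += 1
            if len(buf) >= buf_size:
                buf.pop(0)       # evict the least recently used page
            buf.append(p)
    return faults

# A loop over 5 pages, repeated 10 times: below 5 buffers LRU faults on
# every reference; at the "hot point" of 5 it drops to the 5 cold misses.
loop = list(range(5)) * 10
curve = {b: lru_faults(loop, b) for b in range(1, 7)}
```

Plotting `curve` gives the discontinuity the hot set model looks for:
50 faults at sizes 1 through 4, then 5 faults from size 5 onward.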
- Query Locality Set Model (QLSM): relational database
systems support a limited set of operations, and page reference
patterns exhibited by these operations are regular and
predictable. Database operation can be decomposed into a number
of simple reference patterns:
- Straight sequential (SS): one page frame is all that's
required, as no page is ever revisited.
- Clustered sequential (CS): records in a cluster set
(records with the same key value) should be kept in
memory at the same time if possible.
- Looping sequential (LS): MRU.
- Independent random (IR): similar to SS.
- Clustered random (CR): similar to CS.
- Straight hierarchical (SH): index traversed only once,
so one page frame is all that's necessary.
- Hierarchical with straight sequential (H/SS): index
traversed once, followed by scan through leaves -
similar to SS.
- Hierarchical with clustered sequential (H/CS): similar
to CS.
- Looping hierarchical (LH): pages closer to the root are
more likely to be accessed - the access probability
of a page at the ith level is inversely proportional
to the ith power of the fan-out factor, so keep pages
at upper levels in memory preferentially (for a large
fan-out, the root page is the only one worth keeping
in memory).
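One way to read the QLSM is as a table mapping each reference pattern
to a buffer requirement and a local replacement policy; the mapping
below is my reading of this summary, not the paper's exact table. The
last lines illustrate the LH fall-off in access probability.

```python
# Pattern -> (buffer requirement, suggested local policy); illustrative.
QLSM_POLICY = {
    "SS":   ("1 frame",      "any"),        # no page ever revisited
    "CS":   ("cluster size", "FIFO/LRU"),
    "LS":   ("loop size",    "MRU"),
    "IR":   ("1 frame",      "any"),        # like SS
    "CR":   ("cluster size", "FIFO/LRU"),   # like CS
    "SH":   ("1 frame",      "any"),
    "H/SS": ("1 frame",      "any"),        # like SS
    "H/CS": ("cluster size", "FIFO/LRU"),   # like CS
    "LH":   ("upper levels", "keep root"),
}

# LH fall-off: with fan-out f, a level-i page (i = 0 at the root) is
# accessed with probability proportional to f**-i, so for a large
# fan-out the root dominates and is the only page worth pinning.
f = 100
weights = [f ** -i for i in range(4)]
probs = [w / sum(weights) for w in weights]
```

With f = 100, the root accounts for about 99% of the index-page access
probability, which justifies keeping only the root resident.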
- DBMIN: buffers are allocated and managed on a per file
instance basis. (A file instance seems to be a file opened by
a query; if multiple queries open the same file, this is
multiple file instances.) The set of buffered pages associated
with a file instance is its locality set. Each locality
set is managed separately by a discipline selected according to
the intended usage of the file instance.
- If buffer contains a page that does not belong to any
locality set, it is placed on global free list.
- Pages in the buffer can belong to at most one locality
set. A file instance is considered the owner of all
pages in its locality set.
- To allow for data sharing among concurrent queries, all
buffers in memory are also accessible through a global
buffer table.
- Algorithm: when a page is requested by a query, the
global table is searched, followed by an adjustment to
the associated locality set. Three cases:
- Page found in both global table and locality
set: only usage statistics need to be updated
as dictated by local replacement policy.
- Page found in global table but not in the
file instance's locality set: if the page has
an owner, it is simply given to the requesting
query. Otherwise, the page is added to the
locality set of the file instance; if this
exceeds the maximum number of buffers allocated
to the file instance, a page is released
according to the local replacement policy.
- Page is not in memory: it is brought in, then
proceed as in case 2.
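The three cases can be sketched as follows; `FileInstance`, `DBMinPool`,
and the LRU local policy are illustrative stand-ins (DBMIN actually
chooses the local policy per QLSM reference pattern):

```python
from collections import OrderedDict

class FileInstance:
    """Stand-in for a DBMIN file instance: a locality set with a frame
    budget, managed here by LRU for simplicity."""
    def __init__(self, max_frames):
        self.max_frames = max_frames
        self.locality = OrderedDict()   # page_id -> True, in LRU order

class DBMinPool:
    def __init__(self):
        self.global_table = {}          # page_id -> owning FileInstance or None
        self.free_list = []             # in-memory pages owned by no one

    def request(self, fi, page_id):
        """Handle a page request; returns True on a page fault."""
        fault = page_id not in self.global_table
        if fault:
            # Case 3: page not in memory -- "read" it in as unowned,
            # then proceed exactly as in case 2.
            self.global_table[page_id] = None
        owner = self.global_table[page_id]
        if owner is fi:
            # Case 1: page already in this instance's locality set; only
            # the local policy's usage statistics (LRU order) change.
            fi.locality.move_to_end(page_id)
        elif owner is not None:
            # Case 2a: page owned by another file instance -- the
            # requesting query uses the shared copy; ownership unchanged.
            pass
        else:
            # Case 2b: unowned page joins this instance's locality set.
            if page_id in self.free_list:
                self.free_list.remove(page_id)
            fi.locality[page_id] = True
            self.global_table[page_id] = fi
            if len(fi.locality) > fi.max_frames:
                # Over budget: the local policy releases a victim, which
                # stays in memory on the global free list.
                victim, _ = fi.locality.popitem(last=False)
                self.global_table[victim] = None
                self.free_list.append(victim)
        return fault
```

A released page stays in memory and can later be re-claimed by any
file instance via case 2b without a disk read, which is how DBMIN
supports data sharing among concurrent queries.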
- Performance: a hybrid trace/distribution driven simulation
was used to compare DBMIN with random (RAND), FIFO, CLOCK, and
working set (WS) algorithms. Metric of comparison was throughput
(average number of queries completed per second).
- DBMIN wins in all cases, followed closely by the hot
set algorithm (HOT). CLOCK, WS, FIFO, and RAND don't
do well at all.
- Data sharing increases competitiveness of other
algorithms, but DBMIN and HOT still win. (Data sharing
in three cases: full, in which all queries access the
same database, half, in which every two queries share a
copy of the database, and none, in which every query
has its own copy of the database.)
- Load control (keeping the paging device busy only
about half the time by descheduling thrashing queries)
also made the other algorithms more competitive, but
there are well-known problems with load control: high
overhead, it responds only after a bad situation
arises, and it doesn't work well in an environment
with a large number of transactions.
Relevance
Minimization of I/Os is key, and a good buffer management strategy is
critical for this task.
Flaws
- Need to know intended usage of file instance so that a management
algorithm and required number of buffers can be allocated. Is it
ever possible that intended usage cannot be predicted?
- What if there are multiple users, and required number of buffers
for all concurrent queries cannot be satisfied? Do you just
delay some of the queries, or do you let all of them suffer?
- Each file instance is considered independently. Is there some
opportunity for optimization across file instances? Surely there
is locality across queries, or across file accesses within a
query.