Operating system support for database management
Stonebraker 1981

Summary by Ed Swierk

This is a short and still-relevant paper critiquing five areas of operating system support (or lack thereof) for DBMSes.

1. Buffer pool management

* LRU replacement is bad

LRU exploits locality of reference, but most DB access (like sequential access to a large number of blocks) does not exhibit locality. So, DBMSes need some way to control the buffer management policy, or else implement their own buffer managers.

* Prefetch is useless

The next logical block does not necessarily map to the next disk block. So at best, OS prefetching doesn't do any harm, and at worst, it slows down the system by prefetching useless blocks.

* Control over buffer flushing is needed

When a transaction commits, the DBMS has to ensure that updates hit the disk before the "I'm committed" flag does, for correct crash recovery.

2. File system

The old Unix file system was particularly bad for DB storage because file blocks were often scattered over the disk. (The Unix fast file system tried to fix this.) DBMSes would prefer an "extent based file system" where large contiguous blocks of disk space can be allocated all at once.

Tree-structured file systems are largely useless, because the DBMS has to store its own hierarchical data structures anyway (like B-trees).

The author suggests that the OS should implement a "record management system" at the lower layer, allowing both DBMSes and traditional byte-array file systems to be built on top of it. Methinks he underestimated the complexity vs. usefulness of putting this functionality into the OS.

3. Scheduling, process management, and interprocess communication

Two ways to structure a DBMS:

- several independent processes that share a common buffer pool and lock table, one per user/client/application
- one server process, accepting requests from many users

The former structure has problems: contention for locks on shared buffers can cause process convoys, and there is a process switch on every I/O.

The paper claims that the latter structure is hard because OS IPC services (like Unix pipes) are "incompatible with the notion of a server process." I'm not sure why this is (was) so. Also, having the server schedule and multitask requests "involves a painful duplication of operating system facilities." (Of course if operating system facilities were exposed to DBMSes, they'd whine that they aren't good enough :^) Kernel-level threads seem to be the (partial) answer here.

4. Consistency control

Most OSes provide locking only at the file level, not at a finer granularity. Better locking facilities are sorely needed.

As mentioned in the section on file systems, concurrency control and support for transactions could be implemented in the OS. But the DBMS still needs to know about transactions so it can know when to flush its (user-level) buffers.

5. Paged virtual memory

Many of the same issues from file buffering apply here. Mapping files directly into the address space is useful, but a lot of page table space is wasted for very large files consisting of many contiguous pages. An extent-based file system would support a much more compact representation.

Conclusion

His conclusion is that then-current OSes provide inappropriate or unnecessary services for DBMSes, and that future OSes should be more sensitive to their needs. (Boo hoo. Go write yer own OS :^) He points to real-time OSes as a good example of OSes that provide minimal services, allowing DBMSes to do what they want with low overhead.
He hopes that a future OS will combine these minimal services with those offered by general-purpose, "all things to all people" OSes. Would the result be all things to even more people?
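
A toy illustration of the "LRU replacement is bad" point from section 1, for anyone who wants to see it happen. This is only a sketch under my own assumptions: the function name count_hits, the 4-block pool, and the 5-block relation are made up for the example, and the MRU policy is just one alternative a DBMS might pick if it controlled replacement, not something this summary attributes to the paper. It simulates repeated sequential scans of a relation that is one block bigger than the buffer pool.

```python
from collections import OrderedDict


def count_hits(accesses, capacity, policy="lru"):
    """Simulate a fixed-size buffer pool and return the number of hits.

    policy "lru" evicts the least recently used block on a miss;
    policy "mru" evicts the most recently used block instead.
    """
    pool = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in accesses:
        if block in pool:
            hits += 1
            pool.move_to_end(block)  # mark as most recently used
            continue
        if len(pool) >= capacity:
            # last=False pops the LRU end, last=True pops the MRU end
            pool.popitem(last=(policy == "mru"))
        pool[block] = True
    return hits


# Hypothetical workload: scan a 5-block relation four times
# through a pool that holds only 4 blocks.
scan = list(range(5)) * 4
lru_hits = count_hits(scan, capacity=4, policy="lru")
mru_hits = count_hits(scan, capacity=4, policy="mru")
print(f"LRU hits: {lru_hits} of {len(scan)}")
print(f"MRU hits: {mru_hits} of {len(scan)}")
```

With this workload LRU never gets a hit: each block is evicted just before the scan comes back around to it. Evicting the most recently used block keeps most of the relation resident across scans. That gap is the sort of thing a DBMS can exploit only if it controls the replacement policy or runs its own buffer manager.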