Operating system support for database management
Stonebraker
1981
Summary by Ed Swierk

This is a short and still-relevant paper critiquing five areas of
operating system support (or lack thereof) for DBMSes.


1. Buffer pool management

* LRU replacement is bad

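LRU exploits locality of reference, but most DB access (like
sequential access to a large number of blocks) does not exhibit
locality.  The worst case is a scan that repeatedly loops over a file
larger than the buffer pool: LRU evicts each block just before it is
needed again, so every read misses.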

So, DBMSes need some way to control the buffer management policy, or
else implement their own buffer managers.
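A toy simulation makes the LRU complaint concrete.  The workload (a
five-block file scanned three times through a four-frame pool) and the
MRU alternative are illustrative choices, not taken from the paper.
The point is only that a policy chosen for a known access pattern can
beat a fixed LRU.

```python
from collections import OrderedDict

def count_hits(accesses, capacity, policy):
    """Simulate a buffer pool of `capacity` frames and return the hit count.

    policy is "lru" (evict least recently used) or "mru" (evict most
    recently used).
    """
    pool = OrderedDict()  # block id -> None, ordered oldest use -> newest use
    hits = 0
    for block in accesses:
        if block in pool:
            hits += 1
            pool.move_to_end(block)  # mark as most recently used
            continue
        if len(pool) >= capacity:
            # popitem(last=False) drops the oldest entry (LRU);
            # popitem(last=True) drops the newest entry (MRU).
            pool.popitem(last=(policy == "mru"))
        pool[block] = None
    return hits

# Hypothetical workload: scan a 5-block file 3 times with a 4-frame pool.
scan = list(range(5)) * 3
lru_hits = count_hits(scan, 4, "lru")
mru_hits = count_hits(scan, 4, "mru")
print(lru_hits, mru_hits)  # 0 8
```

LRU gets no hits at all on this loop, while MRU keeps most of the file
resident across passes.  A DBMS knows in advance that it is about to
do such a scan; the OS has to guess.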

* Prefetch is useless

The next logical block does not necessarily map to the next disk
block.  So at best, OS prefetching doesn't do any harm, and at worst,
slows down the system by prefetching useless blocks.
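A rough model of why OS readahead helps a plain scan but not
index-driven reads.  The "always prefetch block b + 1" rule is a
simplifying assumption about the OS, and the index-driven block
sequence is made up for illustration.

```python
def readahead_stats(accesses):
    """Model an OS that prefetches block b + 1 after every read of block b.

    A prefetch counts as useful only if the very next read asks for it.
    Returns (useful prefetches, prefetches issued).
    """
    issued = len(accesses)  # one prefetch per read
    useful = sum(1 for cur, nxt in zip(accesses, accesses[1:]) if nxt == cur + 1)
    return useful, issued

sequential = list(range(8))            # a plain file scan
index_driven = [17, 3, 42, 8, 9, 30]   # hypothetical B-tree-directed reads
print(readahead_stats(sequential))     # (7, 8)
print(readahead_stats(index_driven))   # (1, 6)
```

Every prefetch that misses is an I/O the disk performed for nothing,
which is the "at worst" case above.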

* Control over buffer flushing is needed

When a transaction commits, the DBMS has to ensure that updates hit
the disk before the "I'm committed" flag does, for correct crash
recovery.
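A minimal sketch of the required ordering, using modern POSIX `fsync`
through Python's `os.fsync`.  The two-file layout (a data file plus a
separate commit flag) and the file names are hypothetical; a real
system would more likely append a commit record to a log, and would
also fsync the containing directory so a newly created file's
directory entry is durable.

```python
import os
import tempfile

def commit(data_path, flag_path, updates):
    """Force `updates` to disk before writing the commit flag.

    The fsync between the two writes is the ordering guarantee: after a
    crash, a commit flag on disk implies the updates are on disk too.
    """
    with open(data_path, "ab") as data:
        data.write(updates)
        data.flush()             # move Python's buffer into the OS
        os.fsync(data.fileno())  # ask the OS to force it to stable storage
    with open(flag_path, "wb") as flag:
        flag.write(b"committed\n")
        flag.flush()
        os.fsync(flag.fileno())

# Usage with throwaway files (paths are illustrative).
workdir = tempfile.mkdtemp()
data_file = os.path.join(workdir, "table.dat")
flag_file = os.path.join(workdir, "commit.flag")
commit(data_file, flag_file, b"balance=100\n")
```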


2. File system

The old Unix file system was particularly bad for DB storage because
file blocks were often scattered over the disk.  (The Unix fast file
system tried to fix this.)  DBMSes would prefer an "extent based file
system" where large contiguous runs of disk blocks (extents) can be
allocated all at once.
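A sketch of what "allocated all at once" could look like: a
hypothetical first-fit allocator that carves one contiguous run out of
a free list.  First-fit is just one simple policy; the paper does not
prescribe one.

```python
def allocate_extent(free, nblocks):
    """First-fit: carve `nblocks` contiguous blocks out of the free list.

    `free` is a list of (start, length) runs of free disk blocks, sorted by
    start, and is updated in place. Returns the start block of the new
    extent, or None if no single free run is large enough.
    """
    for i, (start, length) in enumerate(free):
        if length >= nblocks:
            if length == nblocks:
                del free[i]  # run used up exactly
            else:
                free[i] = (start + nblocks, length - nblocks)  # shrink run
            return start
    return None

free = [(0, 10), (50, 200)]   # hypothetical free runs: (start block, length)
print(allocate_extent(free, 100), free)  # 50 [(0, 10), (150, 100)]
print(allocate_extent(free, 10), free)   # 0 [(150, 100)]
print(allocate_extent(free, 500))        # None
```

The 100-block request comes back as one physically contiguous run,
rather than 100 blocks placed wherever the allocator found room.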

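Tree-structured file systems are largely redundant.  Unix already
represents each file as a tree of indirect blocks, and the DBMS then
builds its own hierarchical structures (like B-trees) on top, so a
lookup ends up walking two trees where one would do.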

The author suggests that the OS should implement a "record management
system" at the lower layer, allowing both DBMSes and traditional
byte-array file systems to be built on top of it.  Methinks he
underestimated the complexity of putting this functionality into the
OS, relative to its usefulness.


3. Scheduling, process management, and interprocess communication

Two ways to structure a DBMS:

- several independent processes that share a common buffer pool and
  lock table, one per user/client/application
- one server process, accepting requests from many users

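The former structure has two problems: contention for locks on shared
buffers can cause process convoys (if a process holding a hot lock is
descheduled, every process needing that lock queues up behind it), and
a process switch occurs on every I/O.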

The paper claims that the latter structure is hard because OS IPC
services (like Unix pipes) are "incompatible with the notion of a
server process."  I'm not sure why this is (was) so.
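A minimal sketch of the server structure's communication shape: one
inbound channel that accepts requests from many clients, each request
carrying its own reply channel.  Threads and in-process queues stand in
for processes and OS IPC here; the `put`/`get` operations are
hypothetical.

```python
import queue
import threading

def server(requests):
    """Single server loop: serves every client, one request at a time."""
    table = {}  # stand-in for the DBMS's shared state (buffers, locks, ...)
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            break
        op, key, value, reply = msg
        if op == "put":
            table[key] = value
            reply.put("ok")
        elif op == "get":
            reply.put(table.get(key))
        else:
            reply.put(f"unknown op {op!r}")

requests = queue.Queue()  # one inbound channel shared by all clients
worker = threading.Thread(target=server, args=(requests,))
worker.start()

def call(op, key, value=None):
    """Client side: send one request with a private reply channel, then wait."""
    reply = queue.Queue()
    requests.put((op, key, value, reply))
    return reply.get()

r1 = call("put", "x", 1)
r2 = call("get", "x")
requests.put(None)  # tell the server to exit
worker.join()
print(r1, r2)  # ok 1
```

Because all shared state lives inside the one server, nothing needs a
lock across processes; the cost is that the server must now do its
own multitasking.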

Also, having the server schedule and multitask requests "involves a
painful duplication of operating system facilities."  (Of course if
operating system facilities were exposed to DBMSes, they'd whine that
they aren't good enough :^)  Kernel-level threads seem to be the
(partial) answer here.
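The "painful duplication" in miniature: a user-level round-robin
scheduler inside the server.  Python generators stand in for
per-request tasks, and each `yield` marks a hypothetical point where a
real server would wait on disk I/O and switch to another request.

```python
from collections import deque

def request(name, steps, trace):
    """A hypothetical request doing `steps` units of work, yielding at
    each point where a real DBMS would be waiting on disk I/O."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield  # give up the CPU, as if waiting on I/O

def run(tasks):
    """Round-robin scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue        # task finished; drop it
        ready.append(task)  # not finished; requeue at the back

trace = []
run([request("A", 2, trace), request("B", 3, trace)])
print(trace)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

This is exactly the kind of scheduler the OS already has, rebuilt in
user space.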


4. Consistency control

Most OSes provide locking only at the file level, not at a finer
granularity such as pages or records.  Better locking facilities are
sorely needed.
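A toy record-level lock table shows what finer granularity buys.  The
class, its shared/exclusive modes, and the fail-instead-of-wait
behavior are illustrative assumptions, not an interface from the
paper.

```python
class LockTable:
    """Shared (S) and exclusive (X) locks on individual records."""

    def __init__(self):
        self.locks = {}  # (file, record) -> (mode, set of holding txns)

    def try_lock(self, txn, file, record, mode):
        """Grant the lock and return True, or return False on conflict.
        A real lock manager would queue the request instead of failing."""
        key = (file, record)
        if key not in self.locks:
            self.locks[key] = (mode, {txn})
            return True
        held_mode, holders = self.locks[key]
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks are compatible
            return True
        if holders == {txn}:
            # Sole holder may re-lock or upgrade S -> X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[key] = (new_mode, holders)
            return True
        return False

    def release_all(self, txn):
        """Drop every lock held by `txn` (e.g. at commit)."""
        for key in list(self.locks):
            _, holders = self.locks[key]
            holders.discard(txn)
            if not holders:
                del self.locks[key]

lt = LockTable()
a = lt.try_lock("T1", "accounts", 7, "X")
b = lt.try_lock("T2", "accounts", 8, "X")  # different record: no conflict
c = lt.try_lock("T2", "accounts", 7, "S")  # conflicts with T1's X lock
lt.release_all("T1")
d = lt.try_lock("T2", "accounts", 7, "S")
print(a, b, c, d)  # True True False True
```

With file-level locking, T2's lock on record 8 would have had to wait
for T1 even though the two transactions touch different records.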

Building on the record management proposal from the section on file
systems, concurrency control and support for transactions could also
be implemented in the OS.  But the DBMS would still need to know about
transactions, so that it knows when to flush its (user-level) buffers.


5. Paged virtual memory

Many of the same issues as in file buffering apply here: replacement
policy, prefetching, and control over when dirty pages are written.

Mapping files directly into the address space is useful, but for very
large files consisting of many contiguous pages, a page table with one
entry per page wastes a lot of space.  An extent-based file system
would support a much more compact representation, with one entry per
contiguous run of pages.
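A sketch of the compact representation: translate a virtual page to a
disk block through a sorted extent list with binary search.  The
three-extent layout is hypothetical, and this models a table the OS
could keep, not any particular MMU's page-table format.

```python
import bisect

# One entry per extent: (first virtual page, first disk block, length).
# Hypothetical layout: a 3,000-page file stored in three extents.
extents = [(0, 5000, 1000), (1000, 9000, 1500), (2500, 20000, 500)]
starts = [e[0] for e in extents]

def page_to_block(page):
    """Translate a virtual page number to a disk block via the extent list."""
    i = bisect.bisect_right(starts, page) - 1  # last extent starting <= page
    if i < 0:
        raise IndexError(page)
    vstart, block, length = extents[i]
    if page >= vstart + length:
        raise IndexError(page)  # page falls past the end of that extent
    return block + (page - vstart)

total_pages = sum(length for _, _, length in extents)
print(total_pages, "per-page entries vs", len(extents), "extent entries")
print(page_to_block(0), page_to_block(1200), page_to_block(2999))
# 3000 per-page entries vs 3 extent entries
# 5000 9200 20499
```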


Conclusion

His conclusion is that then-current OSes provide inappropriate or
unnecessary services for DBMSes, and that future OSes should be more
sensitive to their needs.  (Boo hoo.  Go write yer own OS :^)

He points to real-time OSes as a good example of OSes that provide
minimal services, allowing DBMSes to do what they want with low
overhead.  He hopes that a future OS will combine these minimal
services with those offered by general-purpose, "all things to all
people" OSes.  Would the result be all things to even more people?