Inclusion of New Types in Relational Database Systems
Mike Stonebraker
Summary by: Armando Fox and Steve Gribble
One-line summary: Describes the generalized APIs into RDBMS internals
that are needed to implement new data types; these have implications
for access-method definitions, concurrency control, logging/recovery,
query processing/optimization, and the performance and security of
extension code.
Overview/Main Points
- New type definition: new types are fixed-size blobs;
user-provided conversion routines (located, e.g., in a dynamically
loaded library) convert values to the internal blob on input and
back to an external form on output.
- Operators on new types, likewise implemented by user-supplied routines.
- Access method implementation:
  - e.g., generalize <, >, <=, etc. for B-trees (like GiST).
  - Implement generic calls: open(), close(), get-first(),
    get-next(), get-unique(), delete(), replace(). Ideally, make
    these "universal" so the designer only has to write the back end,
    i.e., higher-level software takes care of begin/end transaction
    and transaction management.
  - With logical logging, some access-method calls may also have to
    participate in logging, e.g., by specifying Undo and Redo
    procedures as in ARIES.
- Concurrency control: the main scheduler can make locking "upcalls" to
the new type's module, which answers yes, no, or abort for each request.
- Buffer management: needs a generic interface for buffer
manipulation, e.g., get(), fix(), unfix(), put()...
- Query processing/optimization:
  - The type creator can supply estimates of the number of tuples
    that match an equality comparison, the number of pages touched
    when running such a query, the maximum and minimum key values, etc.
  - The type creator can also supply selectivity factors for both
    "rel.field OP value" and "rel1.field1 OP rel2.field2" (for joins);
    if these are unknown, the optimizer falls back on a rule-of-thumb
    constant (as in Selinger et al. 79).
- Ideally, there would be a "transparent interface" to the transaction
system rather than the hodgepodge of calls above.
- Security of loaded code: extension code can run in a separate address
space or be interpreted, but both approaches are expensive (alternatively,
it can be sandboxed).
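The new-type bullet above can be illustrated with a minimal sketch. It is written in Python for brevity; a real extension would be compiled code in the dynamically loaded library. It assumes a hypothetical 2-D `point` type stored as a fixed 16-byte blob of two doubles. `point_in`, `point_out`, `point_left_of`, and `TYPE_CATALOG` are illustrative names, not Stonebraker's.

```python
import struct

# Hypothetical new type "point": two little-endian IEEE doubles in a fixed 16-byte blob.
POINT_FORMAT = "<dd"
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 16


def point_in(text: str) -> bytes:
    """Input routine: external text such as '(1.5, -2)' -> internal blob."""
    s = text.strip()
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError(f"malformed point literal: {text!r}")
    parts = s[1:-1].split(",")
    if len(parts) != 2:
        raise ValueError(f"point needs exactly two coordinates: {text!r}")
    return struct.pack(POINT_FORMAT, float(parts[0]), float(parts[1]))


def point_out(blob: bytes) -> str:
    """Output routine: internal blob -> external text '(x,y)'."""
    if len(blob) != POINT_SIZE:
        raise ValueError("point blob has the wrong size")
    x, y = struct.unpack(POINT_FORMAT, blob)
    return f"({x:g},{y:g})"


def point_left_of(a: bytes, b: bytes) -> bool:
    """Example operator on the new type: is a strictly left of b?"""
    (ax, _), (bx, _) = struct.unpack(POINT_FORMAT, a), struct.unpack(POINT_FORMAT, b)
    return ax < bx


# Hypothetical catalog entry tying the type name to its size and routines.
TYPE_CATALOG = {
    "point": {
        "size": POINT_SIZE,
        "input": point_in,
        "output": point_out,
        "operators": {"<<": point_left_of},
    },
}
```

For example, `point_out(point_in("(1.5, -2)"))` round-trips to `"(1.5,-2)"`. The rest of the system only ever sees the opaque 16-byte value and the routines registered in the catalog.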
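The access-method bullets can be sketched as an abstract interface plus one implementation. That implementation takes its ordering entirely from a user-supplied `less` function rather than the built-in `<`, which is what would let one tree implementation serve any ordered type. This is an illustration under stated assumptions, not the paper's interface:

- A sorted Python list stands in for a B-tree.
- `get_first(op, key)` starts a scan and `get_next()` continues it.
- Tuple IDs are the row positions at build time.
- Only `<`, `<=`, `=`, and `>=` are supported.

The method names mirror the calls listed above, but the signatures are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Less = Callable[[Any, Any], bool]
Hit = Optional[Tuple[int, tuple]]  # (tuple id, tuple), or None when the scan is exhausted


class AccessMethod(ABC):
    """Hypothetical generic interface; method names follow the calls in the summary."""

    @abstractmethod
    def open(self) -> None: ...
    @abstractmethod
    def close(self) -> None: ...
    @abstractmethod
    def get_first(self, op: str, key: Any) -> Hit: ...
    @abstractmethod
    def get_next(self) -> Hit: ...
    @abstractmethod
    def get_unique(self, tid: int) -> Optional[tuple]: ...
    @abstractmethod
    def delete(self, tid: int) -> None: ...
    @abstractmethod
    def replace(self, tid: int, new_tuple: tuple) -> None: ...


class OrderedIndex(AccessMethod):
    """Sorted-list stand-in for a B-tree, ordered only through a user-supplied less()."""

    def __init__(self, rows: List[tuple], key_col: int, less: Less):
        self.key_col, self.less = key_col, less
        self.heap: Dict[int, tuple] = {}            # tid -> tuple
        self.entries: List[Tuple[Any, int]] = []    # (key, tid), kept sorted by less
        self.cursor: Optional[Iterator[int]] = None
        for tid, row in enumerate(rows):
            self._insert(tid, row)

    def _lower_bound(self, key: Any) -> int:
        """Binary search for the first entry whose key is not less than `key`."""
        lo, hi = 0, len(self.entries)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.less(self.entries[mid][0], key):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def _insert(self, tid: int, row: tuple) -> None:
        self.heap[tid] = row
        key = row[self.key_col]
        self.entries.insert(self._lower_bound(key), (key, tid))

    def _scan(self, op: str, key: Any) -> Iterator[int]:
        less = self.less
        if op == "<":
            start, keep = 0, lambda k: less(k, key)
        elif op == "<=":
            start, keep = 0, lambda k: not less(key, k)
        elif op == "=":
            start, keep = self._lower_bound(key), lambda k: not less(key, k)
        elif op == ">=":
            start, keep = self._lower_bound(key), lambda k: True
        else:
            raise ValueError(f"unsupported operator {op!r}")
        for k, tid in self.entries[start:]:
            if not keep(k):
                break  # keys are sorted, so no later entry can match
            yield tid

    def open(self) -> None:
        self.cursor = None

    def close(self) -> None:
        self.cursor = None

    def get_first(self, op: str, key: Any) -> Hit:
        self.cursor = self._scan(op, key)
        return self.get_next()

    def get_next(self) -> Hit:
        tid = next(self.cursor, None) if self.cursor is not None else None
        return None if tid is None else (tid, self.heap[tid])

    def get_unique(self, tid: int) -> Optional[tuple]:
        return self.heap.get(tid)

    def delete(self, tid: int) -> None:
        del self.heap[tid]
        self.entries = [e for e in self.entries if e[1] != tid]

    def replace(self, tid: int, new_tuple: tuple) -> None:
        self.delete(tid)
        self._insert(tid, new_tuple)
```

With `less=lambda a, b: a.lower() < b.lower()`, an `=` scan for `"Alice"` returns both `"alice"` and `"ALICE"`. The built-in `<` on strings would not do that.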
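For the logical-logging bullet, here is a much-simplified sketch of how an access method might register Undo and Redo procedures. Each logical operation name maps to a `(redo, undo)` pair. Recovery replays every logged operation (repeating history, as in ARIES), then undoes, newest first, the operations of transactions that have no commit record. Real ARIES also relies on compensation log records, page LSNs, and checkpoints, all of which this sketch omits. `Log`, `HANDLERS`, and `recover` are hypothetical names, and the in-memory dict stands in for the index.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[Any, Any]  # stand-in for the access method's contents: key -> value


@dataclass
class LogRecord:
    lsn: int
    txn: int
    op: str            # logical operation name, or "commit"
    args: Tuple = ()   # whatever the redo/undo procedures need


def redo_insert(s: State, key: Any, value: Any) -> None:
    s[key] = value


def undo_insert(s: State, key: Any, value: Any) -> None:
    s.pop(key, None)


def redo_delete(s: State, key: Any, old_value: Any) -> None:
    s.pop(key, None)


def undo_delete(s: State, key: Any, old_value: Any) -> None:
    s[key] = old_value  # the delete record carries the old value for this


# Procedures the access method registers with the log manager: op -> (redo, undo).
HANDLERS: Dict[str, Tuple[Callable[..., None], Callable[..., None]]] = {
    "insert": (redo_insert, undo_insert),
    "delete": (redo_delete, undo_delete),
}


class Log:
    def __init__(self) -> None:
        self.records: List[LogRecord] = []

    def append(self, txn: int, op: str, *args: Any) -> None:
        self.records.append(LogRecord(len(self.records), txn, op, args))

    def commit(self, txn: int) -> None:
        self.append(txn, "commit")


def recover(log: Log) -> State:
    """Rebuild state after a crash that lost everything but the log."""
    committed = {r.txn for r in log.records if r.op == "commit"}  # analysis
    state: State = {}
    for rec in log.records:                # redo: repeat all of history
        if rec.op != "commit":
            HANDLERS[rec.op][0](state, *rec.args)
    for rec in reversed(log.records):      # undo: roll back losers, newest first
        if rec.op != "commit" and rec.txn not in committed:
            HANDLERS[rec.op][1](state, *rec.args)
    return state
```

A delete record carries the old value so that its undo can reinsert it. Logical logging needs such operation-level information because it does not log page images.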
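The upcall in the concurrency-control bullet can be sketched as a method the scheduler calls for each lock request, returning yes, no, or abort. The summary does not say how a type module reaches its decision. This sketch assumes object-level shared ("S") and exclusive ("X") locks and resolves conflicts with the wait-die rule: a smaller transaction id means older, an older requester waits, and a younger one is aborted. `TypeLockModule`, `lock_request`, and `release_all` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set


class Decision(Enum):
    YES = "yes"      # grant the lock now
    NO = "no"        # not now: the requester waits and asks again later
    ABORT = "abort"  # the requester should be aborted


class TypeLockModule:
    """Hypothetical type module answering the scheduler's locking upcalls.

    Locks are shared ("S") or exclusive ("X") on named objects; conflicts are
    resolved with wait-die, treating a smaller transaction id as older.
    """

    def __init__(self) -> None:
        self.shared: Dict[str, Set[int]] = {}
        self.exclusive: Dict[str, int] = {}

    def _conflicts(self, txn: int, obj: str, mode: str) -> Set[int]:
        holders: Set[int] = set()
        if obj in self.exclusive:
            holders.add(self.exclusive[obj])
        if mode == "X":
            holders |= self.shared.get(obj, set())
        holders.discard(txn)  # a transaction never conflicts with itself
        return holders

    def lock_request(self, txn: int, obj: str, mode: str) -> Decision:
        if mode not in ("S", "X"):
            raise ValueError(f"unknown lock mode {mode!r}")
        conflicts = self._conflicts(txn, obj, mode)
        if conflicts:
            # Wait-die: an older requester waits; a younger one is aborted.
            return Decision.NO if txn < min(conflicts) else Decision.ABORT
        if mode == "S":
            self.shared.setdefault(obj, set()).add(txn)
        else:
            self.exclusive[obj] = txn
        return Decision.YES

    def release_all(self, txn: int) -> None:
        for holders in self.shared.values():
            holders.discard(txn)
        for obj in [o for o, t in self.exclusive.items() if t == txn]:
            del self.exclusive[obj]
```

Wait-die is only one plausible policy. Because a transaction only ever waits for younger ones, no cycle of waits can form, so the scheduler never has to detect deadlocks itself.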
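Here is a sketch of the generic buffer interface from the buffer-management bullet. The summary names only `get()`, `fix()`, `unfix()`, and `put()`, so the semantics here are assumptions:

- `get` returns a page's bytes, loading the page if needed.
- `fix` and `unfix` adjust a pin count that prevents eviction.
- `put` replaces a buffered page and marks it dirty.

Eviction is LRU over unpinned frames, a dict stands in for the disk, and `BufferPool` and `flush` are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict, Set


class BufferPool:
    """Hypothetical generic buffer interface over a dict that stands in for disk."""

    def __init__(self, disk: Dict[int, bytes], capacity: int) -> None:
        self.disk = disk
        self.capacity = capacity
        self.frames: "OrderedDict[int, bytes]" = OrderedDict()  # least recently used first
        self.pins: Dict[int, int] = {}
        self.dirty: Set[int] = set()

    def _load(self, page: int) -> None:
        if page in self.frames:
            self.frames.move_to_end(page)  # mark as most recently used
            return
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[page] = self.disk[page]
        self.pins[page] = 0

    def _evict(self) -> None:
        victim = next((p for p in self.frames if self.pins[p] == 0), None)
        if victim is None:
            raise RuntimeError("every frame is fixed; nothing can be evicted")
        if victim in self.dirty:  # write back before dropping the frame
            self.disk[victim] = self.frames[victim]
            self.dirty.discard(victim)
        del self.frames[victim]
        del self.pins[victim]

    def get(self, page: int) -> bytes:
        self._load(page)
        return self.frames[page]

    def fix(self, page: int) -> None:
        self._load(page)
        self.pins[page] += 1

    def unfix(self, page: int) -> None:
        if self.pins.get(page, 0) == 0:
            raise ValueError(f"page {page} is not fixed")
        self.pins[page] -= 1

    def put(self, page: int, data: bytes) -> None:
        self._load(page)
        self.frames[page] = data
        self.dirty.add(page)

    def flush(self) -> None:
        for page in self.dirty:
            self.disk[page] = self.frames[page]
        self.dirty.clear()
```

An extension's access method would fix a page while it holds a pointer into it and unfix the page afterwards. Forgetting to unfix is exactly the kind of user error that can stall the whole pool.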
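For the query-optimization bullets, here is a sketch of how an optimizer might consult estimators supplied by the type creator and fall back on rule-of-thumb constants otherwise. The fallbacks 1/10 for equality and 1/3 for open-ended range comparisons are the System R defaults from Selinger et al. This sketch simplifies by applying 1/3 to every non-equality operator, including join predicates. The registries, `ColumnStats`, and the `point` estimators are hypothetical. For a point type, `min_key`/`max_key` are taken to be the extremes of the x-coordinate, and the left-of estimate interpolates between them assuming a uniform spread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ColumnStats:
    """Statistics the type creator can compute (hypothetical field names)."""
    min_key: float
    max_key: float
    distinct: int


RestrictEstimator = Callable[[ColumnStats, float], float]
JoinEstimator = Callable[[ColumnStats, ColumnStats], float]

# Hypothetical registries, keyed by (type name, operator).
RESTRICT: Dict[Tuple[str, str], RestrictEstimator] = {}
JOIN: Dict[Tuple[str, str], JoinEstimator] = {}

# Rule-of-thumb fallbacks when nothing better is known.
DEFAULT_EQ = 1 / 10
DEFAULT_RANGE = 1 / 3


def _clamp(f: float) -> float:
    return max(0.0, min(1.0, f))


def restriction_selectivity(type_name: str, op: str,
                            stats: Optional[ColumnStats], value: float) -> float:
    """Estimated fraction of tuples satisfying 'rel.field OP value'."""
    est = RESTRICT.get((type_name, op))
    if est is not None and stats is not None:
        return _clamp(est(stats, value))
    return DEFAULT_EQ if op == "=" else DEFAULT_RANGE


def join_selectivity(type_name: str, op: str, s1: Optional[ColumnStats],
                     s2: Optional[ColumnStats]) -> float:
    """Estimated fraction of the cross product satisfying 'rel1.field1 OP rel2.field2'."""
    est = JOIN.get((type_name, op))
    if est is not None and s1 is not None and s2 is not None:
        return _clamp(est(s1, s2))
    return DEFAULT_EQ if op == "=" else DEFAULT_RANGE


# Estimators a creator of the "point" type might register; the constant passed
# in is an x-coordinate, and points are assumed spread uniformly in x.
def est_point_left_of(stats: ColumnStats, x: float) -> float:
    if stats.max_key <= stats.min_key:
        return DEFAULT_RANGE
    return (x - stats.min_key) / (stats.max_key - stats.min_key)


RESTRICT[("point", "<<")] = est_point_left_of
RESTRICT[("point", "=")] = lambda stats, x: 1 / max(stats.distinct, 1)
JOIN[("point", "=")] = lambda s1, s2: 1 / max(s1.distinct, s2.distinct, 1)
```

The equijoin estimate 1/max(distinct1, distinct2) follows the same idea Selinger et al. use for column1 = column2 when index cardinalities are known.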
Relevance
First semi-formal discussion of how to extend an RDBMS beyond a fixed set
of built-in types; presumably led to the Illustra work.
Flaws
- Hard to get everything right: I suspect RDBMSs are implicitly
optimized all over the place for "hardwired" relational operators
(<, >, etc.) and for data types that closely match machine-level
types (ints, floats, arrays of char).
- By putting flexibility in the hands of users, you also put the
performance burden in their hands. A pessimal extension could quickly
become the bottleneck (this is true of all extensible systems, and we
are running into it in our class project).