Query Evaluation Techniques for Large Databases (5,8)
Goetz Graefe
Summary by: Steve Gribble and Armando Fox
One-line summary:
Discussion of the implementation, performance, and optimization of
various join algorithms, plus a few words on query scheduling to
minimize I/O cost.
Overview/Main Points
Join algorithms
- Most systems today use nested-loop join or merge join, since
System R found that these always gave the best or close to the
best performance. Hash joins have recently been studied
with some interest, though.
- Nested-loop join: scan the "inner" input once for each tuple
of the "outer" input, looking for a match. Some optimizations to
alleviate the horrible performance:
- For one-to-one match, can terminate inner scan as soon as
match found.
- Scan inner input once for each block (page) of
outer input (block nested-loop join).
- Try to save some pages of inner input in memory.
- Scan inner input alternately backwards and forwards, to
reuse the last page or two of inner input in memory.
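The block variant can be sketched as follows. This is a minimal in-memory sketch, not the paper's code: `block_nested_loop_join`, the dict-based tuples, and the `block_size` parameter are all illustrative assumptions, with "pages" simulated as slices of a Python list.

```python
def block_nested_loop_join(outer, inner, key, block_size):
    """Join two lists of dicts on `key`, scanning `inner` once per
    *block* of outer tuples instead of once per outer tuple."""
    result = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        # One full scan of the inner input per outer block.
        for s in inner:
            for r in block:
                if r[key] == s[key]:
                    result.append({**r, **s})
    return result
```

With a block of B tuples, the inner input is scanned len(outer)/B times rather than len(outer) times, which is the whole point of the optimization.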
- Indexed nested loop join exploits a temporary or permanent
index on the inner input's join attribute.
- Can be way fast under
certain conditions (index depth * size of smaller input <
size of larger input).
- "Zig-zag join" requires an index on both inputs;
alternates scanning for a join value in one and then
looking it up in the other.
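A temporary-index flavor of indexed nested-loop join might look like the sketch below, which stands in for a real B-tree with a sorted list plus binary search; the function name and tuple representation are assumptions for illustration.

```python
import bisect

def indexed_nl_join(outer, inner, key):
    """Indexed nested-loop join using a temporary sorted index
    (a stand-in for a B-tree) on the inner input's join attribute."""
    inner_sorted = sorted(inner, key=lambda s: s[key])
    keys = [s[key] for s in inner_sorted]
    out = []
    for r in outer:
        # Index lookup replaces a full scan of the inner input.
        lo = bisect.bisect_left(keys, r[key])
        hi = bisect.bisect_right(keys, r[key])
        out.extend({**r, **s} for s in inner_sorted[lo:hi])
    return out
```

Each outer tuple now costs one O(log n) lookup instead of a full inner scan, which is where the "index depth * size of smaller input < size of larger input" win comes from.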
- Merge join: both inputs sorted on join attribute, then
scanned by stepping a pair of pointers. Efficient if the inputs
are produced by a previous merge-join step, since that means
they're already sorted.
- Heap filter merge join: use temporary memory to create
sorted runs from outer input using replacement selection;
join immediately with sorted inner input; repeat.
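The pointer-stepping loop of merge join can be sketched as below; this assumes both inputs arrive as lists already sorted on the join attribute, and handles duplicate keys by joining the two equal-key groups.

```python
def merge_join(left, right, key):
    """Merge join: both inputs must already be sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Collect the group of equal keys on each side,
            # then emit their cross product.
            k = left[i][key]
            i2, j2 = i, j
            while i2 < len(left) and left[i2][key] == k:
                i2 += 1
            while j2 < len(right) and right[j2][key] == k:
                j2 += 1
            for r in left[i:i2]:
                for s in right[j:j2]:
                    out.append({**r, **s})
            i, j = i2, j2
    return out
```

If the inputs are outputs of earlier merge joins, the sort step is free, which is the "interesting orders" advantage mentioned above.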
- Hash join: Build in-memory hash table on one input, probe
it with items from the other.
- If inputs too large, they're recursively partitioned.
Outputs from each partitioning are concatenated.
- Recursion depth is roughly logarithmic in input size; the main
difference compared to merge join is that the recursion depth
depends only on the build input, not both inputs.
- Pointer-based join: Used to gather a set of tuples S pointed
to by pointers from tuples in R.
- Nested-loops variant: scans through R and
retrieves the appropriate S tuple for each R tuple. All
the same performance problems as regular nested-loops
join.
- Merge-join variant: sort R on pointer values, make a
single pass over the disk to read all tuples pointed to.
- Hybrid hash join variant: collects together R tuples
containing pointers to same S page, then scans.
- Pointer based joins typically outperform value joins if
only a small fraction of the outer input actually
participates in the join.
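The merge-join variant of pointer-based join can be sketched as follows, with pages simulated as lists in a dict and pointers as (page, slot) pairs; all names here are illustrative assumptions.

```python
def pointer_join_merge(R, s_pages):
    """Pointer-based join, merge-join variant: sort R on its pointer
    values so each pointed-to S page is fetched at most once, in a
    single sequential pass."""
    out = []
    current_page_id, current_page = None, None
    for r in sorted(R, key=lambda r: r['s_ptr']):
        page_id, slot = r['s_ptr']
        if page_id != current_page_id:
            # One simulated page I/O; consecutive pointers into the
            # same page reuse the buffered copy.
            current_page_id, current_page = page_id, s_pages[page_id]
        out.append({**r, **current_page[slot]})
    return out
```

Sorting R by pointer turns a random-access pattern over S into one sequential sweep, which is why this wins when few R tuples participate.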
- High order bit: good cost functions are needed so that the query
optimizer can decide what flavor of join to use.
Optimal query scheduling
- Used to be unimportant, since all execution plans were left-deep
trees and sorting to disk was done at each intermediate step.
- Today's plans include bushy (i.e. parallelizable) execution trees
and extensive pipelining of successive steps, so scheduling
matters.
- Optimal buffer management: map disk pages to buffers so as to
avoid the largest number of I/Os in the future.
- Cost functions exhibit "steps" when plotted over available
buffer space; buffers should be allocated at the low end of a step.
- A separate page replacement algorithm for each scan allows
finer control of buffer allocation.
- Natural stopping points: points at which all data are in
temporary/intermediate disk files. A good time to switch
efficiently between independent subplans.
- Hybrid-hash and similar binary matching operations should allow
overflow avoidance as a runtime option, so that the query
optimizer can insert a stopping point.
- Binary-operator implementations should have a switch controlling
which subplan (left or right) is initiated first.
- Memory should be proportionally divided among multiple
concurrently-active operators.
- For recursive hybrid hash joins, finish level N-1 before starting
level N. Reason: if level N consumes results immediately as they
are generated by N-1, you do get some pipelining but you also get
the two levels competing for memory.
- Allocation of resources other than memory "is an open issue that
should be addressed soon".
- Scheduling bushy trees for concurrent execution is not well
understood. Something like Ingres decomposition is likely to
yield better results in practice than trying to produce optimal
scheduling. (Aren't most flavors of optimal scheduling
NP-complete anyway?)
Relevance
I/O reduction is the name of the game: nested-loop, merge, and hash join
algorithms are all I/O intensive, so the optimizations for them (as well
as optimal query scheduling, which matters far more now than it used to)
are essentially a bunch of hacks to reduce I/O costs.
Flaws
- The author should summarize conclusions somewhere (at the
beginning or the end). The sentence above sums up the whole
point of these sections.
- So boring to read that my teeth were falling asleep, but I guess
that's not too surprising for a survey paper.