Join Processing in Database Systems with Large Main Memories

Leonard D. Shapiro

Summary by: Steve Gribble and Armando Fox

One-line summary: Four join algorithms are presented and analyzed (sort-merge, simple hashing, GRACE hashing, hybrid hashing), and hybrid hashing is shown to dominate the others in performance in most situations, assuming a large enough memory (order of square root of smaller relation).

Overview/Main Points

Four algorithms for "equijoin" are discussed: sort-merge (traditional), simple hashing, GRACE algorithm, and hybrid. Hybrid is shown to dominate all others, even if virtual memory is used. Assumptions: block size is full disk track, two relations used are R and S, memory is M, S is larger relation, and sizeof of M >= sqrt(sizeof S).
sort-merge:
- Scan S and produce output runs (using heap) - each run will be 2M blocks long. Do same for R. Assume S is larger relation - there will be at most sqrt(sizeof S) runs of R and S total after this phase.
- Allocate one block of memory for each run of R and S, merge runs of R and S concurrently, outputting tuples of (R,S) if the pair matches.
- If M < sqrt(sizeof S), then more than two merge phases will be needed, as won't be able to allocate one block for each run of R and S.
simple hashing:
- Choose a hash function h, and pick a set of hash values such that sizeof(M) tuples will hash into that set. Scan the smaller relation R, and if a tuple hashes into this set, insert the tuple into a hash table in memory, otherwise pass it over and write to disk.
- Scan the larger relation S, and if the tuple hashes into the range, check the hash table of R in memory for a match, and if found, output, else pass over tuple and write it to disk.
- Repeat these steps, using tuples that were passed over in the previous steps, until no passed over tuples are left.
GRACE:
- Choose hash function h and a partition of its hash values such that R is partitioned into sqrt(sizeof(R)) subsets of approximately equal size. Allocate sqrt(sizeof(R)) blocks of memory, each to be an output buffer for a subset of the partition of R.
- Scan R; using h, hash each tuple and place it in the appropriate output buffer. If output buffer fills, write it to disk. Flush all buffers to disk when R is completely scanned.
- Do similar scan for S.
- For each set Ri in the partitions (1 <=i<= sqrt(R)), and the corresponding Si,
  - read Ri into memory and build hash table for it. (Will fit in memory as have assumed at least sqrt(R) memory.)
  - hash each tuple of Si, probe for match in hash table. If match, output, otherwise discard the tuple.
- Works because partitions both relations on the join key, so if a tuple in R and a tuple in S match, they are both sent to the same partition by the hash function.
hybrid:
- Like GRACE, except we set aside some memory and build the hash table for the 1st partition while the relations are being scanned. This saves that partition from being written to and read from disk and extra time.
Hybrid dominates all others, as shown by an analytic proof of the algorithms' costs, again assuming sizeof(M) at least sqrt(S) for sort-merge or sqrt(R) for hash-based algorithms, and assuming that the relations are large enough. Numerical simulations confirm the analytic results.
We assumed that partitioning could be done perfectly, but this isn't necessarily so. If a partition overflows, the solution is to repartition and write some passed over tuples to disk for a later pass.
M can be virtual memory and not physical memory. Need a "hot-set" for each algorithm - a set of pages that are locked down in physical memory - essentially the minimum working set size that the algorithm needs to not thrash. Conclude that if page-aging (technique that replaces LRU page replacement policy, instead throwing out MRU) is possible, the sort-merge is unaffected by virtual memory, but performance of hybrid degrades to performance of GRACE. GRACE dominates sort-merge in virtual memory, so hybrid still dominates sort-merge as well.
Bit-filters can be used to speed up join (see Graefe paper). Also, semi-join can be used: construct projection of R on joining attribute - p(R). Join p(R) to S - result is semijoin. Then join this semijoin to R, and the result is equal to the join of R and S.

Relevance

Convincingly argues for the use of hash-based join processing in database systems. Considering that physical memory is even more in abundance than when this paper was written, the arguments have probably gotten stronger over time, although I wonder how much data set size (measured in number of tuples) has grown relative to physical memory size growth.

Flaws

The simulations performed in the paper were simulations of the analytical formulae derived for the algorithms' performance, and not simulations or measurements of implementations of the algorithms. This means any error or bogus assumption in the derivation of their analytic results would make the simulations bogus. I'd prefer to see measurements of implementations.
Although I believe that the assumption that sizeof(M) >= sqrt(R) is reasonable, I'd like to see some data from "real world" database installations to find out just how often this is really the case.

Back to index