Back to index
R-Trees: A Dynamic Index Structure for Spatial Searching
Antonin Guttman, UCB
One-line summary:
R-trees (index structure optimized for spatial searching of "geographic"
data) are B-trees in which interior-node keys are k-dimensional
rectangloids rather than scalars, and the comparison test is "overlaps"
rather than linear range comparison.
Overview/Main Points
- Each leaf node's rectangle is the smallest rectangle that
encloses the leaf's data object.
- Each interior node's rectangle is the smallest one that encloses
all the rectangles in its children.
- Search comparison test is "overlapping", so more than one subtree
under a node may need to be visited during a search; so
worst-case performance is not log but linear in number of keys.
- Insertion:
- add leaf to the parent whose rectangle must be
enlarged by the least amount (break ties using area) to
accommodate it
- go up the tree adjusting the enclosing rectangles
- when nodes must be split (overfull): see below
- Deletion: may result in deletion/merge of nodes. Underfull nodes
are merged with whichever sibling will have its area increased
least.
- Node merging is better than redistributing the orphans to many
siblings because:
- the insert procedure can be called to do
orphan insertion, and the nodes it will visit are likely to have
been visited during preceding search (that caused the merge);
- reinsertion preserves the spatial structure ("spatial balance")
of the tree, preventing deterioriation that might occur if a
given child is forced to never be reparented.
- Node splitting: want to minimize chance that both new
nodes will need to be visited on future searches => minimize
total area of the two covering rectangles.
- Exhaustive alg. must examine exponential number of choices;
authors propose 2 heuristics:
- repeatedly pick 2
entries that would waste the most space if they were put in the
same group (examines O(M2) pairs)
- examine entries with highest low side and lowest
high side, and pick pair with largest
separation (normalized along corresponding "width"
dimension), which examines O(M) pairs.
- Experimental observations
- node splitting is responsible for only
a small part of insert cost in practice;
- Quadratic algorithm obviously has higher overhead but leads to
more balanced ("geographically tighter") R-trees;
- R-trees are good at quickly narrowing geographic search.
Relevance
Illustra.
Flaws
- No theoretical analysis of worst/expected case performance of
insert/delete (update is just delete and reinsert)
- No analysis of expected performance of heuristics compared to
optimal alg.
- Page sizes tried up to 2KB; would quadratic split algorithm be
much worse on today's 8KB-16KB or larger pages?
Back to index