R-Trees: A Dynamic Index Structure for Spatial Searching

Antonin Guttman, UCB

One-line summary: R-trees (index structure optimized for spatial searching of "geographic" data) are B-trees in which interior-node keys are k-dimensional rectangloids rather than scalars, and the comparison test is "overlaps" rather than linear range comparison.

Overview/Main Points

Each leaf node's rectangle is the smallest rectangle that encloses the leaf's data object.
Each interior node's rectangle is the smallest one that encloses all the rectangles in its children.
Search comparison test is "overlapping", so more than one subtree under a node may need to be visited during a search; so worst-case performance is not log but linear in number of keys.
Insertion:
- add leaf to the parent whose rectangle must be enlarged by the least amount (break ties using area) to accommodate it
- go up the tree adjusting the enclosing rectangles
- when nodes must be split (overfull): see below
Deletion: may result in deletion/merge of nodes. Underfull nodes are merged with whichever sibling will have its area increased least.
Node merging is better than redistributing the orphans to many siblings because:
- the insert procedure can be called to do orphan insertion, and the nodes it will visit are likely to have been visited during preceding search (that caused the merge);
- reinsertion preserves the spatial structure ("spatial balance") of the tree, preventing deterioriation that might occur if a given child is forced to never be reparented.
Node splitting: want to minimize chance that both new nodes will need to be visited on future searches => minimize total area of the two covering rectangles.
Exhaustive alg. must examine exponential number of choices; authors propose 2 heuristics:
- repeatedly pick 2 entries that would waste the most space if they were put in the same group (examines O(M²) pairs)
- examine entries with highest low side and lowest high side, and pick pair with largest separation (normalized along corresponding "width" dimension), which examines O(M) pairs.
Experimental observations
- node splitting is responsible for only a small part of insert cost in practice;
- Quadratic algorithm obviously has higher overhead but leads to more balanced ("geographically tighter") R-trees;
- R-trees are good at quickly narrowing geographic search.

Relevance

Illustra.

Flaws

No theoretical analysis of worst/expected case performance of insert/delete (update is just delete and reinsert)
No analysis of expected performance of heuristics compared to optimal alg.
Page sizes tried up to 2KB; would quadratic split algorithm be much worse on today's 8KB-16KB or larger pages?

Back to index