**One-line summary:**
Survey of various schemes for
mapping multidimensional data into a linear space, with investigation of
locality-preserving properties for column-based and range queries.

- Methods for mapping 2D to 1D:
- Column scan
- Snake scan: reverse direction of scan on alternate columns, so last element of column 2n and last element of column 2n+1 stay adjacent
- Z-curve scan: obtained by interleaving bits of binary representation X,Y coords; scanning order is LL, UL, LR, UR, recursively.
- Gray coding: like z-curve, but Gray-code to obtain 1D coord.
- Hilbert curve: see fig. 4 in paper; hard to describe but easy to see.

- "Goodness" metrics: number of disk blocks fetched, and number of non-consecutive block pairs in fetch.
- Query types: <*,value> or <value,*>, i.e. horiz. or vertical line in 2-space; or <Rx,Ry> where Ri is a range on dimension i, i.e. a rectangle in 2-space.
- expected number of continuous runs to
complete row or column query:
- column scan sucks except when a column is picked
- snake slightly better for horiz. lines near "top" or "bottom" rows;
- z-curve and Hilbert curve least affected by horiz. vs. vertical
- No "clear winner" among z, Hilbert, snake

- expected number of continuous runs to complete rectangle query: all do almost equally well, with snake doing a bit better! (see tables below)
- expected linear span of mapping a rectangle: O(perimeter) for all mappings
- So why use complicated z/Gray/Hilbert instead of simple snake mapping? Answer: if multiple 2D grid points can be mapped to single disk block, analysis changes.
- Experimental results ("simulations"? hard to say) show that "locality at every level" (my phrase) in z, Hilbert, Gray makes them do fewer I/O's to complete query than snake, with Hilbert and Gray usually doing best.
- Insight: for all except snake, number of hits per block to fetch
a selection is proportional to size of selected size, but
*independent*of size of database. Snake is always proportional to size of database since columns are scanned contiguously. - Same insight explains superior performance over snake w/r/t fraction of blocks read sequentially and hits per block.
- Extensions to multiple dimensions: 3D "primitives" for Hilbert and z/Gray are shown.
- Extensions to non-square regions: replicate ("tile") pattern to map the area, or have a primitive whose interpoint distances are non-invariant w/r/t rotation. E.g. fig 18, last page of paper.
- Previous work result:
*k*-attribute selection on database with*N*records has disk access cost O(N^{(k-1)/k}); but this paper shows that for specific classes of queries we can do much better, in fact O(k) independent of N for some cases.

- "Experimental results" - seems like they could have just been computed from first principles, if the distribution of request sizes and parameters was known or could be assumed.