Back to index
AlphaSort: A RISC Machine Sort
Chris Nyberg, Tom Barclar, Zarka Cvetanovic, Jim Gray, Dave Lomet
Summary by: Steve Gribble and Armando Fox
One-line summary:
AlphaSort is a DataMation sort winner that optimizes the memory hierarchy
(stays on the processor cache as much as possible) and uses file striping
across disks to eliminate the reading-in and writing-out phase overheads of
the sort; MinuteSort and DollarSort are recommended.
Overview/Main Points
- Techniques:
- minimize cache-miss waits by:
- QuickSort input record groups as they arrive
from disk - QuickSort has good cache
locality.
- sort (key-prefix, pointer) pairs rather than
records or pointers - reduces data movement.
- Merge QuickSort runs using
replacement-selection tree. Since the merge
tree is small, you get good cache behaviour.
- Comparison of 3 variants of quicksort. Key sort wins -
especially if aline key/pointer on one cache line.
- record sort - record array sorted in place;
much data movement
- pointer sort - array of pointers is sorted;
many dereferences and random memory
references
- key sort - array of record-key,
record-pointer pairs is sorted. Comparison
operators just examine keys in array.
- shared-memory multiprocessor - one process per
processor. Right away, run through VM to get pages
allocated - VMS's zeroing of allocated pages is nasty
slowdown. Do in parallel with opening/reading of
files.
- disk bottleneck
- must stripe, or die.
- one or two passes? (i.e. keep first pass in
memory, or write to disk and do second pass
on disk?) Tradeoff is cost of memory
vs. speed/cost of parallel array of disks.
Memory/onepass wins up to gigabytes of data.
- software striping needed to get big enough
array of disks so that reading in file is not
killer overhead.
- recommendations:
- minutesort: sort as much as you can in one minute. Two
metrics are size (bytes), and price-performance
($/sorted GB).
- dollarsort: sort as much as you can for a dollar.
Metrics: size (bytes), elapsed time (ms). Count 3-year
maintenance price, normalize to time of sort.
- daytona (stock sort) vs. indy (do whatever you want,
e.g. NOWSort).
Relevance
Somewhat relevant - metric/benchmark for sorting. Mostly a competition to
see who can do coolest optimization hacks. Dollarsort/minutesort are
clearly more reasonable than Datamation sort.
Flaws
- sorting is cool and all, and may be somewhat of an indication of
system design worth, but all this attention is stretching its
utility, IMHO.
Back to index