Input/Output Optimization and Disk Architectures: A Survey

Smith81

 

Why optimize?  1) Access gap: the ratio of access time for main memory vs. disk is on the order of 1:10,000 or 1:100,000.  2) At the time the paper was written, I/O operations required the attention of the CPU and took a long time (they were not asynchronous).

 

→ Frequency and cost of I/Os need to be minimized.

 

Block Size:

Small blocks:

+ require small buffers and can be transferred quickly (good for small amounts of data)

+ less internal fragmentation (free space within blocks)

- create more inter-block gaps on disk, which results in lower space utilization

 

Large blocks: (overall better)

+ large, sequential reads can be done much more efficiently with large blocks

+ software overhead and latency of reading a block are amortized over a larger number of bytes read in (see the sketch after this list)

- more internal fragmentation
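
A back-of-the-envelope sketch of the trade-off (my own numbers, not the paper's): larger blocks amortize the per-I/O overhead over more bytes, but waste more space in each file's last block.

    # Back-of-the-envelope sketch of the block-size trade-off. The file sizes,
    # per-I/O overhead, and transfer rate below are made-up illustrative
    # numbers, not figures from the paper.

    FILE_SIZES = [300, 2_000, 15_000, 80_000, 1_000_000]   # bytes (hypothetical)
    PER_IO_OVERHEAD_MS = 30.0      # seek + rotational latency + software overhead
    TRANSFER_RATE_BPS = 1_000_000  # bytes per second (hypothetical)

    def cost(block_size):
        """Return (total read time in ms, total internal fragmentation in bytes)."""
        total_ms = 0.0
        wasted = 0
        for size in FILE_SIZES:
            blocks = -(-size // block_size)           # ceiling division
            total_ms += blocks * PER_IO_OVERHEAD_MS   # fixed overhead per block read
            total_ms += blocks * block_size / TRANSFER_RATE_BPS * 1000.0
            wasted += blocks * block_size - size      # unused space in the last block
        return total_ms, wasted

    for bs in (512, 4096, 32768, 262144):
        ms, waste = cost(bs)
        print(f"block={bs:>7}: read time {ms:10.1f} ms, wasted {waste:>9} bytes")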

 

 

Data set placement

 

Faster disks should be given a higher load and should store more data, to increase overall throughput.
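
A minimal sketch of load-proportional placement, using made-up disk speeds and data set access rates (my own illustration, not a method from the paper):

    # Assign data sets to disks so that faster disks end up carrying
    # proportionally more of the access load.

    disks = {"fast": 3.0, "medium": 2.0, "slow": 1.0}      # relative throughput (hypothetical)
    datasets = [("A", 40), ("B", 25), ("C", 20), ("D", 10), ("E", 5)]  # (name, access rate)

    load = {d: 0.0 for d in disks}
    placement = {}
    for name, rate in sorted(datasets, key=lambda x: -x[1]):
        # Greedy: put the next-hottest data set on the disk whose current
        # load is lowest relative to its speed.
        target = min(disks, key=lambda d: load[d] / disks[d])
        placement[name] = target
        load[target] += rate

    print(placement)   # the faster disks end up with the hotter data sets
    print(load)        # and with more total load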

 

Arm scheduling: SSTF (shortest seek time first), SCAN (the elevator algorithm), CSCAN (a one-way elevator).  At the time of writing, the author observed that only 30%-40% of disk requests required a seek, so the disk arm seldom had to move, and the choice of arm scheduling algorithm was largely arbitrary because the disk request queue length was 0 or 1 most of the time (there was rarely anything to reorder).  Also, typically only one file was open per disk at a time.  This has clearly changed these days, and I think most disks now use the elevator algorithm.
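
A small sketch of SSTF and SCAN on a hypothetical request queue (the cylinder numbers and starting head position are made up):

    def sstf(head, requests):
        """Shortest Seek Time First: always service the closest pending cylinder."""
        pending, order = list(requests), []
        while pending:
            nxt = min(pending, key=lambda c: abs(c - head))
            order.append(nxt)
            pending.remove(nxt)
            head = nxt
        return order

    def scan(head, requests):
        """Elevator (SCAN): sweep upward from the head, then reverse direction."""
        up = sorted(c for c in requests if c >= head)
        down = sorted((c for c in requests if c < head), reverse=True)
        return up + down

    def total_seek(head, order):
        """Total cylinders traveled when servicing `order` starting from `head`."""
        dist = 0
        for c in order:
            dist += abs(c - head)
            head = c
        return dist

    queue = [98, 183, 37, 122, 14, 124, 65, 67]   # pending requests (cylinders)
    start = 53
    for name, policy in (("SSTF", sstf), ("SCAN", scan)):
        order = policy(start, queue)
        print(f"{name}: {order}  total seek = {total_seek(start, order)}")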

 

Rotational scheduling: Skip Sector Allocation: lay out the sectors of a track in the order 1, 6, 2, 7, 3, 8, 4, 9, 5 so that after sector 1 is read, sector 2 isn't missed because of the time it takes for the first read to finish and the second read to be issued.  Skip sector allocation ensures that sector 2 comes under the disk head only after the second read has been issued.
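
A small sketch that generates this kind of interleaved layout; 9 sectors per track and an interleave factor of 2 reproduce the example above:

    def interleave(num_sectors, factor):
        """Return the physical order of logical sectors 1..num_sectors,
        placing consecutive logical sectors `factor` physical slots apart."""
        track = [0] * num_sectors
        pos = 0
        for logical in range(1, num_sectors + 1):
            while track[pos] != 0:              # skip slots already assigned
                pos = (pos + 1) % num_sectors
            track[pos] = logical
            pos = (pos + factor) % num_sectors
        return track

    print(interleave(9, 2))   # -> [1, 6, 2, 7, 3, 8, 4, 9, 5]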

 

Track offset for head switching: because head switching takes time, offset the start of each successive track so that by the time the drive switches to the next head, the start of that track hasn't already passed under it and we don't pay a penalty of one revolution.
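
A quick sketch of the required offset, assuming a hypothetical rotation speed, sector count, and head switch time:

    import math

    rpm = 3600
    sectors_per_track = 32
    head_switch_ms = 1.0

    ms_per_rev = 60_000 / rpm                     # ~16.7 ms per revolution
    ms_per_sector = ms_per_rev / sectors_per_track
    offset = math.ceil(head_switch_ms / ms_per_sector)
    print(f"offset the next track's start by {offset} sector(s)")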

 

Folding: store multiple copies of the data on the same track so that whichever copy reaches the head first gets read, cutting rotational latency.  I can't imagine that rotational latencies are so long that people would really do this.
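
A quick sanity check on how much folding can buy: with k evenly spaced copies and a uniformly random rotational position when the request arrives, the expected delay is (revolution time)/(2k). The rotation speed is hypothetical.

    rpm = 3600
    ms_per_rev = 60_000 / rpm
    for k in (1, 2, 4):
        print(f"{k} copies: expected rotational delay {ms_per_rev / (2 * k):.2f} ms")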

 

Prefetching: read blocks into the buffer before they are explicitly requested, e.g., read-ahead of the next block(s) when a file is being accessed sequentially, so that the following request is satisfied from memory instead of going to disk.
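
A minimal sketch of sequential read-ahead under my own assumptions (the names below are mine, and the survey treats prefetching more generally): if the previous request on a file was the immediately preceding block, fetch the next block into the buffer before it is asked for. A real system would issue the prefetch asynchronously.

    class ReadAheadBuffer:
        def __init__(self, read_block):
            self.read_block = read_block      # function: block number -> data
            self.buffer = {}                  # prefetched blocks
            self.last = None                  # last block number requested

        def get(self, blk):
            if blk in self.buffer:
                data = self.buffer.pop(blk)   # hit: the prefetch paid off
            else:
                data = self.read_block(blk)   # miss: synchronous disk read
            if self.last is not None and blk == self.last + 1:
                # Sequential pattern detected: prefetch the next block.
                self.buffer[blk + 1] = self.read_block(blk + 1)
            self.last = blk
            return data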