Input/Output Optimization and Disk Architectures: A Survey
----------------------------------------------------------
Alan J. Smith (1981)

Short Summary: As the title suggests, the paper is a survey on I/O. It gives a brief introduction to the various optimizations possible in disk I/O and then examines the shortcomings of the disk architectures prevalent at that time. It suggests moving to the "new" fixed-block disks instead of the CKD disks.
-----
Processors and memory are becoming faster, but disk seek times are still on the order of 8-10 ms (they haven't changed much since the 80's, though disk sizes and bandwidth have improved). So I/O could become a performance bottleneck if it is not made efficient (recall Amdahl's law). In 1981, I/O also took up CPU time, since the CPU was interrupted for every block transfer (no DMA, I guess).

Disk usage patterns in 1978: disks were 75% full; 67% of files were sequential, occupying 39% of the space; so, small sequential files and large non-sequential files. Block sizes were small (note that block sizes were user-controlled).

Optimizations:
- Block size: a small block size means small buffers and quick transfers, but high latency for large data transfers and disk space wasted on interrecord gaps; a large block size (2-4KB) seems to be better in practice.
- Data placement: to decrease seek time, the most frequently used data can be placed near the center of the disk and the least used near the inner and outer edges; data sets used concurrently can be placed on multiple volumes so that they can be accessed concurrently. Data should also be arranged so that higher-performance disks get more load than lower-performance disks.
- Arm scheduling: the goal is to minimize total seek time for a set of requests. Algorithms include Shortest Seek Time First (SSTF), SCAN (the `elevator'-like algorithm; see the sketch after this list), and First Come First Served (FCFS). The author does not think that arm scheduling is necessary, since request queue lengths are small and the disk arm rarely has to move to fetch blocks. These assumptions might be invalid in today's systems.
- Rotational scheduling: scheduling of requests for data on the same cylinder. More important for drums (what are drums?) than for disks, since drums have no seek component. Techniques include hardware implementation and skip-sector allocation (i.e., storing logically sequential sectors on physically alternating sectors, so that the next block has not already rotated past the head by the time the previous one has been processed -- this is explained in the Unix FFS paper).
- Lookahead fetch and optimal buffer space allocation.
- Compaction and fragmentation: random fixed-size block allocation (as in UNIX) suffers from internal fragmentation and, more importantly, from logically sequential blocks being scattered randomly over the disk, which hurts performance. A good way to combat this is to allocate blocks in extents (i.e., groups of physically close blocks). However, this can cause external fragmentation; the solution is to split the extents when necessary (I think the Unix FFS paper borrows many ideas from this paper). Another solution to fragmentation is compaction. Though compaction is in general slow, frequent (daily) compaction might be fast enough.
- I/O congestion and multiple paths: the author contends that I/O congestion might occur not only at the device but also at the disk controller. This is true when a single disk controller controls multiple disks. The paper suggests using multiple paths to reduce this. (Don't most disks have their own controllers?)
- File structure: the structure of the file system itself affects the efficiency of accessing files. Also, some applications (such as databases) need data structures optimized for their specific uses.
- Other optimizations include splitting the frequently used data into several sets and distributing them across multiple devices, controllers, and channels to minimize congestion, and pinning frequently used blocks in main memory instead of fetching them from the disk every time.
- The author also suggests that devices used for paging should be kept separate from the file system to decrease congestion in VM.

The paper also touches briefly on I/O for real-time systems.
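Not from the paper itself: below is a minimal Python sketch of the elevator policy mentioned in the arm-scheduling bullet. Strictly, it implements the LOOK variant of SCAN, which reverses direction at the last pending request rather than at the disk edge; the request queue and cylinder numbers are invented for illustration.

```python
def scan_order(requests, head, direction=1):
    """Return the order in which SCAN/LOOK services `requests`.

    requests  -- iterable of cylinder numbers with pending I/O
    head      -- current arm position (a cylinder number)
    direction -- +1 to sweep toward higher cylinders first, -1 for lower
    """
    # Requests in the current sweep direction, serviced nearest-first.
    ahead = sorted(r for r in requests if (r - head) * direction >= 0)
    # Requests behind the head, serviced on the return sweep.
    behind = sorted((r for r in requests if (r - head) * direction < 0),
                    reverse=True)
    if direction < 0:
        # Sweeping downward: reverse both orderings.
        ahead, behind = ahead[::-1], behind[::-1]
    return ahead + behind

if __name__ == "__main__":
    pending = [98, 183, 37, 122, 14, 124, 65, 67]
    print(scan_order(pending, head=53))
    # -> [65, 67, 98, 122, 124, 183, 37, 14]
```

SSTF, by contrast, always picks the request closest to the current head position, which minimizes the next seek but can starve requests near the disk edges.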
Current (as of 1981) and forthcoming device architectures:
- The current disk architecture is CKD (count-key-data). CKD has variable-size blocks; each block has a count field giving the lengths of the key and data fields, a key field used to search for a specific block, and the data field itself. There are fixed gaps between the three fields and between blocks, so smaller blocks waste proportionally more space. Also, I/O is initiated by the CPU using low-level I/O instructions (i.e., not DMA-style instructions), and the CPU is interrupted for every block transfer.
- Disadvantages of CKD: the low-level instructions do not permit much optimization; variable-length records make it difficult to allocate buffers and to compute actual physical disk addresses from logical block addresses. The low-level instructions waste CPU cycles -- higher-level DMA-style instructions might be better.
- Fixed-sector (FS) disks: these disks have fixed-size blocks and solve some of the problems with CKD disks. Multiblock transfers are easy to do with FS disks, and higher-level commands which do not need CPU intervention for every block transfer are available. Buffer cache management is simplified, and DMA is possible with FS disks. (A sketch of the address arithmetic that fixed-size blocks make trivial follows this list.)
- Other device architectures upcoming at that time include electronic drums built out of MOS RAM chips, made by Intel and Storage Technology Corp. These devices did not seem very costly at the time, and their performance was much better than that of hard disks. The author speculated that these devices might become popular, but it seems that they did not.
- Finally, the author argues for intelligent controllers and channels (DMA), faster I/O bus transfer speeds, additional disk and OS buffering, parallelism in disk accesses, and larger data block sizes.
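Not from the paper itself: a small Python sketch of why fixed-size blocks simplify logical-to-physical address computation. With a fixed-sector disk the mapping is pure integer arithmetic, whereas with CKD's variable-length records a block's physical location depends on the sizes of all the records before it. The geometry numbers here are invented.

```python
# Hypothetical fixed-sector geometry; real drives have their own parameters.
HEADS = 19                # recording surfaces (tracks per cylinder)
SECTORS_PER_TRACK = 32    # fixed-size sectors on each track

def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Map a logical block address to a (cylinder, head, sector) triple."""
    cylinder = lba // (heads * spt)   # whole cylinders skipped over
    head = (lba // spt) % heads       # track within the cylinder
    sector = lba % spt                # sector within the track (0-indexed)
    return cylinder, head, sector

if __name__ == "__main__":
    print(lba_to_chs(100000))   # -> (164, 9, 0)
```

With CKD, answering "where is record n?" can require reading count fields off the disk itself; with fixed sectors it is three integer divisions, which is part of what makes OS buffer management and DMA straightforward.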