Design and Implementation of a Log-Structured File System
==========================================================

Summary: The paper presents a log-structured file system that performs well
under workloads characterized by small files and on systems with large
main-memory file caches.

Motivation
----------
- Faster CPUs make computations disk-bound.
- Disk seek times are improving very slowly => random I/O keeps getting
  relatively more expensive.
- Large main memories allow large file caches that satisfy most reads =>
  (a) most disk traffic is writes, and (b) writes can be merged before being
  sent to disk.
- Office and workstation workloads are dominated by small files.

Taken together, all of the above call for the design of a log-structured
file system.

Problems with FFS
-----------------
- It scatters the information of small files (inodes, indirect blocks and
  data blocks) all over the disk. No spatial locality.
- Writes that update metadata are synchronous. This defeats the potential
  use of a file cache for coalescing writes.

LFS takes care of these problems by writing temporally related data together!
A plus is that recovery is ultra fast!

Details of LFS
--------------
- The whole file system is the log! ["The network is the computer!"]
- There are superblocks, inodes, indirect blocks and data blocks, as in FFS.
- Since inodes are part of the log, an "inode map" is maintained [a level of
  indirection] that points to the inodes inside the log. The inode map itself
  is written to the log, and pointers from a fixed "checkpoint region" point
  to the inode-map blocks.
- Parts of the log must be reclaimed as the corresponding blocks get
  updated/deleted => cleaning is required.
- LFS uses a combination of 'threading' and 'copying' [see page 4] for
  cleaning. The disk is split into segments, which are threaded, and segments
  are 'cleaned' [read segments into memory, identify the live data, write the
  live data out, and mark the cleaned segments as free].
- As blocks move around during cleaning, the pointers to them (inside inodes
  and indirect blocks) need to be updated. Moreover, while cleaning, we must
  be able to distinguish live blocks from dead ones. To this end, a "segment
  summary block" is maintained for each segment. It keeps the information
  needed to identify each block [e.g., for data blocks, it records the file
  number and block number].
- Segment cleaning policies:
  - When to clean?
  - How many segments to clean at a time?
  - Which ones to clean?
  - How should live data be regrouped when it is written out?
- Sprite LFS starts cleaning when the number of clean segments falls below a
  threshold, and it cleans as many segments as are required for this number
  to reach another threshold.
- Which ones to clean? The solution is non-intuitive. Cleaning cost increases
  with segment utilization (because the live data has to be read and
  rewritten). Ideally, the distribution of segment utilizations should be
  bimodal, so that the cleaner works mostly on nearly empty segments. The
  authors' suggestion: clean segments holding "hot" data only when they are
  almost empty, and clean segments holding "cold" data even when they are
  fairly full. The paper contains an interesting discussion of why this
  policy works. It requires that a "segment usage table" be maintained, with
  one entry per segment (a sketch of the policy follows this list).
- Crash recovery is straightforward. Checkpointing is done at regular
  intervals (and on unmounting and system shutdown). Recovery is simple:
  read the log forward from the last checkpoint and redo. Checkpoints are
  mirrored (to withstand corruption during checkpointing itself).
- Performance is good for workloads dominated by small files! The only
  workload for which performance deteriorates is files being read
  sequentially after having been written randomly. Recovery time is really
  small!
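A minimal sketch of the cost-benefit cleaning policy, using the paper's ratio
benefit/cost = ((1 - u) * age) / (1 + u), where u is the segment's live
fraction and age is the most recent modified time of any block in it (both
kept in the segment usage table). The names Segment and
pick_segments_to_clean are illustrative, not Sprite LFS code:

    from dataclasses import dataclass

    @dataclass
    class Segment:
        """One entry of the segment usage table: live bytes and age."""
        seg_no: int
        live_fraction: float   # u: fraction of the segment that is still live
        age: float             # most recent modified time of any block in it

    def benefit_cost(seg: Segment) -> float:
        """Cost-benefit ratio ((1 - u) * age) / (1 + u): reading the segment
        costs 1, rewriting the live data costs u, and the space freed is
        (1 - u); weighting by age makes cold segments worth cleaning at
        higher utilizations than hot ones."""
        u = seg.live_fraction
        return ((1.0 - u) * seg.age) / (1.0 + u)

    def pick_segments_to_clean(usage_table, how_many):
        """Clean the segments with the highest benefit-to-cost ratio."""
        return sorted(usage_table, key=benefit_cost, reverse=True)[:how_many]

Because age multiplies the freed space, an old, half-full "cold" segment can
outrank a young, nearly empty "hot" one, which is what pushes segment
utilizations toward the bimodal shape the paper aims for.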
Questions
---------
- When to clean? Is cleaning a big overhead? (See the write-cost sketch
  below.)
- Are the assumptions of LFS justified?
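On the cleaning-overhead question, the paper's write-cost metric gives a
quick back-of-the-envelope answer: cleaning a segment whose live fraction is
u reads the whole segment (1), rewrites the live data (u), and frees (1 - u)
of a segment for new data, so write cost = (1 + u + (1 - u)) / (1 - u) =
2 / (1 - u). The snippet below only evaluates that formula; the utilization
values are illustrative, not the paper's measurements:

    def write_cost(u: float) -> float:
        """LFS write cost 2 / (1 - u) for cleaning a segment whose live
        fraction is u; an already-empty segment needs no reading or copying."""
        return 1.0 if u == 0.0 else 2.0 / (1.0 - u)

    for u in (0.0, 0.2, 0.5, 0.8):
        print(f"u = {u:.1f}  ->  write cost = {write_cost(u):.1f}")
    # At u = 0.8 every byte of new data costs about 10 bytes of disk traffic,
    # which is why the cleaner should mostly see low-utilization segments.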