* File System Design for an NFS File Server Appliance (WAFL)
* Authors: Dave Hitz, James Lau, Michael Malcolm
* Field: OS/File Systems

Write-Anywhere File Layout (WAFL) is a special-purpose file system designed for use with an NFS appliance file server.

Design constraints:
1) Fast
2) Large, dynamically growing file systems (disks can be added)
3) Support RAID
4) Fast restart after crashes

Speed issues: NFS writes must be synchronous because the protocol is stateless -- the server must commit data to stable storage before replying -- and RAID uses a read-modify-write sequence to maintain parity, which makes small random writes expensive. WAFL's solution uses 1) non-volatile memory (NVRAM) and 2) the write-anywhere file layout, which enables 3) snapshots that speed up recovery.

Write-anywhere file layout: the only fixed-location metadata is the root inode. All other metadata lives in files: the inode file (holds all inodes), the block-map file (tracks free blocks), and the inode-map file (tracks free inodes). An inode may contain the file data itself, pointers to blocks containing the file data, or pointers to indirect blocks; all of an inode's pointers are at the same level of indirection.

Snapshots: create a new root inode and modify blocks using copy-on-write so the previous snapshot's data is preserved (the block map is updated as well). This means a modified data block is written to a new location, and every indirect block and inode block on the path up to the root must also be rewritten (if it has not already been rewritten since the last snapshot). WAFL makes this efficient by grouping writes into episodes -- heavily modified blocks are written only once per write episode. (A small copy-on-write sketch appears at the end of these notes.)

Consistency and NVRAM: a consistency point is an unnamed snapshot. All requests received between consistency points are logged to NVRAM. Crash recovery = revert to the last consistency point, then roll forward the requests in the NVRAM log. The NVRAM is split into two halves; when one half fills, a consistency point is scheduled for it and the other half is used for new requests.

Block-map file: a block is free only if it is not used by any snapshot. Therefore, instead of a bitmap with one "free" bit per block, WAFL keeps a bit vector per block with one bit per snapshot (plus a bit for the active file system); a block is free only if every bit in the vector is 0 (i.e., it is used by no snapshot). The number of snapshots is limited by the size of the bit vector (32 bits).

Discussion: It seems like LFS should be able to support a snapshot-like mechanism, because old blocks are not overwritten; new blocks are written, just as in WAFL. The only issue is the segment cleaner, which would now have to preserve blocks belonging to files in old snapshots. Is this mechanism feasible? How similar is WAFL to LFS? Its grouped write episodes resemble log appends (with a "write-anywhere" rather than log-structured layout of metadata), and its consistency points resemble checkpoints. After each snapshot there is an additional copy-on-write cost for data blocks and for metadata blocks up to the root -- is it reasonable to assume this cost is amortized effectively across many updates? Also consider that consistency points are taken every few (~10) seconds, and perhaps even more often if the NVRAM fills with requests.
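
Below are a few rough sketches of the mechanisms summarized above. They are illustrative Python with made-up names and structures, not code or layouts from the paper. First, the copy-on-write path from a modified data block up to the root: blocks already rewritten in the current write episode are reused, everything else gets a fresh copy, so the old root still describes the snapshot.

```python
# Hypothetical sketch of copy-on-write up the tree (not the paper's code).
# A "block" holds data at the leaves and child pointers elsewhere; writing a
# leaf never overwrites it in place. Instead, every block on the path that has
# not yet been rewritten in this write episode is copied, up to the root.

class Block:
    def __init__(self, data=None, children=None):
        self.data = data
        self.children = list(children) if children else []
        self.fresh = False   # already rewritten since the last consistency point?

def cow_write(block, path, new_data):
    """Write new_data at the leaf reached by `path` (a list of child indexes)."""
    # Reuse the block if this episode already produced a fresh copy of it;
    # otherwise allocate a new one (write-anywhere: no overwrite in place).
    target = block if block.fresh else Block(block.data, block.children)
    target.fresh = True
    if not path:
        target.data = new_data
    else:
        target.children[path[0]] = cow_write(target.children[path[0]], path[1:], new_data)
    return target

# Root inode -> indirect block -> two data blocks.
d0, d1 = Block(data="old0"), Block(data="old1")
root = Block(children=[Block(children=[d0, d1])])

snap_root = root                        # a snapshot is just the old root inode
root = cow_write(root, [0, 1], "new1")  # modify the second data block
assert snap_root.children[0].children[1].data == "old1"  # snapshot unchanged
assert root.children[0].children[1].data == "new1"       # active FS sees new data
# A heavily modified block is copied only once per episode: later writes through
# the same path reuse the fresh copies (their `fresh` flag is already set).
```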
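
Second, the two-halves NVRAM log: requests are logged into one half; when it fills, a consistency point is scheduled for that half while logging continues in the other. After a crash, only requests not yet covered by a consistency point are replayed. (Class and method names are invented for illustration.)

```python
# Hypothetical sketch of the split NVRAM request log (not the paper's code).

class NVRAMLog:
    def __init__(self, half_capacity):
        self.halves = [[], []]
        self.active = 0
        self.half_capacity = half_capacity

    def log_request(self, request, schedule_consistency_point):
        self.halves[self.active].append(request)
        if len(self.halves[self.active]) >= self.half_capacity:
            full = self.active
            self.active = 1 - self.active        # keep logging in the other half
            schedule_consistency_point(full)     # flush the full half to disk

    def consistency_point_done(self, half):
        self.halves[half] = []                   # those requests are now on disk

    def replay_after_crash(self):
        # Roll forward every request not yet covered by a consistency point.
        return self.halves[0] + self.halves[1]

pending = []
nv = NVRAMLog(half_capacity=2)
nv.log_request("write A", pending.append)
nv.log_request("write B", pending.append)    # half 0 is full -> CP scheduled
nv.log_request("write C", pending.append)    # logged into half 1 meanwhile
nv.consistency_point_done(pending.pop(0))    # CP for half 0 completes
assert nv.replay_after_crash() == ["write C"]
```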
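
Third, the block-map entries: one bit per snapshot, with bit 0 standing for the active file system, and a block is free only when its whole entry is zero. Again an illustrative sketch, not the paper's on-disk layout.

```python
# Hypothetical illustration of per-block "use" bit vectors (not the paper's code):
# one 32-bit entry per disk block, bit 0 = active file system, bits 1..31 = snapshots.

ACTIVE_FS_BIT = 0

class BlockMap:
    def __init__(self, num_blocks):
        self.entries = [0] * num_blocks          # one 32-bit entry per block

    def mark_used(self, block, snapshot_id=ACTIVE_FS_BIT):
        self.entries[block] |= (1 << snapshot_id)

    def mark_unused(self, block, snapshot_id=ACTIVE_FS_BIT):
        self.entries[block] &= ~(1 << snapshot_id)

    def is_free(self, block):
        # Free only if neither the active file system nor any snapshot uses it.
        return self.entries[block] == 0

# Deleting a file from the active file system clears only bit 0; the block
# stays allocated until every snapshot that references it is deleted too.
bm = BlockMap(num_blocks=8)
bm.mark_used(3)                   # active file system writes block 3
bm.mark_used(3, snapshot_id=1)    # snapshot 1 also references it
bm.mark_unused(3)                 # file deleted in the active file system
assert not bm.is_free(3)          # still held by snapshot 1
bm.mark_unused(3, snapshot_id=1)  # snapshot 1 deleted
assert bm.is_free(3)
```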