* File System Design for an NFS File Server Appliance (WAFL)
* Authors: Dave Hitz, James Lau, Michael Malcolm
* Field: OS/File Systems

Write-Anywhere File Layout (WAFL) is a special-purpose file system designed for use with an NFS appliance file server.

Design constraints:
1) Fast
2) Large, dynamically growing file systems (disks can be added)
3) Support RAID
4) Fast restart after crashes

Speed issues: NFS writes must be synchronous because the protocol is stateless -- the server must commit data to stable storage before replying -- and RAID uses a read-modify-write sequence to maintain parity, which makes small random writes expensive. WAFL's solution uses 1) non-volatile memory (NVRAM) and 2) the write-anywhere file layout, which enables 3) snapshots that speed up recovery.

Write-anywhere file layout: the only fixed-location metadata is the root inode. All other metadata lives in files: the inode file (holds all inodes), the block-map file (tracks free blocks), and the inode-map file (tracks free inodes). An inode may contain the file data itself, pointers to blocks containing the file data, or pointers to indirect blocks; all of an inode's pointers are at the same level of indirection.

Snapshots: create a new root inode and modify blocks using copy-on-write so the previous snapshot's data is preserved (the block map is updated as well). This means a modified data block is written to a new location, and every indirect block and inode block on the path up to the root must also be rewritten (if it has not already been rewritten since the last snapshot). WAFL makes this efficient by grouping writes into episodes -- heavily modified blocks are written only once per write episode. (A small copy-on-write sketch appears at the end of these notes.)

Consistency and NVRAM: a consistency point is an unnamed snapshot. All requests received between consistency points are logged to NVRAM. Crash recovery = revert to the last consistency point, then roll forward the requests in the NVRAM log. The NVRAM is split into two halves; when one half fills, a consistency point is scheduled for it and the other half is used for new requests.

Block-map file: a block is free only if it is not used by any snapshot. Therefore, instead of a bitmap with one "free" bit per block, WAFL keeps a bit vector per block with one bit per snapshot (plus a bit for the active file system); a block is free only if every bit in the vector is 0 (i.e., it is used by no snapshot). The number of snapshots is limited by the size of the bit vector (32 bits).

Discussion: It seems like LFS should be able to support a snapshot-like mechanism, because old blocks are not overwritten; new blocks are written, just as in WAFL. The only issue is the segment cleaner, which would now have to preserve blocks belonging to files in old snapshots. Is this mechanism feasible? How similar is WAFL to LFS? Its grouped write episodes resemble log appends (with a "write-anywhere" rather than log-structured layout of metadata), and its consistency points resemble checkpoints. After each snapshot there is an additional copy-on-write cost for data blocks and for metadata blocks up to the root -- is it reasonable to assume this cost is amortized effectively across many updates? Also consider that consistency points are taken every few (~10) seconds, and perhaps even more often if the NVRAM fills with requests.
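
Below are a few rough sketches of the mechanisms summarized above. They are illustrative Python with made-up names and structures, not code or layouts from the paper. First, the copy-on-write path from a modified data block up to the root: blocks already rewritten in the current write episode are reused, everything else gets a fresh copy, so the old root still describes the snapshot.

```python
# Hypothetical sketch of copy-on-write up the tree (not the paper's code).
# A "block" holds data at the leaves and child pointers elsewhere; writing a
# leaf never overwrites it in place. Instead, every block on the path that has
# not yet been rewritten in this write episode is copied, up to the root.

class Block:
    def __init__(self, data=None, children=None):
        self.data = data
        self.children = list(children) if children else []
        self.fresh = False   # already rewritten since the last consistency point?

def cow_write(block, path, new_data):
    """Write new_data at the leaf reached by `path` (a list of child indexes)."""
    # Reuse the block if this episode already produced a fresh copy of it;
    # otherwise allocate a new one (write-anywhere: no overwrite in place).
    target = block if block.fresh else Block(block.data, block.children)
    target.fresh = True
    if not path:
        target.data = new_data
    else:
        target.children[path[0]] = cow_write(target.children[path[0]], path[1:], new_data)
    return target

# Root inode -> indirect block -> two data blocks.
d0, d1 = Block(data="old0"), Block(data="old1")
root = Block(children=[Block(children=[d0, d1])])

snap_root = root                        # a snapshot is just the old root inode
root = cow_write(root, [0, 1], "new1")  # modify the second data block
assert snap_root.children[0].children[1].data == "old1"  # snapshot unchanged
assert root.children[0].children[1].data == "new1"       # active FS sees new data
# A heavily modified block is copied only once per episode: later writes through
# the same path reuse the fresh copies (their `fresh` flag is already set).
```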
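
Second, the two-halves NVRAM log: requests are logged into one half; when it fills, a consistency point is scheduled for that half while logging continues in the other. After a crash, only requests not yet covered by a consistency point are replayed. (Class and method names are invented for illustration.)

```python
# Hypothetical sketch of the split NVRAM request log (not the paper's code).

class NVRAMLog:
    def __init__(self, half_capacity):
        self.halves = [[], []]
        self.active = 0
        self.half_capacity = half_capacity

    def log_request(self, request, schedule_consistency_point):
        self.halves[self.active].append(request)
        if len(self.halves[self.active]) >= self.half_capacity:
            full = self.active
            self.active = 1 - self.active        # keep logging in the other half
            schedule_consistency_point(full)     # flush the full half to disk

    def consistency_point_done(self, half):
        self.halves[half] = []                   # those requests are now on disk

    def replay_after_crash(self):
        # Roll forward every request not yet covered by a consistency point.
        return self.halves[0] + self.halves[1]

pending = []
nv = NVRAMLog(half_capacity=2)
nv.log_request("write A", pending.append)
nv.log_request("write B", pending.append)    # half 0 is full -> CP scheduled
nv.log_request("write C", pending.append)    # logged into half 1 meanwhile
nv.consistency_point_done(pending.pop(0))    # CP for half 0 completes
assert nv.replay_after_crash() == ["write C"]
```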
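
Third, the block-map entries: one bit per snapshot, with bit 0 standing for the active file system, and a block is free only when its whole entry is zero. Again an illustrative sketch, not the paper's on-disk layout.

```python
# Hypothetical illustration of per-block "use" bit vectors (not the paper's code):
# one 32-bit entry per disk block, bit 0 = active file system, bits 1..31 = snapshots.

ACTIVE_FS_BIT = 0

class BlockMap:
    def __init__(self, num_blocks):
        self.entries = [0] * num_blocks          # one 32-bit entry per block

    def mark_used(self, block, snapshot_id=ACTIVE_FS_BIT):
        self.entries[block] |= (1 << snapshot_id)

    def mark_unused(self, block, snapshot_id=ACTIVE_FS_BIT):
        self.entries[block] &= ~(1 << snapshot_id)

    def is_free(self, block):
        # Free only if neither the active file system nor any snapshot uses it.
        return self.entries[block] == 0

# Deleting a file from the active file system clears only bit 0; the block
# stays allocated until every snapshot that references it is deleted too.
bm = BlockMap(num_blocks=8)
bm.mark_used(3)                   # active file system writes block 3
bm.mark_used(3, snapshot_id=1)    # snapshot 1 also references it
bm.mark_unused(3)                 # file deleted in the active file system
assert not bm.is_free(3)          # still held by snapshot 1
bm.mark_unused(3, snapshot_id=1)  # snapshot 1 deleted
assert bm.is_free(3)
```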