Reimplementing the Cedar File System Using Logging and Group Commit
Robert Hagmann - 1987
-------------------------------------------------------------------

Summary:
The Cedar file system was reimplemented to make it portable. In doing so, the redesign used logging and group commit to maintain the robustness of the original system while improving performance.

Requirements:

1. Portability
To make the system operate on commercially available disk hardware, the original file system's dependence on per-sector label fields (used to check data integrity) was removed.

2. Robust against software errors: memory smashes by other software and some internal bugs
Leader page: Each file has a leader page that is used for software consistency checking. The per-sector labels previously served the same purpose, but they were available only on specialized disks.

3. Achieve high performance in the normal cases of read, write, create, and delete operations
Locality: The key to high performance is data locality. The new system coalesces all metadata into a single data structure called the file name table (implemented as a B-tree). This reduces the number of disk accesses needed for file operations.
Group commit: Changes to the file name table and leader pages are written to a log. Updates are grouped together for a log flush, which occurs at most every half second (a minimal sketch of this idea appears at the end of these notes). A log gives better performance than synchronous writes of metadata because there are fewer writes and the writes have better locality.
Page allocation: In the old file system, large files were often fragmented because their pages were interleaved with those of many small files. The new file system addresses this by dividing the disk into two regions, one for big files and one for small ones. These regions are only hints and can grow and shrink, much like the heap and stack areas of a memory allocator.

4. Robust against a hardware sector error
The file name table is written twice, with every page stored on two different sectors. Cedar's fault model is that errors occur one at a time and damage at most two consecutive sectors, so as long as the replicated pages are not written to adjacent sectors, no single disk write can damage both copies (see the replicated-write sketch at the end of these notes).

5. Fast recovery
Log: During recovery, the redo log of metadata updates is scanned to finish flushing out any unwritten file name table or leader pages. With a consistent file name table, the free page bitmap (VAM) can be reconstructed if necessary by a simple scan of the table (see the VAM-rebuild sketch at the end of these notes); since the table is a compact structure with a great deal of locality, this scan is fast.

Results:
On a 300 MB disk, the new file system achieved worst-case recovery in about 25 seconds. In comparison, the old system, which did not make metadata updates atomically, required an hour or more for its scavenging (recovery) process.

Discussion:
1. Metadata updates: What are the consequences of improving performance by making metadata updates neither logged nor synchronously written, as in Linux?
2. Although logging may not be cost-effective for the data of a file system, it is effective for the metadata. Why?
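
Appendix: illustrative sketches. These are not from the paper; Python and all names, constants, and record formats below are assumptions used only to make the mechanisms concrete.

Sketch 1 - group commit (requirement 3). A minimal sketch, assuming an append-only log file and byte-string log records: metadata updates are buffered in memory and flushed together at most every half second, so a burst of updates shares one sequential log write instead of each update forcing its own I/O.

import os
import time

class GroupCommitLog:
    """Buffer metadata log records and flush them as a group.

    Records are held in memory and written together at most every
    FLUSH_INTERVAL seconds, so a burst of metadata updates shares one
    sequential log write instead of each update forcing its own I/O.
    """

    FLUSH_INTERVAL = 0.5  # seconds, matching the paper's half-second bound

    def __init__(self, path):
        # Append-only log file; path, flags, and mode are illustrative.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.buffer = []                      # pending records (bytes)
        self.last_flush = time.monotonic()

    def append(self, record):
        """Buffer one metadata update, e.g. a file name table or leader page change."""
        self.buffer.append(record)
        # Flush only when the half-second window has elapsed, so updates
        # arriving close together are committed as a single group.
        if time.monotonic() - self.last_flush >= self.FLUSH_INTERVAL:
            self.flush()

    def flush(self):
        """Write all buffered records in one sequential write and force them to disk."""
        if self.buffer:
            os.write(self.fd, b"".join(self.buffer))
            os.fsync(self.fd)
            self.buffer.clear()
        self.last_flush = time.monotonic()

Usage (hypothetical log location and record contents):

log = GroupCommitLog("/tmp/metadata.log")
log.append(b"create file A\n")
log.append(b"update leader page of A\n")
log.flush()   # force any pending group out, e.g. at shutdown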
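
Sketch 2 - replicated file name table pages (requirement 4). A sketch of writing each table page to two widely separated sector addresses, so a fault that damages at most two consecutive sectors cannot destroy both copies. The sector size, region layout, and validity check are assumptions, not details from the paper.

import os

SECTOR_SIZE = 512        # assumed sector size; the paper's hardware may differ
PRIMARY_BASE = 0         # assumed first sector of the primary copy region
REPLICA_BASE = 1 << 20   # assumed first sector of the replica region, far from the primary

def page_is_valid(page):
    # Placeholder validity test; a real system would check a checksum or a
    # self-describing header kept in each page.
    return len(page) == SECTOR_SIZE and any(page)

def write_table_page(fd, page_no, data):
    """Write one file name table page to two widely separated sectors."""
    assert len(data) == SECTOR_SIZE
    # Under a fault model where a failure damages at most two consecutive
    # sectors, copies this far apart cannot both be hit by one bad write.
    for base in (PRIMARY_BASE, REPLICA_BASE):
        os.pwrite(fd, data, (base + page_no) * SECTOR_SIZE)
    os.fsync(fd)

def read_table_page(fd, page_no):
    """Read the primary copy; fall back to the replica if the primary looks bad."""
    primary = os.pread(fd, SECTOR_SIZE, (PRIMARY_BASE + page_no) * SECTOR_SIZE)
    if page_is_valid(primary):
        return primary
    return os.pread(fd, SECTOR_SIZE, (REPLICA_BASE + page_no) * SECTOR_SIZE)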
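
Sketch 3 - rebuilding the free page bitmap (requirement 5). Because the file name table records which pages each file occupies, the VAM itself need not be logged; it can be reconstructed by one scan of the table. The input format here is an assumption made for illustration.

def rebuild_vam(files_to_pages, disk_pages):
    """Reconstruct the free page bitmap (VAM) from the file name table.

    files_to_pages is assumed to yield, for each file in the name table,
    the list of disk page numbers that file occupies; every page not
    claimed by any file is considered free.
    """
    in_use = [False] * disk_pages
    for pages in files_to_pages:
        for p in pages:
            in_use[p] = True
    return [not used for used in in_use]   # True means the page is free

# Example with three files on a tiny 10-page disk (numbers are made up):
vam = rebuild_vam([[0, 1], [2], [5, 6, 7]], disk_pages=10)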