The Rio File Cache: Surviving Operating System Crashes
Peter M. Chen, et al. - 1996
------------------------------------------------------

Summary: Rio uses an uninterruptible power supply and file cache protection mechanisms to make main memory survive OS crashes, and thus achieves the performance of main memory with the reliability of write-through.

Solutions to the performance/reliability tradeoff of memory systems:
- Synchronous writes provide high reliability but limit throughput, although logging and group commit optimizations help.
- Asynchronous writes allow more overlap between CPU and I/O time, but reliability suffers because one doesn't know when the data is on disk.
- Delayed writes improve performance further because data can be deleted or overwritten before being flushed to disk. However, a lot of new data lives longer than 30 seconds and must still be flushed.
- A write-back scheme provides maximum performance but can only be used where reliability is not an issue (e.g., temporary compiler files).
- Rio's solution: For performance, use a write-back scheme to eliminate all reliability-induced writes to disk. For reliability, protect the file cache during crashes and restore it on reboot.

Protection:
- Key idea: Battery-backed memory is viewed as vulnerable during a crash, while disk is viewed as protected, because of the interface used to access the two storage media. It's not easy to make an accidental call to the disk driver, but any store instruction can incorrectly change data in memory simply by using a wrong address.
- Solution idea: Protect the file cache so that it is difficult to write data to it by mistake.
- Solution: Turn off write permissions for file cache pages so that file cache procedures can check all writes. One complication is that in some systems the kernel can bypass the virtual memory system and access physical memory directly. One solution is to force all accesses to go through the TLB. A second solution is to insert code before every store that checks whether the address is legitimate.
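
The write-protection idea above can be sketched as a toy model. This is only an illustration, not Rio's implementation: Rio does this inside the kernel with page-table permissions, whereas here the permission bit is an ordinary Python flag. The names FileCache, ProtectionFault, store, and cache_write are hypothetical and do not come from the paper.

```python
class ProtectionFault(Exception):
    """Raised when a store hits a write-protected file cache page."""


class FileCache:
    """Toy model of a file cache whose pages are normally read-only."""

    def __init__(self, num_pages, page_size=8):
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        self.writable = [False] * num_pages  # every page starts protected

    def store(self, page, offset, value):
        """Model of a raw store instruction: traps if the page is protected."""
        if not self.writable[page]:
            raise ProtectionFault(f"store to protected page {page}")
        self.pages[page][offset] = value

    def cache_write(self, page, offset, data):
        """Sanctioned write path: open the page, copy, then protect it again."""
        self.writable[page] = True
        try:
            for i, byte in enumerate(data):
                self.store(page, offset + i, byte)
        finally:
            self.writable[page] = False  # re-protect even if the copy fails


cache = FileCache(num_pages=4)
cache.cache_write(0, 0, b"rio")  # legitimate write through the cache procedure

try:
    cache.store(0, 0, 0xFF)      # stray store using a wrong address
    stray_store_blocked = False
except ProtectionFault:
    stray_store_blocked = True
```

In this model the window during which a page is writable is limited to the body of cache_write, so a stray store from elsewhere faults instead of silently corrupting cached file data.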

Effects on file system design:
- Reliability-induced writes to disk are no longer needed.
- Metadata updates in the buffer cache must be ordered as carefully as those to disk, because buffer cache data is now permanent.
- Memory's high throughput makes it feasible to guarantee atomicity when updating critical metadata.

Results:
- Reliability: Rio without protection is about as reliable as a write-through file system, and Rio with protection is even more reliable.
- Performance: Almost equivalent to the optimal system, in which there are no writes to disk.

Discussion:
- How do file cache procedures check for accidental writes to file cache pages?
- Section 2.1 states that Rio only protects against kernel crashes. Can Rio's ideas be used to keep faulty user programs from corrupting their files?
- Results: Is there really enough reliability data to be statistically significant? And how did Rio perform better than the memory-only file system?