The LOCUS Distributed Operating System
Bruce Walker, et al. - 1983
--------------------------------------

Summary: LOCUS is a distributed OS designed to provide as much network
transparency as possible. This paper describes the distributed file
system, which supports replicated files, nested transactions, and
continued operation under network partitions.

Design goals: network transparency, reliable data storage, high
performance, and fault tolerance.

File system overview:
- Single tree-structured naming hierarchy, which covers all objects in
  the filesystem on all machines. Names in this hierarchy are fully
  transparent.
- Files can be replicated to varying degrees, and it is up to LOCUS to
  keep the copies up to date.

Replication:
- Replication is essential for directories because, to access a file,
  you have to be able to access every directory on its path. Replicating
  a file trades update performance for read performance (more copies
  make reads cheaper but updates more expensive). Fortunately, the top
  of the hierarchy needs to be replicated more but is seldom updated,
  while the lower levels are updated more but have less need to be
  replicated.
- Each copy has a version vector so that, even if there are temporary
  disconnections or faults, a user never receives an out-of-date copy of
  a file.

File system mechanics:
For each file, a node can take the role of one or more of the following
system components:
- using site (US): issues the request for a file
- storage site (SS): where the file is stored
- current synchronization site (CSS): receives 'open' requests from the
  US, enforces global access synchronization policies, and selects the
  SS that the US should talk to; after the open call, the US talks
  directly to the SS until the file is closed.

Atomic file commits are achieved using shadow pages:
- Pages of a file are not modified in place; rather, a new page is
  allocated and modifications are written to this 'shadow page.'
- inode information is also copied to a new page, where modifications
  to the file metadata are made.
- To commit, the modified inode information is written to disk; to
  abort, this inode information is discarded.
- Note: conflicting changes to two different copies of the same file
  are prevented by having the committing SS send a message to all other
  SS's for this file; these SS's must update their versions of the file
  before further modifications are made.

Recovery:
- Claim: 'partitions' due to hardware or software failures are common
  and must be handled.
- Philosophy: it is unnecessary and detrimental to forbid updates in
  all partitions except one (as is the case with majority consensus,
  weighted voting, etc.)
- Observation: the probability of conflicting updates in separate
  partitions is low, so just allow updates (similar to Coda). Actual
  conflict detection is not discussed in this paper.
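
Sketch (shadow-page commit): the commit/abort steps listed under "Atomic file commits" can be modeled roughly as below. This is an in-memory toy, not the LOCUS implementation: the class name ShadowFile, its methods, and the dict standing in for the disk are all assumptions made for illustration. The point it shows is that readers keep seeing the committed inode until a single switch installs the shadow inode.

```python
class ShadowFile:
    """Toy model of one file copy at a storage site (hypothetical names)."""

    def __init__(self, pages):
        self.store = dict(enumerate(pages))  # page id -> contents (the "disk")
        self.inode = list(self.store)        # committed page table
        self.shadow_inode = None             # working copy, made on first write
        self.next_id = len(self.store)

    def write(self, index, data):
        if self.shadow_inode is None:
            self.shadow_inode = list(self.inode)  # copy the inode, not the pages
        page_id = self.shadow_inode[index]
        if page_id in self.inode:
            # Still pointing at a committed page: allocate a fresh shadow page.
            page_id = self.next_id
            self.next_id += 1
            self.shadow_inode[index] = page_id
        self.store[page_id] = data           # never touches a committed page

    def read(self):
        """Return the committed contents only."""
        return [self.store[p] for p in self.inode]

    def commit(self):
        if self.shadow_inode is None:
            return
        replaced = set(self.inode) - set(self.shadow_inode)
        self.inode = self.shadow_inode       # the single switch is the commit point
        self.shadow_inode = None
        for p in replaced:
            del self.store[p]                # free pages the new inode dropped

    def abort(self):
        if self.shadow_inode is None:
            return
        for p in set(self.shadow_inode) - set(self.inode):
            del self.store[p]                # discard shadow pages
        self.shadow_inode = None
```

In a real system the switch would be one inode write to disk, which is what makes the commit atomic; here it is just a reference assignment.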
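
Sketch (version vectors): the notes say each copy carries a version vector but, as stated above, the paper does not cover conflict detection. For orientation only, here is the textbook way two version vectors are compared. The names Order and compare, and the choice of a dict mapping site name to update count, are assumptions made for this sketch and are not taken from the paper.

```python
from enum import Enum
from typing import Dict


class Order(Enum):
    EQUAL = "equal"
    NEWER = "newer"        # first copy has seen every update the second has, plus more
    OLDER = "older"        # second copy has seen every update the first has, plus more
    CONFLICT = "conflict"  # each copy has updates the other lacks


def compare(a: Dict[str, int], b: Dict[str, int]) -> Order:
    """Compare two version vectors (site name -> number of updates made there)."""
    sites = set(a) | set(b)
    # A site missing from a vector counts as zero updates from that site.
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return Order.CONFLICT
    if a_ahead:
        return Order.NEWER
    if b_ahead:
        return Order.OLDER
    return Order.EQUAL
```

Under this scheme, two copies updated in separate partitions compare as CONFLICT, while a copy that simply missed some updates compares as OLDER and can be brought up to date without user involvement.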