The Sprite Network Operating System

Ousterhout87

 

Motivation / Goals

 

Capitalize on three technology trends:

 

Fast local networks – take advantage of communication to make a set of machines look like one big time-shared machine (network transparency)

 

Large memories – do lots of file caching

 

Multiprocessors – build OS to take advantage of them (authors believed every desktop would become a multiprocessor)

 

 

Approach: Keep the OS interface the same (BSD UNIX), but modify the implementation to address the above goals.

 

However, the following were added to the OS interface & implementation to support sharing:

 

Sprite kernel was monolithic (about 100,000 lines of code, half comments; still big for a kernel; implemented in C)

 

Implementation:

* Supported kernel-to-kernel RPC (used only by kernels; the OS is not message-passing based, but kernel-call based; contrast with Mach, V)

* Prefix tables (supported dynamic reconfiguration of mount points on distributed FS)

* File caches on servers and clients

* VM system uses files for backing storage

* Processes behave the same whether migrated or not

 

File System

 

Network transparent (system calls for manipulating files are the same regardless of where a file is on the network)

Name transparent (the name of a file is the same regardless of which machine on the network it is actually stored on; files can be moved from one machine to another w/o changing names)

 

Can you have network transparency without name transparency?  (Yes.)

 

In Sprite, the entire FS is shared, whereas in AFS/NFS only parts of the FS are shared, so machines can still have files on a local disk that no one else can access; that is not possible in Sprite.

(This is used to help implement process migration.)

 

Both Sprite and LOCUS provide a single file hierarchy accessible from all workstations.

 

Shared Address Spaces

 

Implemented in the OS in anticipation of multiprocessors.  A parent process could fork a child and specify whether the child should share data (static data + heap).  The code segment was shared read-only; the data segment was shared read-write.  (A sketch of such a fork interface follows.)
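A minimal sketch of what that fork-time choice might look like.  The call name Proc_Fork and its signature are assumptions for illustration, not necessarily Sprite's actual interface:

    /* Hypothetical interface: fork a child, optionally sharing the
     * data segment (static data + heap) read-write.  The code segment
     * is always shared read-only; stacks are always private. */
    extern int Proc_Fork(int shareData, int *childPidPtr);

    void SpawnWorker(void)
    {
        int child;

        if (Proc_Fork(1, &child) != 0) {   /* 1 => share data segment */
            /* handle error */
        }
        /* Parent and child now share static data + heap: writes by one
         * are visible to the other.  Each has its own private stack. */
    }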

 

Regarding memory sharing, Sprite provided calls that allowed processes to be put to sleep & awoken, but no locking primitives were provided by the kernel.  Synchronization was implemented in user space on top of test-and-set, so that a context switch into the kernel is not needed for most thread operations (see the sketch below).
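A minimal sketch of user-level locking built on test-and-set.  The TestAndSet primitive and the Lock_* names are illustrative assumptions, not Sprite's actual library:

    /* Spin lock built on an atomic test-and-set instruction. */
    typedef volatile int Lock;

    /* Atomically set *lock to 1 and return its previous value;
     * assumed to map to a single hardware instruction. */
    extern int TestAndSet(Lock *lock);

    void Lock_Acquire(Lock *lock)
    {
        while (TestAndSet(lock) != 0) {
            /* Spin entirely in user mode: no kernel call in the
             * uncontended case.  A real library would fall back to the
             * kernel's sleep/wakeup calls after spinning for a while. */
        }
    }

    void Lock_Release(Lock *lock)
    {
        *lock = 0;   /* an ordinary store releases the lock */
    }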

 

Process Migration

 

Are calls forwarded to the home machine when a migrated process needs to access an I/O device on the home machine?  Not necessarily.  Devices are just special files in the unified filesystem hierarchy... they should be accessible from any machine.  However, contrast this approach w/ V, where all devices anywhere on the network are accessible via a device server.

 

Processes that share an address space must be migrated together.  (Why?  Performance or correctness?  Both or neither?)

 

OS provides calls to migrate processes.  Pmake was the only program rewritten to use them.

 

A migrated process still appears to be running on its home machine (i.e., when a user runs ps), but is really running elsewhere on an idle machine.  Transparency – results are the same as if the process had executed on the home machine.

 

Kernel Structure

 

Kernel was multi-threaded: it does not have one big lock that everyone blocks on, as in early UNIX or Linux, to execute kernel code.  With such a lock, multiple threads could only execute concurrently while in user mode.

 

RPC – used implicit acknowledgement (responses ACK requests) and fragmentation (used to ship large blocks of data, e.g. VM pages during process migration).  A sketch of what such a packet header might carry follows.
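A rough sketch of the header fields that implicit acknowledgement and fragmentation imply.  The layout and names are invented for illustration, not taken from the Sprite sources:

    /* Illustrative RPC packet header. */
    typedef struct {
        unsigned int channel;    /* one outstanding RPC per channel */
        unsigned int sequence;   /* pairs a response with its request */
        unsigned int fragment;   /* index of this fragment */
        unsigned int numFrags;   /* total fragments in the message */
    } RpcHeader;

    /* Implicit ACK: a response carrying sequence N acknowledges request
     * N, and the next request on the same channel acknowledges the
     * previous response, so no separate ACK packets are needed in the
     * common case.  Fragmentation ships a large block (e.g. VM pages)
     * as numFrags packets covered by a single acknowledgement. */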

 

Prefix tables – entries of (Prefix, Server, Token).  The table is searched for the longest pathname match during name resolution.  The server either returns a token corresponding to the resolved file, or the remaining part of the path that lies outside its domain and still needs to be resolved.  If given a pathname for which there is no prefix table entry, the request is broadcast; the server that manages that domain responds, and a new prefix table entry is made.  (See the lookup sketch below.)
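A sketch of the client-side longest-prefix lookup, assuming the table is a simple array.  The entry layout and function names are invented for illustration:

    #include <stddef.h>
    #include <string.h>

    /* Illustrative prefix-table entry. */
    typedef struct {
        const char *prefix;    /* e.g. "/a/b" */
        int         serverID;  /* server managing this domain */
        int         token;     /* server's designator for the domain root */
    } PrefixEntry;

    /* Return the entry with the longest prefix matching path, or NULL
     * if none matches (which would trigger a broadcast).  A real
     * implementation would also require the match to end on a '/'
     * component boundary. */
    PrefixEntry *PrefixLookup(PrefixEntry *table, int n, const char *path)
    {
        PrefixEntry *best = NULL;
        size_t bestLen = 0;

        for (int i = 0; i < n; i++) {
            size_t len = strlen(table[i].prefix);
            if (len > bestLen && strncmp(path, table[i].prefix, len) == 0) {
                best = &table[i];
                bestLen = len;
            }
        }
        return best;   /* caller sends (token, path + bestLen) to server */
    }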

 

Caches

 

Advantages of caches: reduce network communication and reduce load on servers.  Disadvantage: need to run a consistency protocol.

 

Sprite gives better consistency than NFS: an NFS client can see stale data until a file is closed and reopened; Sprite always gives clients the most recently updated blocks.

 

Sprite FS caching is block-based, like NFS; entire files are not cached.

 

Sequential write-sharing is handled using version numbers.  On open, the client notifies the server, and uses its cached copy if the server does not tell it about a more recent version number.

When a client that is not the last writer opens a file, the server forces the last writer to flush any modified blocks.

 

Under concurrent write-sharing (two clients have the file open at the same time, at least one for writing), client caching is disabled and all reads and writes go through to the server.  (A sketch of the server's open-time check follows.)
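In code terms, the server's open-time check might look roughly like this.  All types and helper functions are invented to illustrate the protocol described above:

    #include <stddef.h>

    typedef struct Client Client;
    typedef struct File {
        int     version;     /* bumped on each open for writing */
        Client *lastWriter;  /* client holding the newest dirty blocks */
    } File;

    /* Assumed helpers, not Sprite's actual routines. */
    extern int  OpenElsewhere(File *f, Client *c, int forWriting);
    extern void DisableClientCaching(File *f);
    extern void ForceFlush(Client *c, File *f);

    void ServerOpen(File *f, Client *c, int writing)
    {
        if (OpenElsewhere(f, c, 1) || (writing && OpenElsewhere(f, c, 0))) {
            /* Concurrent write-sharing: no client may cache the file;
             * every read and write goes through to the server. */
            DisableClientCaching(f);
        } else if (f->lastWriter != NULL && f->lastWriter != c) {
            /* Sequential write-sharing: the last writer flushes its
             * dirty blocks so this client sees the latest data. */
            ForceFlush(f->lastWriter, f);
        }
        if (writing) {
            f->version++;
            f->lastWriter = c;
        }
        /* The open reply carries f->version; the client discards its
         * cached blocks if its saved version number is older. */
    }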

 

Disadvantages: bad performance in the concurrent write-sharing case; clients must contact the server on every open.

 

This scheme only provides consistency; FS_Lock is provided for synchronization.

 

FS Performance – a client cache of a few megabytes gives performance within 1-12% of a local disk.  Client caches reduce network traffic by 4x or more, and client caching reduces server load by a factor of 2.  Each server could support about 50 clients, compared with 10-20 clients for NFS.

 

Virtual Memory

 

Backing storage for each machine is kept in files on a network file server.

 

Advantages: simplifies process migration, the VM system can reuse existing FS calls, and there is no need to preallocate a static amount of disk space for swap.  (A pageout sketch follows.)
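Since backing store is just a file, pageout can reuse the ordinary file write path.  A minimal sketch, with the call name, signature, and page size assumed for illustration:

    /* Assumed file-write entry point, not Sprite's actual signature. */
    extern int Fs_Write(int fileToken, const void *buf,
                        unsigned long offset, unsigned long len);

    #define PAGE_SIZE 4096UL   /* assumed page size */

    /* Write a dirty page to its backing file.  The page's offset in the
     * file is fixed by its virtual page number, so no separate swap
     * allocator is needed. */
    void PageOut(int backingFileToken, unsigned long pageNum,
                 const void *page)
    {
        Fs_Write(backingFileToken, page, pageNum * PAGE_SIZE, PAGE_SIZE);
    }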

 

Since servers have large caches, VM pages can often be read from a server faster than from a local disk, thanks to fast networks and slow disks (seek time).

 

Sticky segments – don't swap out a program's code when execution completes until it becomes the least recently used segment; this allows a new process to start up quickly if the code is still in memory.

 

Double Caching

 

Since VM uses the FS, pages could get doubly cached in the file cache and in VM, so careful implementation is required.  Solution: the VM system bypasses the local file cache when reading/writing backing files.  Backing files are cached on servers.

 

However, servers do not cache backing files for their own processes.

 

VM / FS physical memory sharing: the amount of physical memory allocated to VM and to the FS cache can change dynamically in Sprite.  For memory-intensive processes (simulations), most of memory gets used for VM pages; for disk-I/O-intensive processes (databases), most of memory becomes file cache.  This is implemented as follows: whenever a page needs to be kicked out of physical memory, the oldest VM page is compared with the oldest file cache page, and whichever of the two is older is thrown out (see the sketch below).
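A sketch of that eviction rule, comparing the ages of the oldest page in each pool.  The types and helpers are invented, and the sketch assumes at least one pool is non-empty:

    #include <stddef.h>

    typedef struct Page {
        unsigned long lastAccess;   /* time of last reference */
    } Page;

    /* Assumed helpers over the two pools. */
    extern Page *OldestVmPage(void);
    extern Page *OldestFileCachePage(void);
    extern void  Evict(Page *p);

    void ReclaimOnePage(void)
    {
        Page *vm = OldestVmPage();
        Page *fc = OldestFileCachePage();

        /* Whichever pool holds the older page gives it up, so physical
         * memory flows toward whichever workload is generating
         * references. */
        if (vm == NULL || (fc != NULL && fc->lastAccess < vm->lastAccess))
            Evict(fc);
        else
            Evict(vm);
    }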

 

They considered doing memory-mapped I/O, but decided against it because: 1) a more complicated cache consistency scheme would be required (???), 2) they wanted I/O devices and files to be treated the same, but mapping I/O devices to memory didn't make much sense to them.

 

Process Migration

 

V: pre-copying; the VM pages of the process to be migrated are copied to the target; then the process is frozen, moved, and any VM pages modified in the meantime are recopied before the process is started on the target

 

Accent: VM pages are left on the old machine, and the migrated process demand-pages against the home machine

 

LOCUS: the code segment is reopened on the target machine (and, I believe, the data segments are demand-paged)

 

Sprite: dirty pages are sent to the server; the code segment is reused if it already exists on the target machine, or is loaded from the backing store server; data is demand-paged from backing store

 

Sprite advantages: requires less total data to be copied than V, and does not require the old machine to serve page faults

Sprite disadvantages: process is frozen longer than in V or Accent.

 

Process migration criteria: transparency, time to kick the process off the old machine, time the process is frozen, time the old machine spends servicing pages after migration

 

Sprite: machine-specific calls in the migrated process are forwarded to the home machine to achieve transparency (done using kernel-to-kernel RPC; a dispatch sketch follows below).

V, Accent: since these are message-based systems, transparency is achieved by forwarding all messages from home machine to target machine.
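A sketch of the Sprite-style forwarding of location-dependent calls mentioned above, with all names invented for illustration:

    typedef struct Proc {
        int migrated;      /* nonzero if running away from home */
        int homeMachine;   /* host ID of the home machine */
    } Proc;

    /* Assumed helpers. */
    extern int  IsLocationDependent(int callNum);
    extern long Rpc_Call(int machine, int callNum, void *args);
    extern long LocalSyscall(int callNum, void *args);

    long Syscall(Proc *p, int callNum, void *args)
    {
        if (p->migrated && IsLocationDependent(callNum)) {
            /* Calls that must observe home-machine state are forwarded
             * over kernel-to-kernel RPC, so the process behaves as if
             * it never left home. */
            return Rpc_Call(p->homeMachine, callNum, args);
        }
        return LocalSyscall(callNum, args);
    }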

 

More info in process migration paper.