Lightweight Remote Procedure Call
Bershad, Anderson, Lazowska, Levy, ACM TOCS 8(1):37-55 (Feb. 1990)

LRPC combines:

* Fine-grained protection using capabilities (hard to implement): per-object protection domains, with all objects living in a single address space. Objects can only be touched via protected procedure calls, which transfer control into the object's domain.

* RPC-like large-grained protection mechanisms, which were proven to work locally in Mach. Boundaries correspond to machine boundaries.

Insight: RPC was originally optimized for calls across the LAN as the common case. The key observation is that today the common case is embodied by cross-domain calls on the same machine (over 95% of the time), and simple calls with small arguments/results (<= 50 bytes) still dominate. LRPC aims to improve RPC for this new common case.

Traditional RPC overhead is due to 7 factors: use of stubs, message copying, access validation, message transfer, scheduling of abstract threads onto concrete ones, context switches, and dispatching in the server.

LRPC borrows the execution model of protected procedure calls, while maintaining the programming semantics and protection model of RPC.

Binding: Each server domain has a clerk, which registers the service's interface with a name service. When a client binds to the interface, it issues an import call to the kernel, which passes it to the clerk; the clerk in turn returns a procedure descriptor list to the kernel. For each procedure, this list contains an entry address, the number of simultaneous calls allowed, and the size of its argument stack (A-stack). The kernel allocates the A-stacks, shared between client and server, plus a linkage record for each A-stack; it then returns to the client a capability called a binding object.

Calling: The client's stub places the args on the A-stack, places {A-stack address, binding object, procedure identifier} in registers, and traps to the kernel.
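The calling sequence can be mocked up as a single-address-space simulation. Everything here (the function names, the `struct` wire format, the in-use set) is invented for the sketch, not the paper's actual interface:

```python
import struct

in_use = set()   # A-stack/linkage pairs currently claimed by a call

def client_stub_add(astack, a, b):
    """Client stub: marshal two ints directly onto the shared A-stack
    (referents of by-reference args would be copied here too), then
    'trap' to the kernel with the A-stack and a procedure identifier."""
    struct.pack_into("<ii", astack, 0, a, b)
    return kernel_trap(astack, proc_id=0)

def kernel_trap(astack, proc_id):
    """Kernel: after validating the binding object and proc_id, claim
    the A-stack/linkage pair, record the return linkage, and upcall
    into the server's stub (on an E-stack in the server domain)."""
    assert id(astack) not in in_use              # pair must not be reused
    in_use.add(id(astack))
    linkage = {"return_to": "client_stub_add"}   # stands in for the linkage record
    try:
        return server_stub_add(astack)           # the upcall
    finally:
        in_use.discard(id(astack))               # released on return

def server_stub_add(astack):
    """Server stub: unmarshal args from the A-stack, run the procedure,
    and write the result back into the same shared A-stack."""
    a, b = struct.unpack_from("<ii", astack, 0)
    struct.pack_into("<i", astack, 0, a + b)
    return struct.unpack_from("<i", astack, 0)[0]

print(client_stub_add(bytearray(64), 2, 3))  # -> 5
```

Note how no message is ever copied between domains: arguments and results live in the one shared A-stack, which is exactly where the asynchronous-modification caveat below comes from.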
If any args are passed by reference, the referents must be copied onto the A-stack, to protect servers from bad pointers. The kernel verifies everything, ensures the A-stack/linkage pair is not in use by another call, and pushes the linkage onto the A-stack. It then finds an execution stack (E-stack) in the server domain, pushes a pointer to the A-stack's args onto it, switches context into the server's domain, and performs an upcall into the server's stub. When the server procedure is done, it returns to the stub, which returns to the caller's domain. Note that this scheme allows clients or servers to asynchronously modify the A-stack after control has been transferred across domains.

Stub generation: Most stubs are generated directly in assembly; this is simple because most of what they do is moves and traps. More complicated work (binding, exception handling, call failure) and the marshalling of complex or large arguments are handled by Modula2+ code.

Performance hacks: On multiprocessors, LRPC uses domain caching to reduce context-switch overhead: if a processor is idling in the server domain, the caller is switched to that processor when it enters the server domain (and the idle thread moves to the caller's processor). If, by the time the call completes, the idle thread switched into the client domain is still idle, the caller is returned to its previous processor. To make this happen often, LRPC has idle processors spin in domains known to have high call activity. Performance could be improved further with PID-tagged TLBs.

Miscellaneous performance tricks:

* avoid locking shared data during call and return, to avoid contention
* use a bit in the binding object to indicate whether a call should be local or remote; remote calls go through traditional RPC stubs

* a calling thread is allowed to issue multiple outstanding LRPC calls (using different A-stacks), unlike in traditional RPC

Domain termination: The kernel can terminate a domain without synchronously terminating its outstanding threads. However, all binding objects involving the domain (as client or server) are revoked, so no further out-calls or in-calls can be made.
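Stepping back to the binding step: the clerk/import handshake can be modeled with a few toy structures. `ProcDesc`, `Binding`, `import_interface`, and the field names are all assumptions made for this sketch, not the kernel's real data layout:

```python
from dataclasses import dataclass, field

@dataclass
class ProcDesc:
    """One entry in a procedure descriptor list (PDL)."""
    entry_addr: int     # server procedure entry point
    max_calls: int      # simultaneous calls permitted on this procedure
    astack_size: int    # bytes of argument stack (A-stack) to allocate

@dataclass
class Binding:
    """Capability returned to the client: the 'binding object'."""
    interface: str
    astacks: list = field(default_factory=list)  # (A-stack, linkage) pairs

def import_interface(name, registry):
    """Kernel-side 'import': take the PDL the clerk registered,
    allocate a shared A-stack plus a linkage record per allowed
    simultaneous call, and hand the client a binding object."""
    pdl = registry[name]
    binding = Binding(interface=name)
    for proc in pdl:
        for _ in range(proc.max_calls):
            astack = bytearray(proc.astack_size)       # shared client/server
            linkage = {"return_addr": None}            # one per A-stack
            binding.astacks.append((astack, linkage))
    return binding

# A clerk has registered a two-procedure interface with the name service.
registry = {"fs": [ProcDesc(0x1000, 2, 64), ProcDesc(0x2000, 1, 128)]}
b = import_interface("fs", registry)
print(len(b.astacks))  # -> 3 A-stack/linkage pairs
```

Pre-allocating the pairs at bind time is what lets the later call path avoid allocation entirely: a call only has to claim a free pair.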
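The domain-caching heuristic from the performance hacks above can be illustrated with a toy placement routine; `Processor`, `place_call`, and the domain strings are invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Processor:
    current_domain: str
    idle: bool

def place_call(caller_cpu, cpus, server_domain):
    """On an LRPC into server_domain: if some processor is idling in
    that domain, migrate the caller there (parking the idle thread on
    the caller's old processor) instead of paying for a full context
    switch on caller_cpu. Returns the processor the call runs on."""
    for cpu in cpus:
        if cpu.idle and cpu.current_domain == server_domain:
            cpu.idle = False          # caller runs in the cached domain
            caller_cpu.idle = True    # idle thread swaps onto the old CPU
            return cpu
    caller_cpu.current_domain = server_domain   # fall back: switch domains
    return caller_cpu

cpus = [Processor("client", False), Processor("server", True)]
target = place_call(cpus[0], cpus, "server")
print(target is cpus[1])  # -> True: call ran on the cached server-domain CPU
```

Spinning idle processors in hot domains raises the hit rate of the loop's first branch, which is the whole point of the optimization.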