* VAXClusters: A Closely-Coupled Distributed System
* Authors: N.P. Kronenberg, H.M. Levy, and W.D. Strecker
* Field: OS/File Systems

VAXClusters are "closely-coupled": they have separate processors, message passing, and an independent OS on each node, but also close physical proximity, a single security domain, centralized disk storage, and memory-to-memory block transfers b/t nodes.

Goals: high availability, easy extensibility (scalability).

Nodes are arranged in a star configuration with a central "star coupler" hub. Each node has a Communications Interconnect (CI) Port that performs arbitration, path selection (one of two paths to each node), and data transmission.

Arbitration: Each CI Port has a node-specific delay time; a port wins arbitration if the CI stays quiet for that port's delay time. Under heavy load each node has two delay times, one long and one short, and alternates between them. "Thus, under light loading the bus is contention driven and under heavy loading it is round robin." ACKs are sent immediately, without arbitration, b/c they can be returned in less time than the shortest delay time.

Data transfer types: datagrams (unreliable), messages (reliable), blocks (reliable, can be >4k). Blocks are simply contiguous data in a virtual address space. Blocks are packetized, and the CI Port copies the data directly from local memory to remote memory, avoiding OS-level copying.

Mass storage is reached through a CI Port message-passing interface, which makes it easy to share (aka file/disk server).

Two key software components: the lock manager and the connection manager. The connection manager handles node failures and other cluster transitions, and implements the quorum voting scheme. The lock manager provides distributed locking by giving responsibility for an entire hierarchy of locks to a particular node.

The voting scheme uses a quorum (minimum system resources) to keep a partitioned network from running as more than one cluster. If the quorum is not met, the cluster suspends.

Questions:
1) Is an unsound centralized locking mechanism worth it?
2) Is the linear scaling possible because they only use a small number (12) of nodes?
3) What about the interrupt-based lock contention notification to let processes be "friendly"?
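
Sketch of the quorum check, to make the "cluster suspends" rule concrete. This is only an illustration under assumptions, not the connection manager's actual code: the notes do not say how the quorum is chosen, so a strict majority of the expected votes is assumed here, and the names `Node`, `quorum_for`, and `partition_may_run` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    """Hypothetical cluster member; `votes` is its contribution to the quorum."""
    name: str
    votes: int = 1


def quorum_for(expected_votes: int) -> int:
    # Assumption: quorum is a strict majority of the expected votes,
    # so two disjoint partitions can never both reach it.
    return expected_votes // 2 + 1


def partition_may_run(reachable: Iterable[Node], quorum: int) -> bool:
    # A partition keeps running only if the votes it can still see
    # meet the quorum; otherwise it suspends.
    return sum(node.votes for node in reachable) >= quorum


# Five one-vote nodes; a network failure splits them 3 / 2.
nodes = [Node("A"), Node("B"), Node("C"), Node("D"), Node("E")]
quorum = quorum_for(sum(node.votes for node in nodes))

majority_side = nodes[:3]
minority_side = nodes[3:]

for label, side in (("majority", majority_side), ("minority", minority_side)):
    state = "runs" if partition_may_run(side, quorum) else "suspends"
    print(f"{label} side {state}")
```

With a majority quorum, at most one side of any split can keep running; the other side suspends until enough nodes rejoin, which is the behavior the notes describe.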