The Clearinghouse: A Decentralized Agent for Locating Named Objects in a Distributed Environment

Derek C. Oppen and Yogen K. Dalal

One-line summary: The Clearinghouse is a decentralized set of processes that provides an efficient but not terribly robust method for a distributed name service.

Overview/Main Points

Name Spaces: Name spaces can be described in terms of directed graphs - a node is an object, and an edge is part of an object's name. An absolute name space has a root node with one edge leading to each other (unlabelled) node; all edges have unique labels. A relative name space has unlabeled nodes and labeled edges, and there is either zero or one uniqeuly labeled edge from any node to any other. The concept of hierarchy is orthogonal to that of flat (absolute) vs relative; a hierarchical name space's graph is partitioned into subgraphs; absolute naming must be used within leaf subgraphs, but either relative or absolute can be used between subgraphs. Flat spaces provide one-to-one mappings from names to objects. Relative spaces provide one-to-many mappings (many nodes may have the same name - uniqueness requires qualification with a source node.) Aliases (alternate names) mean many-to-one or many-to-many mappings are possible.
Clearinghouse names: Names are of the form L:D:O, where L is a local name, D is a domain name, and O is the organization. All objects (including clearinghouses and users) have such names. This namespace is therefore absolute and hierarchical with 3 levels. Usernames are made unique by providing a full person's name and a "birthmark" to resolve conflicts. The clearinghouse maps names into sets of properties; a property is a (PropertyName, PropertyType, PropertyValue) tuple. Types are either items or groups, where a group is a set of names.
Client's perspective: Client apps talk to clearinghouse stubs, which contain pointers to clearinghouse servers. Clients may request the value of a binding, update the value of a binding, create a new binding, or delete a binding. Each of these 4 operations is atomic at a clearinghouse but not across the set of all clearinghouses; temporary inconsistencies can then occur - information should be treated as a hint in the short term but truth in the long term.
Architecture: Each domain D in each organization O has one or more clearing house servers; each such domain server has a unique name (anything):O:CHServers. The set of all domain servers for D:O is accessed with name D:O:CHServers and PropertyType "Distribution List". Each organization also has one or more organization clearinghouse named (anything):CHServers:CHServers; such organization clearinghouses contian the name and address of every domain clearinghouse in the organization. Thus, O:CHServers is a domain used for naming clearinghouses, whose domain clearinghouse is the organization's clearinghouse. O:CHServers:CHservers maps into the set of names of organization clearinghouses for O.
Lookup: Client stubs contact known clearinghouses. If the request is for an object in that clearinghouse's domain, it returns the answer. If not, that clearinghouse returns a list of organization clearinghouses for the object. The stub then contacts one of these to obtain a domain clearinghouse for the object, and contacts that for the object itself. Lookup in the worst case is then 3 transactions. Sideways shortcuts between domains can be cached to avoid an up and down step. Domain clearinghouses can also cache bindings for common objects not in their domains; cache consistency is not a problem, since clients are responsible for detecting incorrect values and deleting them!
Update: Update requests are submitted through stubs to a domain clearinghouse for the object, which forward the update to all sibling domain clearinghouses. Updates are only atomic within a clearinghouse; out-of-order updates across siblings are discovered and resolved with an unspecified algorithm.
Authentication: is used. (Details are omitted in the paper.)

Relevance

The clearinghouse is a distributed, somewhat fault tolerant, and relatively efficient distributed naming scheme. This paper identifies the issues, and solves some of them well. It certainly takes a pragmatic rather than rigorous perspective on the problem.

Flaws

Some issues are swept under the rug, such as detecting and recovering from failures, network partitions, hostile or untrustworthy servers, detecting and recovering from update or deletion conflicts across name servers, providing cache consistency, etc.
The paper introduces name spaces overly rigorously and solves problems unrigorously.

Back to index