The TSIMMIS Approach to Mediation: Data Models and Languages

Hector Garcia-Molina, Yannis Papakonstantinou, et al.

One-line summary: Support for integrating "related but not-quite-the-same" information sources. Desired results are expressed as queries over semistructured self-describing data; query normalization can be used to compute the answers to queries that are not directly supported; wrappers around sources are used for homogeneity.

Overview/Main Points

Goal: integrate query-like, service-like, etc. info resources, and provide a query interface (note, not a service I/F), by wrapping each of the sources. A query can be made against a wrapped source or another mediator, so the potential for composition seems to exist, though this paper doesn't go into any detail about it.
Wrapper type system: self-describing objects, which may be atomic or set-valued. The atomic types are few and simple. Webc's type system is even simpler.
The logic-based (datalog-like) Mediator Specification Language expresses queries that capture the structure of the data, although no schema is imposed a priori. Lorel is the end-user query language.
TSIMMIS is designed to integrate "information from related but not-quite-the-same information sources." (Direct quote from beginning of sec. 5)
A TSIMMIS wrapper decides whether a query is directly supported or can be indirectly supported by filtering the result of a supported query; if neither is the case, query normalization can sometimes generate a strategy in which a query that is not directly supported is answered by performing many queries that are supported. For example, if the only query supported is "parents of X", we can find X's grandparents by running two sets of queries (first for X's parents, then for the parents of each of them) and taking the union of the results. The strategies produced may be expensive to find (NP-hard in general?) and expensive to execute.
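The grandparents example can be sketched in a few lines of Python. This is only an illustration of the idea, not TSIMMIS code: the Obj class, the SOURCE data, and the functions parents_of and grandparents_of are all hypothetical names made up for this note, and the real system derives such strategies by query normalization rather than by hand. Obj loosely mimics a self-describing object whose value is either atomic or a set of nested objects.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Obj:
    """Hypothetical self-describing object: the label travels with the value."""
    label: str
    value: Union[str, int, float, Tuple["Obj", ...]]  # atomic, or nested objects


# Made-up data held by a wrapped source; no schema is declared anywhere.
SOURCE = (
    Obj("person", (Obj("name", "alice"), Obj("parent", "bob"), Obj("parent", "carol"))),
    Obj("person", (Obj("name", "bob"), Obj("parent", "dave"), Obj("parent", "erin"))),
    Obj("person", (Obj("name", "carol"), Obj("parent", "frank"))),
)


def parents_of(name: str) -> Set[str]:
    """The only query this hypothetical wrapper supports directly."""
    result: Set[str] = set()
    for person in SOURCE:
        fields = person.value
        if any(f.label == "name" and f.value == name for f in fields):
            result |= {f.value for f in fields if f.label == "parent"}
    return result


def grandparents_of(name: str) -> Set[str]:
    """Unsupported query, answered by composing supported ones."""
    result: Set[str] = set()
    for parent in parents_of(name):    # first set of queries: one call
        result |= parents_of(parent)   # second set: one call per parent
    return result


if __name__ == "__main__":
    print(sorted(grandparents_of("alice")))
```

Even in this toy case, one unsupported query turns into one supported query plus one more per parent, which hints at why the generated strategies can be expensive to execute.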

Relevance

They emphasize what we call "semantically similar" sources: the desired result is expressed as a query in Lorel, and the necessary "compositions" are computed using query normalization. Because the sources are semantically similar to begin with, a query language is a natural way to express "composition" implicitly. In our case, we don't really have a language (yet) for expressing the desired result, but we want to express "composition" of semantically distinct sources using an explicit composition language (similar to CLAM?). Because the sources (services) are semantically distinct, the tasks we want to perform through automatic composition cannot be gracefully expressed in terms of a query.
