Back to index

Mariposa: a wide-area distributed DBMS

M. Stonebraker, P. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, A. Yu Summary by: Armando Fox and Steve Gribble

One-line summary: Coarsely-coupled, fully-autonomous nodes in a WADDBMS each have various fragments of a distributed DB, and they bid on performing pieces of distributed queries and on acquiring/selling fragments, based on their available resources and other local policy.

Overview/Main Points

problem and goals:
- LAN distributed DBMS's assume static allocation, single admin. structure & policy at all nodes, uniform HW/SW.
- WAN DBMS goals: scalability, data mobility, node autonomy, schema changes don't force global synchronization, configurable per-site policies.
- Argument: WADDBMS cannot meet these goals. E.g. cost-based optimizers break if a site can refuse to process subqueries; data movement hard to coordinate if sites autonomous; etc.
Mariposa architecture:
- Each query has a budget it can spend. BId curve B(t) expresses how much user is willing to pay to resolve query in time t.
- Each site tries to optimize its revenue via local policy.
- DB Fragments distributed across nodes; some replication, with copy holders maintaining freshness of their copies by contracting with other copy holders to get updates.
Life of a query:
- Parser parses SQL3 query and figures out where each necessary table fragment is, etc., by consulting metadata from nameserver (which is itself a biddable item).
- Fragmenter produces fragmented query plan, decomposed into strides (groups of ops that can proceed in parallel; think gmake -j).
- Broker gets bids on pieces of query plan and notifies bid winners.
- Coordinator oversees execution of query strides and collates results.
- Storage managers, brokers, and bidders coded in Rush, a forward-chaining rule language (on condition do action).
Bidding:
- (Expensive) bidding process: first phase, broker collects bids; second phase, notifies winners.
- (Cheap) purchase order protocol: broker sends each subquery to site that would be most likely to win bidding, based on broker's experience. Entails a risk that site may exceed budget in doing the query.
- Servers advertise willingness to perform services via nameservers' ad table. There are yellow pages, sale prices, coupons, and bulk purchase contracts, all analogous to their real-world counterparts.
Offering a bid:
- Naive strategy: billing rate is function of CPU and I/O resources.
- Optimization 1: billing rates maintained per-fragment, to allow sites to acquire fragments it wants and sell off fragments it doesn't want, and declines to bid on queries below a specific threshold for each fragment.
- Optimization 2: multiply bid by load average; gives supply/demand effect and "crude load balancing". (Ed.: doesn't this only work if load averages can be correlated across heterogeneous machines? Can they?)
- Optimization 3: always bid on a query that references a "hotlist" fragment that you want.
- Optimization 4: include network resources in bidding. Authors propose to use Tenet protocols and represent a bandwidth profile (bandwidth as function of time), to estimate how expensive it will be to move fragments around.
- Offer price for buying fragments is offset by the value of those fragments that would be evicted (if any) to make room for new fragments.
Name servers: brokers try local cache, then fall back to name server. Name servers price their data according to its freshness (quality); stale data may increase delay as client has to go to another nameserver.
Experiments:
- Experiments performed on lightly-loaded WAN (do these exist anymore?) to avoid interference from heavy daytime traffic. (Seems a serious blow for WADDBMS argument)
- Brokering process: 14 roundtrips to collet bids, 6 to record them, 2 to notify winners. Some parallelism due to threading, but still a long time on a congested network! Hopefully connections are long-lived.
- For uniform-CPU sites, query optimizers tend to differentiate plans based on cost of data movement.
- A single example, fairly unconvincing, was used to demonstrate that Mariposa finds a better plan than a traditional distributed cost-based optimizer.
- Claim: "Cost of moving the tables can be amortized over repeated execution of queries that require the same data." Maybe, but they didn't measure this benefit, and we're talking about moving several MBytes each time. The elapsed times they gave for query execution did not include network time to move data, which was a total of 82sec+820sec.

Relevance

Flaws

Stability of prices in this market? (No hysteresis or other damping, and everything happens in "real time")
Measurements and handwaving for network costs totally unconvincing; if they can't prove this is worthwhile under "realistic" network conditions, it undermines the whole case for WADDBMS (as opposed to replication and consistency control, Bayou-style)

Back to index