Back to index
Mariposa: a wide-area distributed DBMS
M. Stonebraker, P. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell,
C. Staelin, A. Yu
Summary by: Armando Fox and Steve Gribble
One-line summary: Coarsely-coupled,
fully-autonomous nodes in a WADDBMS each have various fragments of a
distributed DB, and they bid on performing pieces of distributed queries and on
acquiring/selling fragments, based on their available resources and
other local policy.
Overview/Main Points
- problem and goals:
- LAN distributed DBMS's assume static allocation, single
admin. structure & policy at all nodes, uniform HW/SW.
- WAN DBMS goals: scalability, data mobility, node autonomy, schema
changes don't force
global synchronization, configurable per-site policies.
- Argument: WADDBMS cannot meet these goals.
E.g. cost-based optimizers break if a site can refuse to
process subqueries; data movement hard to coordinate if
sites autonomous; etc.
- Mariposa architecture:
- Each query has a budget it can spend. BId
curve B(t) expresses how much user is willing to pay
to resolve query in time t.
- Each site tries to optimize its revenue via local policy.
- DB Fragments distributed across nodes; some replication,
with copy holders maintaining freshness of their copies by
contracting with other copy holders to get updates.
- Life of a query:
- Parser parses SQL3 query and figures out where each
necessary table fragment is, etc., by consulting metadata
from nameserver (which is itself a biddable item).
- Fragmenter produces fragmented query plan,
decomposed into strides (groups of ops that can
proceed in parallel; think gmake -j).
- Broker gets bids on pieces of query plan and
notifies bid winners.
- Coordinator oversees execution of query strides and
collates results.
- Storage managers, brokers, and bidders coded in Rush, a
forward-chaining rule language (on condition
do action).
- Bidding:
- (Expensive) bidding process: first phase, broker collects
bids; second phase, notifies winners.
- (Cheap) purchase order protocol: broker sends each
subquery to site that would be most likely to win bidding,
based on broker's experience. Entails a risk that site
may exceed budget in doing the query.
- Servers advertise willingness to perform services
via nameservers' ad table. There are yellow pages,
sale prices, coupons, and bulk purchase contracts, all
analogous to their real-world counterparts.
- Offering a bid:
- Naive strategy: billing rate is function of CPU and I/O
resources.
- Optimization 1: billing rates maintained per-fragment, to
allow sites to acquire fragments it wants and sell off
fragments it doesn't want, and declines to bid on queries
below a specific threshold for each fragment.
- Optimization 2: multiply bid by load average; gives
supply/demand effect and "crude load balancing". (Ed.:
doesn't this only work if load averages can be correlated
across heterogeneous machines? Can they?)
- Optimization 3: always bid on a query that references a
"hotlist" fragment that you want.
- Optimization 4: include network resources in bidding.
Authors propose to use Tenet protocols and represent a
bandwidth profile (bandwidth as function of time),
to estimate how expensive it will be to move fragments
around.
- Offer price for buying fragments is offset by the value of
those fragments that would be evicted (if any) to make
room for new fragments.
- Name servers: brokers try local cache, then fall back to name
server. Name servers price their data according to its freshness
(quality); stale data may increase delay as client has to go to
another nameserver.
- Experiments:
- Experiments performed on lightly-loaded WAN (do these exist
anymore?) to avoid interference from heavy daytime traffic.
(Seems a serious blow for WADDBMS argument)
- Brokering process: 14 roundtrips to collet bids, 6 to
record them, 2 to notify winners. Some parallelism due to
threading, but still a long time on a congested network!
Hopefully connections are long-lived.
- For uniform-CPU sites, query optimizers tend to
differentiate plans based on cost of data movement.
- A single example, fairly unconvincing, was used to
demonstrate that Mariposa finds a better plan than a
traditional distributed cost-based optimizer.
- Claim: "Cost of moving the tables can be amortized over
repeated execution of queries that require the same data."
Maybe, but they didn't measure this benefit, and we're
talking about moving several MBytes each time. The
elapsed times they gave for query execution did not
include network time to move data, which was a total
of 82sec+820sec.
Relevance
Flaws
- Stability of prices in this market? (No hysteresis or other
damping, and everything happens in "real time")
- Measurements and handwaving for network costs totally
unconvincing; if they can't prove this is worthwhile under
"realistic" network conditions, it undermines the whole case for WADDBMS
(as opposed to replication and consistency control, Bayou-style)
Back to index