Back to index
Transaction Management in the R* Distributed Database Management
System
C. Mohan, B. Lindsay, and R. Obermarck
Summary by: Steve Gribble and Armando Fox
One-line summary:
2-phase commit is the standard transaction management/distributed commit
protocol, but it has 4(N-1) message cost - optimized commit protocols
(presumed abort and presumed commit) are presented. (And, for something
completely different, distributed deadlock detection is mentioned in a
half-page.)
Relevance
2-phase commit is well-known, and potentiall extremely expensive to run.
These variations reduce the cost in the common case, which is a good thing
to do.
Flaws
- Hand-wavvy - I'm fairly convinced these algorithms are correct,
but I'd like to see a more formal demonstraction of correctness.
- Performance analysis - which is common case in distributed
databases? Need to know in order to predict which to use.
Overview/Main Points
- Have coordinator and subordinate nodes; subordinates only talk
to coordinator (not to each other).
- 2-Phase commit:
- Coordinator issues PREPARE message to begin
first phase.
- Each subordinate that is willing to let transaction
commit force-writes a prepare log record then
sends a YES VOTE. If not, any subordinate can
veto by issuing NO VOTE. A yes-vote locks the
subordinate into committing for all time.
- Once coordinator receives all votes, if unilaterally
YES, then force-writes a commit record and sends
COMMIT message to subordinates, else
force-writes abort and sends ABORT to all
YES-responding subordinates.
- Each subordinate, on receiving COMMIT, force-writes a
commit record, and sends an ACK message.
- Under failure, have site-recovery process running at each active
site that takes over. If discover no commit log record written,
then whether or not it's a subordinate it will undo actions,
write an abort, and forget about it.
- Only tricky case- if recovery processes receives inquiry message
from a prepared subordinate, looks in information on log - if
transaction is aborting or committing, then sends appropriate
response. If no information, then obviously inquirer had
not received a COMMIT/ABORT (inquiree sends out PREPARE, crashes
before receiving votes, and on restart aborts transaction).
Given this, recipient must send back an ABORT.
- Hierarchical 2-phase commit - obvious extension where
non-leaf nodes after receiving a PREPARE propagate it to
subordinates, and only after receiving all votes does it send a
combined (subtee) vote to coordinator.
- Presumed Abort Protocol
- Take advantage of fact that absense of information
implies recovery process orders inquirer to abort.
- Coordinator forgets transaction immediately after
making decision to abort and writing abort record -
abort record need not be forced, and no ACKs need to be
sent by subordinates for ABORT.
- Coordinator also need not record names of subordinates
in abort record or write and end record after
the abort record.
- Partially-read-only transactions: if some process does
not perform any updates, then after receiving PREPARE
and not having done any update, process sends READ
VOTE, releases locks, and forgets about transaction -
no log record written. Coordinator now knows this
subordinate is READ-ONLY, and excludes it from 2nd phase.
- Presumed Commit Protocol
- Most transactions are expected to commit - try to
eliminate ACKs for COMMITs.
- Simple approach: require ABORT to be acknowledged, but
COMMITs need not be, and that abort records are
forced to disk but commit records need not be.
- Problem: If root is in prepare state, a subordinate is
also in prepared state, and root crashes before it has
collected all votes. On recovery, will just forget
about transaction. Fix: coordinator records names of
subordinates safely before any subordinate can get in
the prepared state. So, at start of first phase,
coordinator force-writes a collecting record
with names of all subordinates.
- Same optimization for read-only subordinates can be
done. If all subordinates read-only, don't need 2nd
phase at all.
- Table on page 573 of text giving performance of protocols under
various situations. PA strictly better than 2P; PA better than
PC for completely read-only, and partially-read-only if only
coordinator does update. If only one update subordinate, PA and
PC equal in terms of log writes, but PA needs extra ACK. For n>1
update subordinates, PC and PA need same number of log records,
but PA will force n-1 times while PC will not (because of commit
records by subordinates). PA will also send n extra messages
(due to ACKs).
- Deadlock-detection: see R*
summary for this.
Back to index