Transaction Management in the R^* Distributed Database Management System

C. Mohan, B. Lindsay, and R. Obermarck Summary by: Steve Gribble and Armando Fox

One-line summary: 2-phase commit is the standard transaction management/distributed commit protocol, but it has 4(N-1) message cost - optimized commit protocols (presumed abort and presumed commit) are presented. (And, for something completely different, distributed deadlock detection is mentioned in a half-page.)

Relevance

2-phase commit is well-known, and potentiall extremely expensive to run. These variations reduce the cost in the common case, which is a good thing to do.

Flaws

Hand-wavvy - I'm fairly convinced these algorithms are correct, but I'd like to see a more formal demonstraction of correctness.
Performance analysis - which is common case in distributed databases? Need to know in order to predict which to use.

Overview/Main Points

Have coordinator and subordinate nodes; subordinates only talk to coordinator (not to each other).
2-Phase commit:
- Coordinator issues PREPARE message to begin first phase.
- Each subordinate that is willing to let transaction commit force-writes a prepare log record then sends a YES VOTE. If not, any subordinate can veto by issuing NO VOTE. A yes-vote locks the subordinate into committing for all time.
- Once coordinator receives all votes, if unilaterally YES, then force-writes a commit record and sends COMMIT message to subordinates, else force-writes abort and sends ABORT to all YES-responding subordinates.
- Each subordinate, on receiving COMMIT, force-writes a commit record, and sends an ACK message.
Under failure, have site-recovery process running at each active site that takes over. If discover no commit log record written, then whether or not it's a subordinate it will undo actions, write an abort, and forget about it.
Only tricky case- if recovery processes receives inquiry message from a prepared subordinate, looks in information on log - if transaction is aborting or committing, then sends appropriate response. If no information, then obviously inquirer had not received a COMMIT/ABORT (inquiree sends out PREPARE, crashes before receiving votes, and on restart aborts transaction). Given this, recipient must send back an ABORT.
Hierarchical 2-phase commit - obvious extension where non-leaf nodes after receiving a PREPARE propagate it to subordinates, and only after receiving all votes does it send a combined (subtee) vote to coordinator.
Presumed Abort Protocol

Take advantage of fact that absense of information implies recovery process orders inquirer to abort.
Coordinator forgets transaction immediately after making decision to abort and writing abort record - abort record need not be forced, and no ACKs need to be sent by subordinates for ABORT.
Coordinator also need not record names of subordinates in abort record or write and end record after the abort record.
Partially-read-only transactions: if some process does not perform any updates, then after receiving PREPARE and not having done any update, process sends READ VOTE, releases locks, and forgets about transaction - no log record written. Coordinator now knows this subordinate is READ-ONLY, and excludes it from 2nd phase.
Presumed Commit Protocol

Most transactions are expected to commit - try to eliminate ACKs for COMMITs.
Simple approach: require ABORT to be acknowledged, but COMMITs need not be, and that abort records are forced to disk but commit records need not be.
Problem: If root is in prepare state, a subordinate is also in prepared state, and root crashes before it has collected all votes. On recovery, will just forget about transaction. Fix: coordinator records names of subordinates safely before any subordinate can get in the prepared state. So, at start of first phase, coordinator force-writes a collecting record with names of all subordinates.
Same optimization for read-only subordinates can be done. If all subordinates read-only, don't need 2nd phase at all.
Table on page 573 of text giving performance of protocols under various situations. PA strictly better than 2P; PA better than PC for completely read-only, and partially-read-only if only coordinator does update. If only one update subordinate, PA and PC equal in terms of log writes, but PA needs extra ACK. For n>1 update subordinates, PC and PA need same number of log records, but PA will force n-1 times while PC will not (because of commit records by subordinates). PA will also send n extra messages (due to ACKs).
Deadlock-detection: see R^* summary for this.

Back to index

Transaction Management in the R* Distributed Database Management System

Relevance

Flaws

Overview/Main Points

Transaction Management in the R^* Distributed Database Management System