ARIES: A Transaction Recovery Method Supporting Fine-Granularity
Locking and Partial Rollbacks Using Write-Ahead Logging
C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, and
Peter Schwarz
Summary by: Steve Gribble and Armando Fox
One-line summary:
Extraordinarily intricate and delicate logic necessary for fully general,
robust, and fine-grained recovery and rollback of transactions.
Overview/Main Points
- This paper is far too detailed to summarize succinctly. The
highest-level points are:
- ARIES records in a log the progress of the
transaction, and any actions (including UNDO/REDO)
which cause changes to recoverable data objects.
- ARIES uses write-ahead logging (WAL) with in-place
updating: an updated page is written back to the same
location it was read from, and WAL guarantees that the
log records describing a change are already on stable
storage before the changed data is written in place
in the DB.
- Some terms and concepts which are important:
- forward-processing: updates performed when
system is in normal processing mode (SQL
calls)
- partial rollback: the ability to set
savepoints during a transaction and later
roll back the transaction's changes to a
chosen savepoint
- total rollback: all updates of transaction
are undone
- normal undo: total or partial rollback when
system is in normal operation
- restart undo: transaction rollback during
restart recovery after system failure
- compensation log records (CLRs) log the
updates performed during rollback; in ARIES,
CLRs are viewed as redo-only log records.
- page-oriented redo occurs if the log record
whose update is being redone identifies the
database page that was modified. Logical redo
is higher-level; performing a redo may
require accessing several pages.
- ARIES "features":
- simplicity (!!!!!)
- operation logging (and value logging)
- flexible storage management
- partial rollbacks
- flexible buffer management (ARIES makes as few
assumptions as possible about buffer management policies)
- recovery independence - the recovery of one object
should not force the concurrent recovery of another
object.
- logical undo - the ability, during undo, to affect a
page that is different from the one modified during
forward processing (allows higher level of concurrency)
- parallelism and fast recovery - exploit parallelism
during the different stages of restart recovery and
during media recovery (multiprocessors are your
friends)
- minimal overhead - good performance during normal and
recovery processing
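To make the WAL rule, savepoints, and redo-only CLRs above concrete, here is a minimal Python sketch. It is not the paper's pseudocode. It assumes a single transaction, pages that each hold one integer, and page-oriented undo. The names ToyDB, LogRecord, undo_next, and flushed_lsn are hypothetical and chosen only for illustration, and "stable storage" is simulated with an in-memory counter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    kind: str                        # "update" or "clr"
    page: str
    before: Optional[int] = None     # undo information (unused by CLRs)
    after: Optional[int] = None      # redo information
    undo_next: Optional[int] = None  # CLR only: next LSN still to be undone


class ToyDB:
    """Illustrative single-transaction store; not a real recovery manager."""

    def __init__(self):
        self.log = []         # all log records, in LSN order (LSN = index + 1)
        self.flushed_lsn = 0  # highest LSN assumed to be on stable storage
        self.buffer = {}      # page -> (value, page_lsn), volatile copy
        self.disk = {}        # page -> (value, page_lsn), stable copy

    def _append(self, **fields):
        rec = LogRecord(lsn=len(self.log) + 1, **fields)
        self.log.append(rec)
        return rec

    def update(self, page, value):
        # Forward processing: log the change, then update the buffered page.
        before = self.buffer.get(page, (0, 0))[0]
        rec = self._append(kind="update", page=page, before=before, after=value)
        self.buffer[page] = (value, rec.lsn)
        return rec.lsn

    def savepoint(self):
        # A savepoint is simply the LSN of the last record written so far.
        return len(self.log)

    def flush_log(self, up_to_lsn):
        # Stand-in for forcing the log to stable storage.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def write_page(self, page):
        value, page_lsn = self.buffer[page]
        # WAL rule: the log record describing the page's latest change must
        # be stable before the page is overwritten in place.
        if page_lsn > self.flushed_lsn:
            self.flush_log(page_lsn)
        self.disk[page] = (value, page_lsn)

    def rollback_to(self, savepoint_lsn):
        # Partial rollback to a savepoint; savepoint_lsn=0 is a total rollback.
        lsn = len(self.log)
        while lsn > savepoint_lsn:
            rec = self.log[lsn - 1]
            if rec.kind == "clr":
                # CLRs are redo-only: never undo one, just skip past the
                # work it already compensated.
                lsn = rec.undo_next
                continue
            clr = self._append(kind="clr", page=rec.page,
                               after=rec.before, undo_next=rec.lsn - 1)
            self.buffer[rec.page] = (rec.before, clr.lsn)
            lsn = rec.lsn - 1


db = ToyDB()
db.update("A", 1)
sp = db.savepoint()
db.update("A", 2)
db.update("B", 5)
db.rollback_to(sp)   # undoes the last two updates, writing two CLRs
db.write_page("A")   # forces the log first, then writes the page
print([r.kind for r in db.log], db.disk)
```

Because each CLR records where undo should resume, a later total rollback (db.rollback_to(0)) skips the two CLRs and the updates they compensated, and undoes only the first update. The sketch leaves out restart recovery, multiple transactions, and logical undo entirely.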
Relevance
The canonical paper on transaction recovery. Shows in gory, gory detail
what is required for industrial strength transaction recovery techniques.
Flaws
- It's hard to grok, let alone convince yourself that it is correct
- Even if the descriptions are correct, I seriously doubt that
anyone besides superman could build an implementation that is
correct. And sadly, the failure mode cannot help but be
disastrous...
- No mention is made of the tradeoffs between correctness in all
situations and performance. Can one make some simplifying
assumptions about what failures will not occur, or will occur
rarely, and use that to optimize performance?