**One-line summary:** In executing a rooted join tree, sub-joins
should be done in *rank order* (to the extent that the reordering
respects the partial ordering inherent in the tree). Authors show how
to compute join cost based on rank for such PPT's (pipelined processing
query trees, i.e n-ary join bushy trees) and LPT (linear processing
trees, i.e. left or right deep trees).

- Observation: it's more important for an optimizer to avoid the worst execution than to find the best one, since for many queries, pathological inputs or datasets could cause the worst case behavior to be orders of magnitude worse than the average case.
- Authors work with
*join trees*: queries where the join graph is acyclic. For a given rooted join tree, each relation has a unique pareant and a well-defined join selectivity (see Selinger) with respect to the parent. (The selectivity of the root is defined to be 1.) - For nested-loop and selective-access join, the cost function of
join(R1,R2) is
C=n1*g(n2), where n1,n2 are the number of tuples in R1 and R2 and
g(n2) is the (join-method-specific)
*differential cost incurred per tuple of*n1 (Sort-merge, which has cost function C=n1+n2+n1 log n1+n2 log n2, doesn't fit this model; we return to it later.) - Processing (execution) tree types:
- LPT: linear processing tree; at most 1 temporary relation is used as input to any join. (I.e. a left- or right-deep tree, not a bushy one.)
- Binary LPT (BLPT): all joins are 2-way.
- PPT: pipelined processing tree; multiway joins are pipeline-executed left to right (nested loop n-ary join).

- Cost equations:
- The cardinality n
_{1,2}of joining R_{1}with R_{2}m is Product(i=1...j) (s_{i}*n_{i}) - The cost of a pipelined processing tree execution (PPT) is
the sum of the costs of each pairwise join, that is,
Sum(j=2...k)
(n
_{j-1,j}*g_{j}(n_{j}). - The cost of a BLPT is the same as the above but
incorporates additional cost of storing intermediate join
results, i.e. replace the term g
_{j}(n_{j}with (g_{j}(n_{j}) + Cn_{j}s_{j}). - Authors claim that the above equations (in 2. and 3.) have
"the same structure". I disagree, since the additional
dependence on s
_{j}in equation 3 is nonlinear.

- The cardinality n
- Strategy for tree queries: use the ASI (adjacent sequence interchange) property: for sequences A,B,U,V, the execution AUVB costs less than AVUB iff the rank(U)<rank(V). I.e. the order in which to do U and V can be decided irrespective of A and B.
- So: optimize a PPT by optimizing each subquery such that the optimization respects the tree's partial ordering. I.e. child joins must be done before their parents, but the order in which children are done may not matter; so order them by rank. In other words, convert each subtree to a chain using the rank function.
- Optimal sequence based on the "rank" cost function can be produced in O(N log N) time if N joins are to be sequenced.

- Assumption throughout is that database is entirely RAM-resident -- no disk costs. Assumption is later "relaxed" by claiming that disk I/O cost is subsumed into the g(n) component of the "generic cost equation". Given how much work has been done on minimizing I/O's and optimizing their execution, I find this bogus.
- The authors claim to present a
*formalism*for optimizing nonrecursive queries. What they actually present is a bunch of equations (which I still don't believe really have the same structure), not at all the same thing. The framework they describe doesn't seem general enough to be called a formalism, and does not provide any new powers of reasoning about optimizing query trees, just cost models. - The authors need to learn to organize ideas, to write good prose, and to typeset; or they need to delegate these duties to a capable human or program.