Paper Review: Robust Query Processing through Progressive Optimization

Title and Author of Paper

Robust Query Processing through Progressive Optimization. Markl et al.

Summary

Traditional query optimizers choose an execution plan for a query by using estimates of current database statistics. However, these estimates may be inaccurate, leading to overly expensive query plans being chosen and executed. This paper presents progressive query optimization, allowing query execution to detect and recover from estimation errors during processing.

During each execution step, progressive query optimization (POP) detects differences between the cardinality of the currently processed tuple and compares that to the estimated cardinality that was used to define the original execution plan. If those cardinalities differ enough, POP will re-optimize the query using updated estimates of cardinality. Any materialized views already computed can be reused during the re-execution step.

POP works by inserting checkpoints into a query execution plan. Each checkpoint acts as an assertion that plan estimates agree with actual values seen during execution. The checkpoints are inserted between “producers” and “consumers”, which are stages of the currently executing query. The checkpoint monitors the number of rows flowing between the producers and consumers and can suspend query execution if the number of rows it sees are widely different than the number of rows estimated, signalling an opportunity for re-optimization.

The checkpoints are implemented as CHECK relational operators within the currently executing relational query. The CHECK operators have no relation semantics. Each CHECK has a cardinality range defining the lower and upper bounds of the cardinality estimate. If the observed cardinality during execution is within this range, the CHECK succeeds and query processing continues normally. If the observed cardinality is outside of this range, the CHECK fails and the query is terminated and re-optimized for re-execution. Any intermediate results are buffered and made available for the subsequent query execution.

POP can be inserted into an existing query planner by adding the logic for the CHECK operator to the architecture. This implies adding logic that captures cardinality estimates for the CHECK operator, inserting CHECK operators at appropriate points within an execution plan, and adding the compiler and runtime facilities to generate and execute CHECK operators. To re-use intermediate results requires adding logic to buffer and exploit results if a CHECK operator fails.

Re-optimizing and re-executing a query can result in a significant performance hit — you only want to re-optimize if you are sure the plan will improve. With POP, the determination of the cardinality range for a CHECK operator largely determines the effectiveness of query optimization. These ranges are computed by analyzing the different query plans that are considered while the optimizer prunes and enumerates possible execution strategies. POP uses numerical methods to approximate the validity range given the estimated ranges of several alternative plans.

To exploit intermediate results of an execution, whenever a CHECK operator fails, any results are stored in a temporary materialized view. During re-optimization, these materialized views (with their cardinalities) will be available to the query optimizer during query planning.

The paper shows that POP can drastically reduce the execution time of worst-case queries. However, in their findings, 22 queries received an improvement via POP, 17 queries actually experience a degradation. The author’s suggest that their CHECK operator may lead to over-eager re-optimizations, leading to queries that are re-executed with little benefit.

Like this post? Subscribe via RSS or email to never miss an update.