Title and Author of Paper
The Volcano Optimizer Generator: Extensibility and Efficient Search. Goetz Graefe and William J. McKenna.
Summary
The query optimizer’s job is to take user input in the form of SQL and generate a cost-efficient plan for satisfying that query using the underlying physical layout of the database. This paper describes Volcano, a system for taking a data model, logical algebra, physical algebra, and optimization rules and translating them into optimizer source code.
At a high level, Volcano provides a framework for database implementers to input a model specifying the properties of their physical and logical model. For each user query, the database implementer passes that query to the Volcano optimizer generator, and Volcano outputs an optimized plan for that query.
Volcano is built using five design principals. First, Volcano allows for an extensible algebraic and logical set of operators that describe the operation of the underlying database in terms of equivalence relationships, set operations on the data, and suitable implementation algorithms. Volcano’s job is to map an expression in the logical algebra into an expression in the physical algebra, where the logical algebra describes a user’s query and the physical algebra describes a query plan of algorithms that satisfy the query.
Second, Volcano uses the concept of rules to allow database implementer’s to describe general algebraic relationships and patterns that can be applied to a query.
Third, any choices that the generated optimizer is allowed to make must map a query into an optimal equivalent evaluation plan represented as logical and physical algebraic equivalences. This is in contrast to some optimizers that generate a sequence of intermediate steps between a user’s query and an optimized query plan.
Fourth, Volcano made the choice to generate compiled code, rather than code that may be interpreted.
Fifth, and finally, the search engine uses dynamic programming by retaining a large set of partially evaluated results and using these results in later decisions. In effect, Volcano uses a backward chaining to search through a decision tree for the most optimal query plan according to the logical and physical algebra provided by the database implementer.
To use Volcano in a database, the implementer provides a set of logical operators, algebraic transformation rules, a set of algorithms, implementation rules for those algorithms, a “cost” function for arithmetic and comparison functions, “logical properties” describing the database schema (and more), “physical properties” describing algorithmic dependencies such as sort ordering or partitioning, an “apply” function for each algorithm, and a cost function for each algorithm. Volcano takes this data and creates an optimized query plan for user input. Although this seems like a lot of code for the database implementer to provide, it is argued in the paper that this still is a total work savings for the database implementer when considering the alternative of providing all of these rules and implementing the search algorithm that Volcano provides.