Paper Review: The Volcano Optimizer Generator: Extensibility and Efficient Search

Title and Author of Paper The Volcano Optimizer Generator: Extensibility and Efficient Search. Goetz Graefe and William J. McKenna. Summary The query optimizer’s job is to take user input in the form of SQL and generate a cost-efficient plan for satisfying that query using the underlying physical layout of the database. This paper describes Volcano, a system for taking a data model, logical algebra, physical algebra, and optimization rules and translating them into optimizer source code....

October 11, 2016 · 3 min · Kevin Sookocheff

Paper Review: Dynamo: Amazon’s Highly Available Key-value Store

Title and Author of Paper Dynamo: Amazon’s Highly Available Key-value Store, DeCandia et al. Summary Dynamo, as the title of the paper suggests, is Amazon’s highly available key-value storage system. Dynamo only supports primary-key access to data, which is useful for services such as shopping carts and session management. Dynamo’s use case for these services is providing a highly available system that always accepts writes. This requirement pushes the complexity of conflict resolution onto data readers....

October 7, 2016 · 2 min · Kevin Sookocheff

Paper Review: CAP Twelve Years Later: How the “Rules” Have Changed

Title and Author of Paper CAP Twelve Years Later: How the “Rules” Have Changed. Eric Brewer. Summary This article explores the CAP Theorem and how it relates to database system design. The author argues that, since partitions are likely to happen, the system designer can compensate by introducing methods for safely recovering from them. This strategy allows the database to remain available during a partition and to enforce consistency once the partition is resolved....

October 5, 2016 · 3 min · Kevin Sookocheff

Paper Review: Generalized Isolation Level Definitions

Title and Author of Paper Generalized Isolation Level Definitions, Adya et al. Summary The ANSI SQL standard defines isolation levels allowing database users to trade off between performance and consistency when running transactions. Unfortunately, the wording in the SQL standard is geared towards locking as the sole supported concurrency method. This paper presents alternative definitions to the isolation levels specified in the ANSI SQL standard that are general enough to allow for any concurrency method (multi-version, optimistic, etc....

June 23, 2016 · 3 min · Kevin Sookocheff

Paper Review: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

Title and Author of Paper DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yu et al. Summary DryadLINQ describes a system for distributing the computation of .NET LINQ expressions on an underlying Dryad cluster. The motivation for this work is to simplify the expression of data-parallel algorithms by providing the higher-level LINQ primitives. This allows the programmer to implement their algorithm as if it were computed on a single machine, and allows the system to worry about the complexities of scheduling, distribution, and fault-tolerance....

June 9, 2016 · 2 min · Kevin Sookocheff

Paper Review: MapReduce: Simplified Data Processing on Large Clusters

Title and Author of Paper MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat. Summary MapReduce is designed to solve the problem of processing large sets of data on a fleet of commodity hardware. In such an environment it is assumed that you may have hundreds or thousands of machines and that, at any point in time, these machines may experience failures. The MapReduce framework hides the details of parallelizing your workflow, fault-tolerance, distributing data to workers, and load balancing behind the abstractions map and reduce....

June 8, 2016 · 3 min · Kevin Sookocheff

Paper Review: OLTP Through the Looking Glass, and What We Found There

Title and Author of Paper OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al. Summary Disk I/O has been the primary limiting factor in performance for most commercial databases. However, as main memory prices have dropped, it has become feasible to keep the entire working set of a database in RAM. With this architectural change, it makes sense to evaluate database design decisions made to avoid disk I/O to see which ones still hold promise in a main-memory world....

June 6, 2016 · 3 min · Kevin Sookocheff

Paper Review: Hekaton: SQL Server’s Memory-Optimized OLTP Engine

Title and Author of Paper Hekaton: SQL Server’s Memory-Optimized OLTP Engine, Diaconu et al. Summary Database design has traditionally revolved around efficient access to disk. However, recent memory prices make it feasible to keep the majority (or entirety) of a database in main memory. A main-memory design requires a few adjustments to maximize concurrency, handle transactions, and recover after failure. This paper describes such a design in relation to the development of Hekaton, an extension to Microsoft’s SQL Server....

May 27, 2016 · 3 min · Kevin Sookocheff

Paper Review: C-Store: A column-oriented DBMS

Title and Author of Paper C-Store: A column-oriented DBMS. Stonebraker et al. Summary In traditional databases, all attributes of a record (or tuple) are stored together as a contiguous block. When writing to disk, a single write pushes all fields of the record to disk. For the purposes of this paper, we call this type of DBMS a write-optimized system; it works well for transaction processing. However, for querying data we can do better with a system that is read-optimized....

May 25, 2016 · 4 min · Kevin Sookocheff

Paper Review: Transaction Management in the R* Distributed Database Management System

Title and Author of Paper Transaction Management in the R* Distributed Database Management System. C. Mohan et al. Summary This paper describes how to handle transactions in a distributed environment using a two-phase commit protocol (2PC). 2PC is a form of atomic commit that uses a coordinator to decide whether to commit or abort a transaction. The paper goes on to compare standard 2PC with two variations, (1) presumed abort (PA) and (2) presumed commit (PC), which differ in how they handle failure conditions....

May 6, 2016 · 6 min · Kevin Sookocheff