Paper Review: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

Title and Author of Paper DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yu et al. Summary DryadLINQ describes a system for distributing the computation of .NET LINQ expressions on an underlying Dryad cluster. The motivation for this work is to simplify the expression of data-parallel algorithms by providing the higher-level LINQ primitives. This allows programmers to implement their algorithm as if it were computed on a single machine, and allows the system to worry about the complexities of scheduling, distribution, and fault-tolerance. ...

June 9, 2016 · 2 min · Kevin Sookocheff

Paper Review: OLTP Through the Looking Glass, and What We Found There

Title and Author of Paper OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al. Summary Disk I/O has been the primary limiting factor in database performance for most commercial databases. However, as main-memory prices have dropped, it has become feasible to keep the entire working set of a database in RAM. With this architectural change, it makes sense to re-evaluate the database design decisions that were made to avoid disk I/O and see which ones still hold promise in a main-memory world. This paper provides such a performance analysis. ...

June 6, 2016 · 3 min · Kevin Sookocheff

Paper Review: Hekaton: SQL Server’s Memory-Optimized OLTP Engine

Title and Author of Paper Hekaton: SQL Server’s Memory-Optimized OLTP Engine, Diaconu et al. Summary Database design has traditionally revolved around efficient access to disk. However, recent memory prices make it feasible to keep the majority (or entirety) of a database in main-memory. A main-memory design requires a few adjustments to maximize concurrency, handle transactions, and recover after failure. This paper describes such a design in relation to the development of Hekaton — an extension to Microsoft’s SQL Server. With Hekaton, if the user specifies that a table is “memory-optimized”, this triggers SQL Server to store that table entirely in memory, allowing Hekaton to optimize the table with its in-memory database engine. ...

May 27, 2016 · 3 min · Kevin Sookocheff

Paper Review: C-Store: A column-oriented DBMS

Title and Author of Paper C-Store: A column-oriented DBMS. Stonebraker et al. Summary In traditional databases, all attributes of a record (or tuple) are stored together as a contiguous block. When writing to disk, a single write pushes all fields of the record to disk. For the purposes of this paper, we call this type of DBMS a write-optimized system; it works well for transactional processing. However, for querying data we can do better with a system that is read-optimized. C-Store is such a read-optimized system. ...

May 25, 2016 · 4 min · Kevin Sookocheff
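
To make the row-versus-column distinction from the C-Store review above concrete, here is a minimal Python sketch. The table contents and the helper names (`to_columns`, `row_sum`, `column_sum`) are invented for illustration and are not taken from the paper; a real column store adds compression, sort orders, and more on top of this idea.

```python
# Hypothetical three-record table; the field names are illustrative only.
rows = [
    {"id": 1, "name": "ann", "salary": 100},
    {"id": 2, "name": "bob", "salary": 150},
    {"id": 3, "name": "cat", "salary": 125},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per attribute."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def row_sum(records, field):
    """Row layout: every whole record is visited to read one field."""
    return sum(record[field] for record in records)


def column_sum(columns, field):
    """Column layout: only the requested attribute is touched."""
    return sum(columns[field])


columns = to_columns(rows)
```

Both `row_sum(rows, "salary")` and `column_sum(columns, "salary")` return the same total; the difference is how much unrelated data each layout has to read to get there.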

Paper Review: Transaction Management in the R* Distributed Database Management System

Title and Author of Paper Transaction Management in the R* Distributed Database Management System. C. Mohan et al. Summary This paper describes how to handle transactions in a distributed environment using a two-phase commit protocol (2PC). 2PC is a form of atomic commit that uses a coordinator to decide whether to commit or abort a transaction. The paper goes on to compare standard 2PC with two variations, (1) presumed abort (PA) and (2) presumed commit (PC), which differ in how they handle failure conditions. This paper review will be divided into three sections, one for 2PC, one for PA, and one for PC. ...

May 6, 2016 · 6 min · Kevin Sookocheff
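
The coordinator's decision rule in two-phase commit, as summarized in the transaction-management review above, can be sketched in a few lines of Python. This is a failure-free toy under stated assumptions: the `Participant` class, its `can_commit` flag, and the state strings are hypothetical, and the sketch deliberately leaves out logging, acknowledgements, and timeouts, which is exactly where the presumed abort and presumed commit variants differ.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant that votes and then applies the decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote YES only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = decision


def two_phase_commit(participants):
    """Return "committed" only if every participant votes YES."""
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(v is Vote.YES for v in votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision
```

A single NO vote is enough to abort the whole transaction, so `two_phase_commit([Participant("a"), Participant("b", can_commit=False)])` leaves both participants in the `"aborted"` state.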

Paper Review: Concurrency Control Performance Modeling: Alternatives and Implications

Title and Author of Paper Concurrency Control Performance Modeling: Alternatives and Implications. R. Agrawal et al. Summary This paper takes an in-depth look at the performance implications of varying concurrency control algorithms. Specifically, it examines the performance of three concurrency methods: blocking, immediate-restart, and optimistic. In the blocking algorithm, all transactions set locks on objects that are read or written; whenever a lock request is denied, the requesting transaction is placed in a waiting queue until it can proceed (on deadlock, the youngest transaction is restarted). With immediate-restart, transactions again acquire locks on objects. In this case, however, if a lock request is denied, the transaction is aborted and restarted after some delay. For the optimistic case, all transactions are allowed to proceed as if no conflicts occur; only if a conflict is detected at commit time is a transaction restarted. ...

April 25, 2016 · 3 min · Kevin Sookocheff
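
Of the three methods in the concurrency-control review above, the optimistic one is the easiest to sketch: transactions run without locks and are checked for conflicts only at commit time. The Python below is one possible illustration, not the paper's simulation model; the `OptimisticStore` class and its per-key version counters are assumptions made for this example.

```python
class OptimisticStore:
    """Toy key-value store with commit-time validation (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.version = {}  # key -> number of committed writes to that key

    def begin(self):
        # A transaction is just its private read and write sets.
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        # Remember the version seen on the first read of each key.
        txn["reads"].setdefault(key, self.version.get(key, 0))
        return txn["writes"].get(key, self.data.get(key))

    def write(self, txn, key, value):
        # Writes stay private until the transaction commits.
        txn["writes"][key] = value

    def commit(self, txn):
        # Validate: every key read must still be at the version first seen.
        for key, seen in txn["reads"].items():
            if self.version.get(key, 0) != seen:
                return False  # conflict detected; the caller restarts
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True
```

If two transactions both read and then write the same key, the first to commit succeeds and the second fails validation and has to be restarted, which is the cost the optimistic approach pays under contention.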

Paper Review: Granularity of Locks and Degrees of Consistency in a Shared Data Base

Title and Author of Paper Granularity of Locks and Degrees of Consistency in a Shared Data Base. J. Gray et al. Summary This paper is divided into two sections: granularity of locks, and degrees of consistency. Each section answers questions on how lock choice in a database affects throughput and consistency. Granularity of Locks In the granularity section, the choice of lockable units is discussed. A lockable unit represents a section of logical data that is atomically locked during a transaction. Locking smaller units such as individual records improves concurrency for “simple” transactions that access a small number of records. On the other hand, locking at a record level can reduce throughput for “complex” transactions that require access to many records, because the overhead of acquiring and releasing locks overwhelms the computation. It follows that having different sizes of lockable units in the same system is required to handle multiple use cases. ...

April 19, 2016 · 4 min · Kevin Sookocheff

Paper Review: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging

Title and Author of Paper ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. C. Mohan et al. Summary ARIES presents and validates the concept of write-ahead logging, providing industrial-strength support for atomicity and durability. As described in the Red Book, write-ahead logging is a “near-ubiquitous technique for maintaining durability”. ARIES provides the reference implementation for “No Force, Steal” write-ahead logging used by most databases today. With a “No Force” policy, transactions can be committed without actually flushing dirty pages to disk, while a “Steal” policy implies dirty pages can be flushed to disk at any time, even before the transaction that wrote them commits. Combined, these two policies allow for high performance as the current state of pages in the database can be kept in memory, avoiding unnecessary I/O operations. ...

March 30, 2016 · 2 min · Kevin Sookocheff
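
The “No Force, Steal” combination from the ARIES review above is easier to see in a toy model. The Python sketch below is not ARIES: it has no LSNs, checkpoints, analysis pass, or compensation log records, and the `TinyWAL` class and its method names are invented for illustration. It only shows why logging old and new values lets recovery redo committed work that never reached disk and undo uncommitted work that did.

```python
class TinyWAL:
    """Toy write-ahead log over a dict-backed 'disk'; not ARIES itself."""

    def __init__(self):
        self.log = []     # durable, append-only log records
        self.disk = {}    # durable page store
        self.buffer = {}  # volatile buffer pool, lost on a crash

    def write(self, txn, page, value):
        # Log the old and new value before changing the buffered page.
        old = self.buffer.get(page, self.disk.get(page))
        self.log.append(("update", txn, page, old, value))
        self.buffer[page] = value

    def commit(self, txn):
        # No Force: only the commit record is made durable, not the pages.
        self.log.append(("commit", txn))

    def steal(self, page):
        # Steal: a dirty page may reach disk before its transaction commits.
        self.disk[page] = self.buffer[page]

    def crash_and_recover(self):
        self.buffer = {}
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Redo every logged update in order...
        for rec in self.log:
            if rec[0] == "update":
                self.disk[rec[2]] = rec[4]
        # ...then undo uncommitted updates in reverse order.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                if rec[3] is None:
                    self.disk.pop(rec[2], None)
                else:
                    self.disk[rec[2]] = rec[3]
```

For example, if transaction `T1` writes page `A` and commits without a flush, while `T2` writes page `B`, has it stolen to disk, and never commits, then `crash_and_recover()` leaves the disk holding only `T1`'s value for `A`.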

Paper Review: Access Path Selection in a Relational Database Management System

Title and Author of Paper Access Path Selection in a Relational Database Management System. P. G. Selinger et al. Summary This paper describes methods of the SQL query optimizer for determining the cost of satisfying a query. It also describes methods for choosing among several competing access paths. What are the motivations for this work? SQL is a high-level language where requests for data are stated non-procedurally. The user is not expected to need any knowledge of how the data is stored in the database or how it is retrieved. Thus, it is up to the DBMS to choose an appropriate access path for data retrieval on the user's behalf. By designing the database in this fashion, we preserve data independence, where a user's view of the data is independent of the database's view of the data. ...

March 15, 2016 · 3 min · Kevin Sookocheff

Paper Review: Eddies: Continuously Adaptive Query Processing

Title and Author of Paper Eddies: Continuously Adaptive Query Processing. Ron Avnur and Joseph M. Hellerstein. Summary Eddies describes a query optimization system that continuously reorders operators in a query plan as it runs. This insight is based on the observation that assumptions made about the database at the time that a query is submitted will rarely hold throughout the duration of query processing. Query plans can be reordered using two criteria: synchronization barriers and moments of symmetry. Synchronization barriers exist whenever an operator is waiting for a table scan to complete before making forward progress. In general, these barriers limit concurrency and one goal of the eddies system is to avoid or improve these barriers by selecting an appropriate join algorithm. ...

March 15, 2016 · 2 min · Kevin Sookocheff