Paper Review: Hekaton: SQL Server’s Memory-Optimized OLTP Engine

Title and Author of Paper: Hekaton: SQL Server’s Memory-Optimized OLTP Engine, Diaconu et al. Summary: Database design has traditionally revolved around efficient access to disk. However, recent memory prices make it feasible to keep the majority (or entirety) of a database in main memory. A main-memory design requires a few adjustments to maximize concurrency, handle transactions, and recover after failure. This paper describes such a design in relation to the development of Hekaton, an extension to Microsoft’s SQL Server. With Hekaton, if the user specifies that a table is “memory-optimized”, SQL Server stores that table entirely in memory, allowing Hekaton to optimize the table with its in-memory database engine. ...

May 27, 2016 · 3 min · Kevin Sookocheff

Paper Review: C-Store: A column-oriented DBMS

Title and Author of Paper: C-Store: A column-oriented DBMS, Stonebraker et al. Summary: In traditional databases, all attributes of a record (or tuple) are stored together as a contiguous block. When writing to disk, a single write pushes all fields of the record to disk. For the purposes of this paper, we call this type of DBMS a write-optimized system; it works well for transactional processing. However, for querying data we can do better with a system that is read-optimized. C-Store is such a read-optimized system. ...
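The difference between the two layouts can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions, not C-Store's actual storage format; the names `LayoutSketch`, `Row`, `ColumnStore`, and the `id`/`name`/`price` attributes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of row-oriented vs. column-oriented layouts.
// This is not C-Store's actual storage format.
class LayoutSketch {

    /** Row layout: every attribute of a record is stored together. */
    static final class Row {
        final int id;
        final String name;
        final double price;

        Row(int id, String name, double price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    /** Column layout: each attribute lives in its own list. */
    static final class ColumnStore {
        final List<Integer> ids = new ArrayList<>();
        final List<String> names = new ArrayList<>();
        final List<Double> prices = new ArrayList<>();

        // A single logical insert has to touch every column.
        void insert(int id, String name, double price) {
            ids.add(id);
            names.add(name);
            prices.add(price);
        }

        // An aggregate over one attribute reads only that column.
        double sumPrices() {
            double total = 0;
            for (double p : prices) {
                total += p;
            }
            return total;
        }
    }

    // The same aggregate over rows walks every full record.
    static double sumPrices(List<Row> rows) {
        double total = 0;
        for (Row r : rows) {
            total += r.price;
        }
        return total;
    }
}
```

Both aggregates return the same answer; the point of the sketch is that the column version never looks at `ids` or `names`, while an insert has to append to all three lists.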

May 25, 2016 · 4 min · Kevin Sookocheff

Concurrency: A Primer

“Writing correct programs is hard; writing correct concurrent programs is harder.” (Java Concurrency in Practice) So, why bother with concurrency? A number of reasons: Concurrency provides a natural method for composing asynchronous code. Concurrency allows your program to avoid blocking user operations. Concurrency provides one of the easiest ways to take advantage of multi-core systems. As processor counts increase, exploiting concurrency will be an even more important facet of high-performance systems. Yet, before diving into writing a concurrent program, it pays to understand the fundamentals of concurrency. To aid in such understanding, this article will provide background material on concurrency and an exploration of different methods for managing state and different models for writing concurrent programs. The article is split into three main sections: ...

May 17, 2016 · 18 min · Kevin Sookocheff

Server-to-server OAuth with the Google OAuth Client Library for Java

This post describes how to validate a JWT token using the Google OAuth library for making server-to-server OAuth requests. First, there is a prerequisite of being able to read a key file from your local file system. This key file is obtained from the system that you wish to authorize against and contains the private key authorizing your server with the other system.

    import com.google.common.io.Files; // Guava
    import java.io.File;
    import java.io.IOException;
    import java.security.KeyFactory;
    import java.security.NoSuchAlgorithmException;
    import java.security.PrivateKey;
    import java.security.spec.InvalidKeySpecException;
    import java.security.spec.PKCS8EncodedKeySpec;

    /**
     * Return private key from a file. The file must contain a DER-encoded
     * PKCS#8 key; a PEM file must be stripped of its header and footer and
     * Base64-decoded first.
     *
     * @return a private key
     */
    PrivateKey loadPrivateKey(File keyFile)
            throws IOException, NoSuchAlgorithmException, InvalidKeySpecException {
        byte[] content = Files.toByteArray(keyFile);
        PKCS8EncodedKeySpec ks = new PKCS8EncodedKeySpec(content);
        return KeyFactory.getInstance("RSA").generatePrivate(ks);
    }

Now, assuming we have a valid private key, authenticating with an OAuth end-point using a JWT token is a matter of mapping the JWT token properties to the correct GoogleCredential methods. When GoogleCredential calls the API to obtain a new access token, it converts the methods set on the credential to the correct JWT token properties according to the following table. ...

May 12, 2016 · 2 min · Kevin Sookocheff

Paper Review: Transaction Management in the R* Distributed Database Management System

Title and Author of Paper: Transaction Management in the R* Distributed Database Management System, C. Mohan et al. Summary: This paper describes how to handle transactions in a distributed environment using a two-phase commit protocol (2PC). 2PC is a form of atomic commit that uses a coordinator to decide whether to commit or abort a transaction. The paper goes on to compare standard 2PC with two variations, (1) presumed abort (PA) and (2) presumed commit (PC), which differ in how they handle failure conditions. This paper review will be divided into three sections, one for 2PC, one for PA, and one for PC. ...
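The coordinator's role in basic 2PC can be sketched as follows. This is a minimal, failure-free sketch: it leaves out the logging, timeouts, and recovery that the paper (and the PA and PC variants) are actually concerned with, and the names `TwoPhaseCommitSketch`, `Participant`, and `FixedVoter` are hypothetical.

```java
import java.util.List;

// Hypothetical, failure-free sketch of a 2PC coordinator.
// Real 2PC also needs durable logging, timeouts, and recovery.
class TwoPhaseCommitSketch {
    enum Vote { YES, NO }
    enum Decision { COMMIT, ABORT }

    /** A participant votes in phase one and applies the outcome in phase two. */
    interface Participant {
        Vote prepare();
        void finish(Decision decision);
    }

    /** Test helper: always casts the same vote and remembers the outcome. */
    static final class FixedVoter implements Participant {
        private final Vote vote;
        Decision outcome; // null until phase two reaches this participant

        FixedVoter(Vote vote) {
            this.vote = vote;
        }

        @Override
        public Vote prepare() {
            return vote;
        }

        @Override
        public void finish(Decision decision) {
            this.outcome = decision;
        }
    }

    /** Phase one collects votes; phase two broadcasts the decision. */
    static Decision run(List<? extends Participant> participants) {
        Decision decision = Decision.COMMIT;
        for (Participant p : participants) {
            if (p.prepare() == Vote.NO) {
                decision = Decision.ABORT; // a single NO vote aborts everyone
                break;
            }
        }
        for (Participant p : participants) {
            p.finish(decision);
        }
        return decision;
    }
}
```

The transaction commits only when every participant votes yes; any single no vote leads to a global abort.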

May 6, 2016 · 6 min · Kevin Sookocheff

Paper Review: Concurrency Control Performance Modeling: Alternatives and Implications

Title and Author of Paper: Concurrency Control Performance Modeling: Alternatives and Implications, R. Agrawal et al. Summary: This paper takes an in-depth look at the performance implications of varying concurrency control algorithms. Specifically, it examines the performance of three concurrency methods: blocking, immediate-restart, and optimistic. In the blocking algorithm, all transactions set locks on objects that are read or written; whenever a lock request is denied, the requesting transaction is placed in a waiting queue until it can proceed (on deadlock, the youngest transaction is restarted). With immediate-restart, transactions again acquire locks on objects. In this case, however, if a lock request is denied, the transaction is aborted and restarted after a delay. For the optimistic case, all transactions are allowed to proceed as if no conflicts occur; only if a conflict is detected at commit time is a transaction restarted. ...
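The optimistic case can be illustrated with a small versioned store. This is a single-threaded sketch that simulates interleaving by the order of calls; it uses per-key version numbers for commit-time validation, which is an assumption made for illustration and not necessarily the scheme modeled in the paper. The names `OptimisticSketch` and `Txn` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded sketch of optimistic concurrency control.
// Validation uses per-key version numbers (an illustrative assumption).
class OptimisticSketch {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    Txn begin() {
        return new Txn();
    }

    final class Txn {
        private final Map<String, Integer> readVersions = new HashMap<>();
        private final Map<String, Integer> writes = new HashMap<>();

        // Reads never block; they only remember the version that was seen.
        int read(String key) {
            if (writes.containsKey(key)) {
                return writes.get(key);
            }
            readVersions.putIfAbsent(key, versions.getOrDefault(key, 0));
            return values.getOrDefault(key, 0);
        }

        // Writes are buffered privately until commit.
        void write(String key, int value) {
            writes.put(key, value);
        }

        // Commit-time validation: fail if anything we read has changed.
        boolean commit() {
            for (Map.Entry<String, Integer> e : readVersions.entrySet()) {
                Integer current = versions.getOrDefault(e.getKey(), 0);
                if (!current.equals(e.getValue())) {
                    return false; // conflict detected; caller should restart
                }
            }
            for (Map.Entry<String, Integer> e : writes.entrySet()) {
                values.put(e.getKey(), e.getValue());
                versions.merge(e.getKey(), 1, Integer::sum);
            }
            return true;
        }
    }
}
```

If two transactions both read a key and then try to update it, the first commit succeeds and the second fails validation, at which point the caller would restart it.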

April 25, 2016 · 3 min · Kevin Sookocheff

Paper Review: Granularity of Locks and Degrees of Consistency in a Shared Data Base

Title and Author of Paper: Granularity of Locks and Degrees of Consistency in a Shared Data Base, J. Gray et al. Summary: This paper is divided into two sections: granularity of locks, and degrees of consistency. Each section answers questions on how lock choice in a database affects throughput and consistency. Granularity of Locks: In the granularity section, the choice of lockable units is discussed. A lockable unit represents a section of logical data that is atomically locked during a transaction. Locking smaller units such as individual records improves concurrency for “simple” transactions that access a small number of records. On the other hand, locking at a record level can reduce throughput for “complex” transactions that require access to many records, because the overhead of acquiring and releasing locks overwhelms the computation. It follows that having different sizes of lockable units in the same system is required to handle multiple use cases. ...

April 19, 2016 · 4 min · Kevin Sookocheff

Paper Review: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging

Title and Author of Paper: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging, C. Mohan et al. Summary: ARIES presents and validates the concept of write-ahead logging, providing industrial-strength support for atomicity and durability. As described in the Red Book, write-ahead logging is a “near-ubiquitous technique for maintaining durability”. ARIES provides the reference implementation for “No Force, Steal” write-ahead logging used by most databases today. With a “No Force” policy, transactions can be committed without actually flushing dirty pages to disk, while a “Steal” policy implies dirty pages can be flushed to disk at any time. Combined, these two policies allow for high performance as the current state of pages in the database can be kept in memory, avoiding unnecessary I/O operations. ...

March 30, 2016 · 2 min · Kevin Sookocheff

Deploying a static website with rsync

This is one of those things that we all kinda know but that it’s good to write down – how to deploy static files using rsync. For purposes of illustration, this article will describe how to deploy a statically generated website. ...

March 29, 2016 · 2 min · Kevin Sookocheff

Thoughts On Google Cloud Platform Next

I was fortunate enough to attend Google Cloud Platform Next last week and wanted to summarize a few of my thoughts on the conference. As I sat down to analyze the event, I found a few distinct themes that I would like to expand on. Multi-Cloud Google is serious about multi-cloud support for their monitoring and integration products. As a cynic, if Google wants to steal customers from Amazon, offering tools to aid the transition is in their best interest. As a realist, most companies will continue to operate in a multi-cloud environment to take advantage of the strengths of each platform, and it’s nice to see Google recognizing this fact and working within it. Either way, I think being open about supporting AWS is a good thing. ...

March 28, 2016 · 3 min · Kevin Sookocheff