Paper Review: Generalized Isolation Level Definitions

Title and Author of Paper Generalized Isolation Level Definitions, Adya et al. Summary The ANSI SQL standard defines isolation levels allowing database users to trade off between performance and consistency when running transactions. Unfortunately, the wording in the SQL standard is geared towards locking as the sole supported concurrency method. This paper presents alternative definitions to the isolation levels specified in the ANSI SQL standard that are general enough to allow for any concurrency method (multi-version, optimistic, etc.) to be used. ...

June 23, 2016 · 3 min · Kevin Sookocheff

So you want to send a message using Apache Thrift?

So you want to use Thrift? You’ve come here because you want to use Apache Thrift and you don’t know where to start. Good. You’re in the right spot. Throughout this document we will develop a simple service that communicates using Thrift. This will introduce you to the workflow for generating client and server code using Thrift and show how Thrift separates your application’s business logic from its transport methods. ...

June 17, 2016 · 10 min · Kevin Sookocheff

Paper Review: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

Title and Author of Paper DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yu et al. Summary DryadLINQ describes a system for distributing the computation of .NET LINQ expressions on an underlying Dryad cluster. The motivation for this work is to simplify the expression of data-parallel algorithms by providing the higher-level LINQ primitives. This allows the programmer to implement their algorithm as if it were computed on a single machine, and lets the system worry about the complexities of scheduling, distribution, and fault tolerance. ...

June 9, 2016 · 2 min · Kevin Sookocheff

Paper Review: MapReduce: Simplified Data Processing on Large Clusters

Title and Author of Paper MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat. Summary MapReduce is designed to solve the problem of processing large sets of data on a fleet of commodity hardware. In such an environment it is assumed that you may have hundreds or thousands of machines and that, at any point in time, these machines may experience failures. The MapReduce framework hides the details of parallelizing your workflow, fault-tolerance, distributing data to workers, and load balancing behind the abstractions map and reduce. The user of MapReduce is responsible for writing these map and reduce functions, while the MapReduce library is responsible for executing that program in a distributed environment. ...
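
The split of responsibilities described above can be sketched with a word count. This is a minimal single-process illustration, not the paper's implementation: the class name `WordCountSketch` and the `run` method are hypothetical, and `run` only stands in for the library's grouping step, with none of the distribution, fault tolerance, or load balancing that the real framework provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {
    // User-supplied map function (hypothetical): emit (word, 1) for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // User-supplied reduce function (hypothetical): sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Stand-in for the library: group intermediate pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "the lazy dog")));
    }
}
```

Only `map` and `reduce` contain application logic; everything in `run` is the part a MapReduce library would take over and spread across machines.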

June 8, 2016 · 3 min · Kevin Sookocheff

Paper Review: OLTP Through the Looking Glass, and What We Found There

Title and Author of Paper OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al. Summary Disk I/O has been the primary limiting factor in database performance for most commercial databases. However, as prices of main memory have dropped, it has become feasible to keep the entire working set of a database in RAM. With this architectural change, it makes sense to evaluate database design decisions made to avoid disk I/O to see which ones still hold promise in a main-memory world. This paper provides such a performance analysis. ...

June 6, 2016 · 3 min · Kevin Sookocheff

Getting Started with Amazon Flow Framework

Amazon’s Flow Framework provides a high-level SDK for interacting with the Amazon Simple Workflow service (SWF). SWF is a managed service that helps developers build, run, and monitor parallel or sequential asynchronous workloads. SWF reliably commits your workflow’s state to durable storage, allowing you to focus on your business logic rather than on the complex coordination of distributed services. Writing an application with the Flow Framework can be divided into the following steps: ...

June 2, 2016 · 12 min · Kevin Sookocheff

Paper Review: Hekaton: SQL Server’s Memory-Optimized OLTP Engine

Title and Author of Paper Hekaton: SQL Server’s Memory-Optimized OLTP Engine, Diaconu et al. Summary Database design has traditionally revolved around efficient access to disk. However, recent memory prices make it feasible to keep the majority (or entirety) of a database in main-memory. A main-memory design requires a few adjustments to maximize concurrency, handle transactions, and recover after failure. This paper describes such a design in relation to the development of Hekaton — an extension to Microsoft’s SQL Server. With Hekaton, if the user specifies that a table is “memory-optimized”, this triggers SQL Server to store that table entirely in memory, allowing Hekaton to optimize the table with its in-memory database engine. ...

May 27, 2016 · 3 min · Kevin Sookocheff

Paper Review: C-Store: A column-oriented DBMS

Title and Author of Paper C-Store: A column-oriented DBMS, Stonebraker et al. Summary In traditional databases, all attributes of a record (or tuple) are stored together as a contiguous block. When writing to disk, a single write pushes all fields of the record to disk. For the purposes of this paper, we call this type of DBMS a write-optimized system, and it works well for transactional processing. However, for querying data we can do better with a system that is read-optimized. C-Store is such a read-optimized system. ...

May 25, 2016 · 4 min · Kevin Sookocheff

Concurrency: A Primer

Writing correct programs is hard; writing correct concurrent programs is harder. Java Concurrency in Practice. So, why bother with concurrency? A number of reasons: Concurrency provides a natural method for composing asynchronous code. Concurrency allows your program to avoid blocking user operations. Concurrency provides one of the easiest ways to take advantage of multi-core systems. As processor counts increase, exploiting concurrency will be an even more important facet of high-performance systems. Yet, before diving into writing a concurrent program, it pays to understand the fundamentals of concurrency. To aid in such understanding, this article will provide background material on concurrency and an exploration of different methods for managing state and different models for writing concurrent programs. The article is split into three main sections: ...

May 17, 2016 · 18 min · Kevin Sookocheff

Server-to-server OAuth with the Google OAuth Client Library for Java

This post describes how to validate a JWT token using the Google OAuth library for making server-to-server OAuth requests. First, you need to be able to read a key file from your local file system. This key file is obtained from the system that you wish to authorize against and contains the private key authorizing your server with the other system. /** * Return a private key from a file. The file must hold a PKCS#8 key in DER (binary) encoding, which is what PKCS8EncodedKeySpec expects. * * @return a private key */ PrivateKey loadPrivateKey(File keyFile) throws IOException, NoSuchAlgorithmException, InvalidKeySpecException { byte[] content = Files.toByteArray(keyFile); PKCS8EncodedKeySpec ks = new PKCS8EncodedKeySpec(content); return KeyFactory.getInstance("RSA").generatePrivate(ks); } Now, assuming we have a valid private key, authenticating with an OAuth end-point using a JWT token is a matter of mapping the JWT token properties to the correct GoogleCredential methods. When GoogleCredential calls the API to obtain a new access token, it converts the values set through those methods to the correct JWT token properties according to the following table. ...
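
The `loadPrivateKey` excerpt passes the file's bytes straight to `PKCS8EncodedKeySpec`, which expects DER-encoded bytes. If the key file is PEM text instead, one possible approach is to strip the armor lines and Base64-decode the body first. The sketch below uses only the JDK; the class `PemKeySketch` and its helpers `parsePkcs8Pem` and `toPem` are hypothetical names, and it assumes an unencrypted PKCS#8 RSA key with the `BEGIN PRIVATE KEY` label (a PKCS#1 `BEGIN RSA PRIVATE KEY` file would not parse this way).

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

final class PemKeySketch {
    private static final String HEADER = "-----BEGIN PRIVATE KEY-----";
    private static final String FOOTER = "-----END PRIVATE KEY-----";

    // Parse an unencrypted PKCS#8 RSA private key from PEM text (assumed format).
    static PrivateKey parsePkcs8Pem(String pem) throws GeneralSecurityException {
        String base64 = pem.replace(HEADER, "").replace(FOOTER, "").replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(base64);
        return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    }

    // Wrap a key's PKCS#8 DER bytes in PEM armor; used here only to build sample input.
    static String toPem(PrivateKey key) {
        String body = Base64.getMimeEncoder(64, new byte[] {'\n'}).encodeToString(key.getEncoded());
        return HEADER + "\n" + body + "\n" + FOOTER + "\n";
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Generate a throwaway key so the sketch runs without a real key file.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        PrivateKey original = generator.generateKeyPair().getPrivate();

        PrivateKey parsed = parsePkcs8Pem(toPem(original));
        System.out.println("Parsed key algorithm: " + parsed.getAlgorithm());
    }
}
```

With a real key file, the PEM text would be read from disk and handed to `parsePkcs8Pem`; the resulting `PrivateKey` can then be used wherever the excerpt's `loadPrivateKey` result is used.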

May 12, 2016 · 2 min · Kevin Sookocheff