Paper Review: The Design of POSTGRES

Title and Author of Paper: The Design of POSTGRES, Michael Stonebraker and Lawrence A. Rowe. Summary: Postgres started as a research project to extend the standard database architecture to support several additional concepts: complex objects as values, user-defined data types and procedures, and alerting and triggers. This paper describes the system architecture designed to achieve these goals, while retaining functionality of the relational model. Although the design incorporates additional ideas such as time-varying data, I will focus my review on the user-defined types and alerting scenarios. ...

March 4, 2016 · 3 min · Kevin Sookocheff

Paper Review: System R: Relational Approach to Database Management

Title and Author of Paper: System R: Relational Approach to Database Management. M. M. Astrahan et al. Summary: It’s hard to overstate the influence that the System R project had on database design and implementation. After reading this paper it is clear that traditional database architecture has not significantly changed since the System R project. System R provided the first implementation of SQL, the first demonstration of performant transactions, and the foundational groundwork in concurrency control and query optimization. ...

March 2, 2016 · 4 min · Kevin Sookocheff

The Five Stages of NoSQL

Imagine a fledgling software startup consisting of one or two developers. They are following the lean startup methodology by throwing ideas and implementations at a wall to see what sticks. This methodology demands keeping your application as simple as possible until you find the optimum market. An ongoing concern for the developers is finding a simple, flexible way to store application data: NoSQL or SQL? The NoSQL database offers a premium out-of-the-box experience: install a package, start the database, and post and retrieve data using a JSON API. And by eschewing the details of a schema and the inconvenience of data modelling, the NoSQL database allows for fast iteration. ...

February 28, 2016 · 4 min · Kevin Sookocheff

Paper Review: Architecture of a Database System

Title and Author of Paper: Architecture of a Database System. Joseph M. Hellerstein, Michael Stonebraker, James Hamilton. Summary: Architecture of a Database System provides an explanation of how to implement a relational database. It begins with an architectural overview of the main parts of a database system as viewed through the life of an SQL query. This includes how the query is received, parsed and optimized and how the resulting data is returned from storage as part of a transaction. ...

February 24, 2016 · 3 min · Kevin Sookocheff

Generating Java with JCodeModel

Have you come across the misfortune of needing to auto-generate Java source code? Luckily, anything you’ve wanted to do with Java has already been done, and auto-generating Java is no different. I recently used JCodeModel to translate from JSON to Java. It worked great, but it lacks any tutorial-style documentation. This article means to fill that gap. If you feel the need to go more in-depth, consult the Javadoc. ...

February 21, 2016 · 4 min · Kevin Sookocheff

Writing an Apache Beam Batch Sink

This article describes how you can use the Dataflow/Beam SDK to write files to an S3 bucket by implementing a Sink. A Sink has three phases: initialization, writing, and finalization. The initialization phase is a sequential process where you can create necessary preconditions such as output directories. The write phase lets workers write bundles of records to the Sink. The finalization phase allows for cleanup like merging files or committing writes. ...
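The three phases described above can be sketched in a few lines of plain Python. This is only an illustration of the lifecycle, not the Dataflow/Beam `Sink` API: the class `LocalFileSink`, its method names, and the file naming scheme are all hypothetical, and it writes to a local directory rather than S3.

```python
import os
import tempfile


class LocalFileSink:
    """Hypothetical three-phase sink (assumption: NOT the Beam/Dataflow API)."""

    def __init__(self, output_dir):
        self.output_dir = output_dir

    def initialize(self):
        # Phase 1, runs once and sequentially: create preconditions
        # such as the output directory.
        os.makedirs(self.output_dir, exist_ok=True)

    def write_bundle(self, bundle_id, records):
        # Phase 2, run by each worker: write one bundle of records
        # to its own temporary file and report where it went.
        path = os.path.join(self.output_dir, f"bundle-{bundle_id}.tmp")
        with open(path, "w") as f:
            for record in records:
                f.write(record + "\n")
        return path

    def finalize(self, bundle_paths, final_name="output.txt"):
        # Phase 3, runs once: merge the bundle files into a single
        # output file and remove the temporaries.
        final_path = os.path.join(self.output_dir, final_name)
        with open(final_path, "w") as out:
            for path in sorted(bundle_paths):
                with open(path) as f:
                    out.write(f.read())
                os.remove(path)
        return final_path


def run_example():
    # Drive the three phases in order, using two bundles.
    with tempfile.TemporaryDirectory() as tmp:
        sink = LocalFileSink(os.path.join(tmp, "out"))
        sink.initialize()
        paths = [
            sink.write_bundle(0, ["a", "b"]),
            sink.write_bundle(1, ["c"]),
        ]
        final_path = sink.finalize(paths)
        with open(final_path) as f:
            return f.read().splitlines()
```

Writing each bundle to its own temporary file and merging only in the final phase means a failed or retried bundle never leaves partial data in the final output, which is the usual motivation for separating writing from finalization.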

February 11, 2016 · 10 min · Kevin Sookocheff

Deploying a Druid Cluster with Ansible

During my continued education on Ansible I’ve been writing some roles for deploying a Druid cluster to AWS, similar to the article on deploying Zookeeper with Ansible. The methods are fairly simple, so rather than going through a detailed explanation I will just leave a link to the full source on GitHub. Any and all contributions are welcome!

February 2, 2016 · 1 min · Kevin Sookocheff

Paper Review: What Goes Around Comes Around

Title and Author of Paper: What Goes Around Comes Around. Joseph M. Hellerstein and Michael Stonebraker. Summary: What Goes Around Comes Around summarizes several methods for modelling data within a database system. Each data model is described and the benefits and drawbacks listed as lessons learned from research into that model. The authors clearly present their opinions on each model and help readers unfamiliar with past modelling attempts understand the history of this area of research. ...

January 27, 2016 · 3 min · Kevin Sookocheff

Deploying Zookeeper with Exhibitor to AWS using Ansible

This article provides a detailed guide to deploying Zookeeper to AWS using Exhibitor for cluster management. Exhibitor is a great help for managing your cluster, but getting things up and running is not well documented. Hopefully this article corrects that deficiency. ...

January 15, 2016 · 4 min · Kevin Sookocheff

Getting to Know Cloud Dataflow

Cloud Dataflow is Google’s managed service for batch and stream data processing. Dataflow provides a programming model and execution framework that allows you to run the same code in batch or streaming mode, with guarantees on correctness and primitives for correcting timing issues. Why should you care about Dataflow? A few reasons. First, Dataflow is the only stream processing framework that has strong consistency guarantees for time series data. Second, Dataflow integrates well with the Google Cloud Platform and provides seamless methods for reading from and writing to the Datastore, PubSub, BigQuery and Cloud Storage. Third, the Dataflow SDK is open source and has received contributions for interfacing with Hadoop, Firebase, and Salesforce — AWS integration is absolutely possible. Lastly, Dataflow is completely managed, whereas competing offerings such as Spark and Flink typically run on top of a Hadoop installation used for intermediate storage. ...

January 4, 2016 · 8 min · Kevin Sookocheff