A Guided Tour Through Thrift

After reading the Thrift whitepaper and sending your first message, you may still have some questions about how Thrift actually works. This article helps answer those questions by providing a guided tour through the Apache Thrift architecture, highlighting the protocols, transports, and compiler, and how they interact with each other. Thrift from 10,000 feet: At a high level, Thrift is organized into several layers, as shown in Figure 1. The layers highlighted in yellow represent application code written by the user. The portions in red represent code generated by the Thrift compiler from an interface definition in an IDL file. The layers in orange are the portions of Thrift available as library code, imported into your application as a dependency. Lastly, the device layer in blue represents the physical device transmitting messages. ...

August 23, 2016 · 4 min · Kevin Sookocheff

Stability Anti-Patterns

I recently finished reading the excellent book “Release It!” by Michael Nygard. One of the key points that I wanted to remember was the set of stability anti-patterns, so this post serves as a reminder of architectural smells to look out for when designing production systems. These anti-patterns are common forces that create or accelerate failures in production systems. Given the nature of distributed systems, avoiding them entirely is not possible. You must accept that they will happen and program your application to be resilient to these failures. ...

August 15, 2016 · 4 min · Kevin Sookocheff

Metrics-Driven Development

Metrics-Driven Development is an emerging term growing out of the practices of continuous integration, continuous delivery, DevOps, and agile software methodologies. This article defines what metrics-driven development is, why it is useful, and how to use it to drive software changes. Let’s start with a definition of metrics-driven development. Metrics-Driven Development (MDD): The use of real-time metrics to drive rapid, precise, and granular software iterations. This definition is simple and straightforward, but does leave room for interpretation. Let’s dive deeper and break the definition down, bit by bit. ...

August 9, 2016 · 10 min · Kevin Sookocheff

Testing a Producer-Consumer Design using a CyclicBarrier

Testing concurrent objects can be challenging. One particularly useful pattern for objects used in producer-consumer designs is to verify that everything put into a shared concurrent queue by a producer is correctly processed by consumers. ...
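The excerpt above stops before the post's own code, so the following is only a minimal sketch of one way such a test could look, not the author's implementation. It assumes a checksum approach: producers and consumers each keep a running sum, and a `CyclicBarrier` is used both to start all threads together and to wait for them to finish. The class name `PutTakeSketch`, the queue capacity, and the value formula are all hypothetical choices made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: checks that everything producers put on a shared
// queue is taken by consumers, by comparing checksums of both sides.
class PutTakeSketch {

    static boolean run(int pairs, int itemsPerProducer) throws Exception {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        AtomicLong putSum = new AtomicLong();
        AtomicLong takeSum = new AtomicLong();
        // One party per worker thread, plus one for the coordinating thread.
        CyclicBarrier barrier = new CyclicBarrier(2 * pairs + 1);
        // The pool must hold every worker at once, or the barrier never trips.
        ExecutorService pool = Executors.newFixedThreadPool(2 * pairs);
        try {
            for (int p = 0; p < pairs; p++) {
                final int seed = p + 1;
                pool.execute(() -> {            // producer
                    try {
                        barrier.await();        // start line
                        long sum = 0;
                        for (int i = 0; i < itemsPerProducer; i++) {
                            int value = seed * 31 + i;
                            queue.put(value);
                            sum += value;
                        }
                        putSum.addAndGet(sum);
                        barrier.await();        // finish line
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
                pool.execute(() -> {            // consumer
                    try {
                        barrier.await();        // start line
                        long sum = 0;
                        for (int i = 0; i < itemsPerProducer; i++) {
                            sum += queue.take();
                        }
                        takeSum.addAndGet(sum);
                        barrier.await();        // finish line
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
            }
            // Timeouts keep the coordinator from hanging if a worker fails.
            barrier.await(10, TimeUnit.SECONDS); // release all workers together
            barrier.await(10, TimeUnit.SECONDS); // wait for all workers to finish
        } finally {
            pool.shutdown();
        }
        return putSum.get() == takeSum.get() && queue.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("checksums match: " + run(4, 1000));
    }
}
```

Releasing every thread from the same barrier makes producers and consumers contend for the queue at the same moment, which is more likely to expose interleaving bugs than starting threads one by one.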

July 21, 2016 · 3 min · Kevin Sookocheff

Paper Review: Generalized Isolation Level Definitions

Title and Author of Paper: Generalized Isolation Level Definitions, Adya et al. Summary: The ANSI SQL standard defines isolation levels allowing database users to trade off between performance and consistency when running transactions. Unfortunately, the wording in the SQL standard is geared towards locking as the sole supported concurrency method. This paper presents alternative definitions to the isolation levels specified in the ANSI SQL standard that are general enough to allow for any concurrency method (multi-version, optimistic, etc.) to be used. ...

June 23, 2016 · 3 min · Kevin Sookocheff

So you want to send a message using Apache Thrift?

So you want to use Thrift? You’ve come here because you want to use Apache Thrift and you don’t know where to start. Good. You’re in the right spot. Throughout this document we will develop a simple service that communicates using Thrift. This will introduce you to the workflow for generating client and server code using Thrift and to how Thrift works to separate your application’s business logic from its transport methods. ...

June 17, 2016 · 10 min · Kevin Sookocheff

Paper Review: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

Title and Author of Paper: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yu et al. Summary: DryadLINQ describes a system for distributing the computation of .NET LINQ expressions on an underlying Dryad cluster. The motivation for this work is to simplify the expression of data-parallel algorithms by providing the higher-level LINQ primitives. This allows programmers to implement their algorithm as if it were computed on a single machine, and lets the system worry about the complexities of scheduling, distribution, and fault tolerance. ...

June 9, 2016 · 2 min · Kevin Sookocheff

Paper Review: MapReduce: Simplified Data Processing on Large Clusters

Title and Author of Paper: MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat. Summary: MapReduce is designed to solve the problem of processing large sets of data on a fleet of commodity hardware. In such an environment it is assumed that you may have hundreds or thousands of machines and that, at any point in time, these machines may experience failures. The MapReduce framework hides the details of parallelizing your workflow, fault tolerance, distributing data to workers, and load balancing behind the abstractions map and reduce. The user of MapReduce is responsible for writing these map and reduce functions, while the MapReduce library is responsible for executing that program in a distributed environment. ...
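To make the division of labour concrete, here is a minimal single-process sketch of a word count, the example commonly used to introduce the model. It is not the MapReduce library's API: the class `WordCountSketch` and its method names are hypothetical, and the `run` method only imitates, in memory, the grouping step that a real framework would perform across many machines. The user-written pieces are `map` and `reduce`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the map / group-by-key / reduce flow.
class WordCountSketch {

    // User-written map: emit a (word, 1) pair for every word in a document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // User-written reduce: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Stand-in for the framework: run map over every input, group the
    // intermediate pairs by key, then call reduce once per key.
    static Map<String, Integer> run(List<String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String document : documents) {
            for (Map.Entry<String, Integer> pair : map(document)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        documents.add("the quick fox");
        documents.add("The lazy dog");
        System.out.println(run(documents));
    }
}
```

Because `map` and `reduce` touch only their own arguments, a framework is free to run them on different machines and to re-run them after a failure, which is the property the summary describes.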

June 8, 2016 · 3 min · Kevin Sookocheff

Paper Review: OLTP Through the Looking Glass, and What We Found There

Title and Author of Paper: OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al. Summary: Disk I/O has been the primary limiting factor in database performance for most commercial databases. However, as prices of main memory have dropped, it has become feasible to keep the entire working set of a database in RAM. With this architectural change, it makes sense to re-evaluate database design decisions that were made to avoid disk I/O and see which ones still hold promise in a main-memory world. This paper provides such a performance analysis. ...

June 6, 2016 · 3 min · Kevin Sookocheff

Getting Started with Amazon Flow Framework

Amazon’s Flow Framework provides a high-level SDK for interacting with the Amazon Simple Workflow service (SWF). SWF is a managed service that helps developers build, run, and monitor parallel or sequential asynchronous workloads. SWF reliably commits your workflow’s state to durable storage, allowing you to focus on your business logic rather than on the complex coordination of distributed services. Writing an application with the Flow Framework can be divided into the following steps: ...

June 2, 2016 · 12 min · Kevin Sookocheff