Tests are Never Enough

The request was fairly simple — iterate through some data and update some dates. Something as simple as this. for entry in db.query(): entry.date = entry.date + timedelta(days=10) entry.put() Unfortunately, the NoSQL database we were using did not support schemas and, unbeknownst to me, some data had been changed in unexpected ways on production. Something as simple as this. Notice the Outlier? My code seemed correct. Unit tests passed. Code review passed. It ran flawlessly on the test environment. QA signed off on the change. So we deployed it. And things broke. ...

September 23, 2016 · 2 min · Kevin Sookocheff

The Problem with Point to Point Communication

Point-to-point communication between servers usually works just fine when one or two instances are communicating. One or two instances communicating However, if you increase the number of applications, the total number of connections increases. Three instances communicating In fact, the total number of connections increases as a square of the number of application instances. So, if you are running 100 instances, you will be maintaining O(100^2) connections. The scaling becomes quite painful. ...

August 31, 2016 · 1 min · Kevin Sookocheff

A Guided Tour Through Thrift

After reading the Thrift whitepaper and sending your first message, you may still have some questions about how Thrift actually works. This article helps answer those questions by providing a guided tour through the Apache Thrift architecture, highlighting the protocols, transports, and compiler, and how they interact with each other. Thrift from 10,000 feet At a high-level, Thrift is organized into several layers as in Figure 1. The layers highlighted in yellow represent application code that is written by a user. The portions in red represent code generated by the Thrift compiler from an interface definition defined in an IDL file. The layers in orange are portions of Thrift available as library code imported into your application as a dependency. Lastly, the device layer in blue represents the physical device transmitting messages. ...

August 23, 2016 · 4 min · Kevin Sookocheff

Stability Anti-Patterns

I recently finished reading the excellent book “Release It!” by Michael Nygard. One of the key points that I wanted to remember was the stability anti-patterns. So, this post will serve as a reminder of architectural smells to look out for when designing production systems. This list of anti-patterns are common forces that will create or accelerate failures in production systems. Given the nature of distributed systems, avoiding these patterns is not possible. You must accept that they will happen, and program your application to be resilient to these failures. ...

August 15, 2016 · 4 min · Kevin Sookocheff

Metrics-Driven Development

Metrics-Driven Development is an emerging term developing from the practices of continuous integration, continuous delivery, dev ops, and agile software methodologies. This article serves to define what metrics-driven development is, why it is useful, and how to use it to drive software changes. Let’s start with a definition of metrics-driven development. Metrics-Driven Development (MDD) The use of real-time metrics to drive rapid, precise, and granular software iterations. This definition is simple and straightforward, but does leave room for interpretation. Let’s dive deeper and break the definition down, bit-by-bit. ...

August 9, 2016 · 10 min · Kevin Sookocheff

Testing a Producer-Consumer Design using a CyclicBarrier

Testing concurrent objects can be challenging. One particular pattern that is useful for objects used in producer-consumer designs is to ensure that everything put in to a shared concurrent queue by a producer is correctly executed by consumers. ...

July 21, 2016 · 3 min · Kevin Sookocheff

Paper Review: Generalized Isolation Level Definitions

Title and Author of Paper Generalized Isolation Level Definitions, Adya et al. Summary The ANSI SQL standard defines isolation levels allowing database users to trade off between performance and consistency when running transactions. Unfortunately, the wording in the SQL standard is geared towards locking as the sole supported concurrency method. This paper presents alternative definitions to the isolation levels specified in the ANSI SQL standard that are general enough to allow for any concurrency method (multi-version, optimistic, etc.) to be used. ...

June 23, 2016 · 3 min · Kevin Sookocheff

So you want to send a message using Apache Thrift?

So you want to use Thrift? You’ve come here because you want to use Apache Thrift and you don’t know where to start. Good. You’re in the right spot. Throughout this document we will develop a simple service that communicates using Thrift. This will introduce you to the workflow for generating client and server code using Thrift and how to Thrift works to separate your application’s business logic from it’s transport methods. ...

June 17, 2016 · 10 min · Kevin Sookocheff

Paper Review: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

Title and Author of Paper DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yu et al. Summary DryadLINQ describes a system for distributing the computation of .NET LINQ expressions on an underlying Dryad cluster. The motivation for this work is to simplify the expression of data parallel algorithms by providing using the higher-level LINQ primitives. This allows the programmer to implement their algorithm as if it was computed on a single machine, and allow the system to worry about the complexities of scheduling, distribution, and fault-tolerance. ...

June 9, 2016 · 2 min · Kevin Sookocheff

Paper Review: MapReduce: Simplified Data Processing on Large Clusters

Title and Author of Paper MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat. Summary MapReduce is designed to solve the problem of processing large sets of data on a fleet of commodity hardware. In such an environment it is assumed that you may have hundreds or thousands of machines and that, at any point in time, these machines may experience failures. The MapReduce framework hides the details of parallelizing your workflow, fault-tolerance, distributing data to workers, and load balancing behind the abstractions map and reduce. The user of MapReduce is responsible for writing these map and reduce functions, while the MapReduce library is responsible for executing that program in a distributed environment. ...

June 8, 2016 · 3 min · Kevin Sookocheff