App Engine Pipelines API - Part 4: Pipeline Internals

View all articles in the Pipeline API Series. We’ve learned how to execute and chain together pipelines; now let’s take a look at how pipelines execute under the hood. If necessary, you can refer to the source code of the pipelines project to clarify any details. Let’s start with the pipeline data model. Note that each Kind defined by the Pipelines API is prefixed with _AE_Pipeline, making it easy to inspect an individual pipeline’s details by viewing its Datastore entities. ...

May 27, 2015 · 4 min · Kevin Sookocheff

App Engine Pipelines API - Part 3: Fan In, Fan Out, Sequencing

Last time, we studied how to connect two pipelines together. In this post, we expand on this topic, exploring how to fan-out to do multiple tasks in parallel, fan-in to combine multiple tasks into one, and how to do sequential work. ...

May 19, 2015 · 3 min · Kevin Sookocheff

App Engine Pipelines API - Part 2: Connecting Pipelines

Last time, we discussed basic pipeline instantiation and execution. This time, we will cover sequential pipelines, answering the question “How do I connect the output of one pipeline to the input of another pipeline?” ...

May 12, 2015 · 2 min · Kevin Sookocheff

App Engine Pipelines API - Part 1: The Basics

The Pipelines API is a general-purpose workflow engine for App Engine applications. With the Pipelines API we can combine complex workflows into a coherent runtime backed by the Datastore. This article provides a basic overview of the Pipelines API and how it can be used for arbitrary computational workflows. In the most basic sense, a Pipeline is an object that takes input, performs some logic or computation on that input, and produces output. Pipelines can take two general forms: synchronous or asynchronous. Synchronous pipelines act as basic functions that must complete during a single request. Asynchronous pipelines spawn child pipelines and connect them together into a workflow by passing input and output parameters around. ...

May 5, 2015 · 6 min · Kevin Sookocheff
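To make the synchronous/asynchronous distinction concrete, here is a toy model in plain Python. It is not the App Engine Pipelines API: the class names SyncPipeline and AsyncPipeline are hypothetical, and unlike the real engine, which records state in the Datastore and runs child pipelines as separate tasks, this sketch runs the children inline in the same process.

```python
# Toy model only: NOT the App Engine Pipelines API. Class names are hypothetical.

class SyncPipeline(object):
    """Acts like a basic function: computes its output in a single call."""

    def __init__(self, fn):
        self.fn = fn

    def run(self, value):
        return self.fn(value)


class AsyncPipeline(object):
    """Builds a workflow from child pipelines, passing each child's output
    as the next child's input. (Here children run inline; the real engine
    would schedule them as separate tasks.)"""

    def __init__(self, *children):
        self.children = children

    def run(self, value):
        for child in self.children:
            value = child.run(value)
        return value


double = SyncPipeline(lambda x: x * 2)
increment = SyncPipeline(lambda x: x + 1)
workflow = AsyncPipeline(double, increment)
print(workflow.run(5))  # 11
```

Because AsyncPipeline only depends on each child having a run method, workflows can nest: an AsyncPipeline can itself be a child of another AsyncPipeline.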

Durabledict for App Engine

tl;dr: DatastoreDict. What’s a durabledict? Good question. Durabledict is a Python implementation of a persistent dictionary. The dictionary values are cached locally and synced with the datastore whenever a value in the datastore changes. Disqus provides concrete implementations for Redis, Django, ZooKeeper and in-memory storage. This blog post details an implementation using the App Engine datastore and memcache. To create your own durabledict, follow the guide in the durabledict README: subclass durabledict.base.DurableDict and implement the following interface methods. Strictly speaking, _pop and _setdefault do not have to be implemented, but implementing them makes your durabledict behave like a standard dict in all cases. ...

April 29, 2015 · 5 min · Kevin Sookocheff
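A toy sketch of the caching idea, assuming nothing about the durabledict library’s actual interface: FakeDatastore is a hypothetical in-memory stand-in for the datastore, CachedDict is a hypothetical class rather than a DurableDict subclass, and a simple version counter stands in for change detection. Each CachedDict serves reads from its local copy and reloads that copy only when the store’s version has changed since its last sync.

```python
# Toy model only: does not use the durabledict library. Names are hypothetical.

class FakeDatastore(object):
    """In-memory stand-in for a shared backing store. Every write bumps
    a version counter so readers can tell when their cache is stale."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def put(self, key, value):
        self.data[key] = value
        self.version += 1

    def delete(self, key):
        del self.data[key]  # raises KeyError for a missing key, like dict
        self.version += 1


class CachedDict(object):
    """Reads come from a local cache that is reloaded from the store
    only when the store's version differs from the last one seen."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.synced_version = None

    def _sync(self):
        if self.synced_version != self.store.version:
            self.cache = dict(self.store.data)
            self.synced_version = self.store.version

    def __getitem__(self, key):
        self._sync()
        return self.cache[key]

    def __contains__(self, key):
        self._sync()
        return key in self.cache

    def __setitem__(self, key, value):
        self.store.put(key, value)  # write through to the shared store
        self._sync()

    def __delitem__(self, key):
        self.store.delete(key)
        self._sync()


store = FakeDatastore()
a = CachedDict(store)
b = CachedDict(store)
a["x"] = 1
print(b["x"])  # 1: b notices the version change and reloads
```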

Continuous Delivery Distilled

What if you could deliver more value, with more speed and with more stability? What if you could triage bugs faster? What if you could fix bugs more easily and with less user-facing impact? You can, with continuous delivery. First, some terminology: what distinguishes continuous integration, continuous deployment and continuous delivery? Continuous integration revolves around automatically testing software every time a change is made to it. Continuous deployment is the practice of automatically deploying every change to the code. Continuous delivery means that any change can be deployed to production, though for any number of reasons you may choose not to. The focus of this article is on continuous delivery. ...

April 23, 2015 · 13 min · Kevin Sookocheff

Creating a BigQuery Table using the Java Client Library

I haven’t been able to find great documentation on creating a BigQuery TableSchema using the Java Client Library, so this blog post hopes to rectify that :). You can use the BigQuery sample code for an idea of how to create a client connection to BigQuery. Assuming you have the connection set up, you can start by creating a new TableSchema. The TableSchema provides a method for setting the list of fields that make up the columns of your BigQuery table. Those columns are defined as a List of TableFieldSchema objects. ...

March 23, 2015 · 2 min · Kevin Sookocheff

Deploying R Studio on Compute Engine

Sometimes you have a data analysis problem that is just too big for your desktop or laptop. The limiting factor here is generally RAM. Thankfully, services like Google Compute Engine allow you to lease servers with up to 208GB of RAM, large enough for a wide variety of intensive tasks. An ancillary benefit of using a service like Compute Engine is that it allows you to easily load your data from a Cloud Storage Bucket, meaning you don’t need to keep a copy of the large dataset locally at all times. ...

March 23, 2015 · 3 min · Kevin Sookocheff

Keeping App Engine Search Documents and Datastore Entities In Sync

At Vendasta the App Engine Datastore serves as the single point of truth for most operational data, and the majority of interactions are against this single point of truth. However, many of our products also need to provide a searchable view of the data in the App Engine Datastore. Search is difficult using the Datastore alone, so we have moved to the Search API as a managed solution for searching Datastore entities. In this use case, every edit to an entity in the Datastore is reflected as a change to a Search Document. This article details an architecture for keeping Datastore entities and Search Documents in sync in the presence of failures and race conditions. ...

February 23, 2015 · 6 min · Kevin Sookocheff

Halting Python unittest Execution on First Error

We all know the importance of unit tests, especially in a dynamic language like Python. Occasionally you have a set of unit tests that fail in a cascading fashion, where the first error causes subsequent tests to fail (these tests are likely no longer unit tests, but that’s a different discussion). To help isolate the offending test case in a sea of failures, you can make a unittest.TestCase subclass halt after the first error by overriding its run method as follows. ...

February 12, 2015 · 1 min · Kevin Sookocheff