Keeping App Engine Search Documents and Datastore Entities In Sync

At Vendasta, the App Engine Datastore serves as the single point of truth for most operational data, and the majority of interactions are against it. However, many of our products also need to provide a searchable view of the data in the Datastore. Searching is difficult with the Datastore alone, so we have moved to the Search API as a managed solution for searching Datastore entities. In this use case, every edit to an entity in the Datastore is reflected as a change to a Search Document. This article details an architecture for keeping Datastore entities and Search Documents in sync through failures and race conditions. ...

February 23, 2015 · 6 min · Kevin Sookocheff

Halting Python unittest Execution on First Error

We all know the importance of unit tests, especially in a dynamic language like Python. Occasionally you have a set of unit tests that fail in a cascading fashion, where the first error causes subsequent tests to fail (these tests are likely no longer unit tests, but that’s a different discussion). To help isolate the offending test case in a sea of failures, you can set the unittest.TestCase class to halt after the first error by overriding the run method as follows. ...

February 12, 2015 · 1 min · Kevin Sookocheff

Create a Google Cloud Dataflow Project with Gradle

I’ve been experimenting with the Google Cloud Dataflow Java SDK for running managed data processing pipelines. One of the first tasks is getting a build environment up and running. For this I chose Gradle. We start by declaring this a Java application and listing the configuration variables: the source compatibility level (which for now must be 1.7) and the main class to be executed by the run task, which is defined later. ...

February 11, 2015 · 2 min · Kevin Sookocheff

A pypiserver Deployment Script

At Vendasta we’ve been slowly adopting pypi and pip for our internal code libraries, and the time has come to deploy our own private pypi server. After evaluating a few options I settled on the simplistic pypiserver, a barebones implementation of the simple HTTP API to pypi. The deployment uses nginx as a front-end to pypiserver. pypiserver itself is run and monitored using supervisord. I created a bash script that creates a user and group to run pypiserver, then installs and runs nginx, supervisord and pypiserver. I’ve been running this bash script through Vagrant to deploy a custom pypiserver for private use. I wanted to save this code for posterity and hopefully help someone else working on the same task. ...

February 1, 2015 · 2 min · Kevin Sookocheff

Downloading files from Google Cloud Storage with webapp2

I’ve been working on a simple App Engine application that offers upload and download functionality to and from Google Cloud Storage. When it came time to actually download the content I needed to write a webapp2 RequestHandler that will retrieve the file from Cloud Storage and return it to the client. ...

January 27, 2015 · 1 min · Kevin Sookocheff

Querying App Engine Logs with Elasticsearch

From a DevOps perspective, having a historical record of application logs can aid immensely in tracking down bugs, responding to customer questions, or finding out when and why that critical piece of data was updated to the wrong value. One of the biggest grievances with the built-in log handling of Google App Engine is that historical logs are only available for the previous three days. We wanted to do a little better and have logs available for a 30-day period. This article outlines a method we’ve developed for pushing App Engine logs to an Elasticsearch cluster. ...

January 23, 2015 · 2 min · Kevin Sookocheff

Parsing bash script options with getopts

A common task in shell scripting is to parse command line arguments to your script. Bash provides the getopts built-in function to do just that. This tutorial explains how to use the getopts built-in function to parse arguments and options to a bash script. ...

January 4, 2015 · 5 min · Kevin Sookocheff

Managing App Engine Dependencies Using pip

One unfortunate difficulty when working with App Engine is managing your local dependencies. You don’t have access to your Python environment so all libraries you wish to use must be vendored with your installation. That is, you need to copy all of your library code into a local folder to ship along with your app. ...

December 30, 2014 · 2 min · Kevin Sookocheff

App Engine MapReduce API - Part 7: Writing a Custom Output Writer

The MapReduce library supports a number of default output writers. You can also write your own by implementing the output writer interface. This article examines how to write a custom output writer that pushes data from the App Engine Datastore to an Elasticsearch cluster. A similar pattern can be followed to push the output from your MapReduce job to any number of places. ...

December 22, 2014 · 4 min · Kevin Sookocheff

The Bash String Operators

A common task in bash programming is to manipulate portions of a string and return the result. Bash provides rich support for these manipulations via string operators. The syntax is not always intuitive, so I wanted to use this blog post to serve as a permanent reminder of the operators. The string operators are signified with the ${} notation. The operations can be grouped into a few classes. Each heading in this article describes a class of operation. ...

December 11, 2014 · 2 min · Kevin Sookocheff