Create a Google Cloud Dataflow Project with Gradle

I’ve been experimenting with the Google Cloud Dataflow Java SDK for running managed data processing pipelines. One of the first tasks is getting a build environment up and running. For this I chose Gradle. We start by declaring this a Java application and listing the configuration variables: the source compatibility level (which for now must be 1.7) and the main class executed by the run task, which is defined later. ...

February 11, 2015 · 2 min · Kevin Sookocheff

A pypiserver Deployment Script

At Vendasta we’ve been slowly adopting PyPI and pip for our internal code libraries and the time has come to deploy our own private PyPI server. After evaluating a few options I settled on the minimal pypiserver – a barebones implementation of the simple HTTP API to PyPI. The deployment uses nginx as a front-end to pypiserver. pypiserver itself is run and monitored using supervisord. I created a bash script that creates a user and group to run pypiserver and installs and runs nginx, supervisord and pypiserver. I’ve been running this bash script through Vagrant to deploy a custom pypiserver for private use. I wanted to save this code for posterity and hopefully help someone else working on the same task. ...

February 1, 2015 · 2 min · Kevin Sookocheff

Downloading files from Google Cloud Storage with webapp2

I’ve been working on a simple App Engine application that offers upload and download functionality to and from Google Cloud Storage. When it came time to actually download the content I needed to write a webapp2 RequestHandler that will retrieve the file from Cloud Storage and return it to the client. ...

January 27, 2015 · 1 min · Kevin Sookocheff

Querying App Engine Logs with Elasticsearch

From a DevOps perspective, having a historical record of application logs can aid immensely in tracking down bugs, responding to customer questions, or finding out when and why that critical piece of data was updated to the wrong value. One of the biggest grievances with the built-in log handling of Google App Engine is that historical logs are only available for the previous three days. We wanted to do a little bit better and have logs available for a 30-day time period. This article outlines a method we’ve developed for pushing App Engine logs to an Elasticsearch cluster. ...

January 23, 2015 · 2 min · Kevin Sookocheff

Parsing bash script options with getopts

A common task in shell scripting is to parse command line arguments to your script. Bash provides the getopts built-in function to do just that. This tutorial explains how to use the getopts built-in function to parse arguments and options to a bash script. ...
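As a rough illustration of the built-in described above, here is a minimal sketch of a getopts loop. The function name parse_args and the -v and -o options are hypothetical choices made for this example and are not taken from the article.

```shell
# parse_args is a hypothetical helper: -v enables verbose mode,
# -o FILE names an output file, and anything left over is positional.
parse_args() {
  local OPTIND=1 opt verbose=0 output=""

  # The leading ":" selects silent error reporting, so the script
  # handles missing arguments and unknown options itself.
  while getopts ":vo:" opt; do
    case "$opt" in
      v) verbose=1 ;;
      o) output="$OPTARG" ;;
      :) echo "error: -$OPTARG requires an argument" >&2; return 1 ;;
      \?) echo "error: unknown option -$OPTARG" >&2; return 1 ;;
    esac
  done

  # Drop the options that getopts consumed; "$@" now holds the rest.
  shift $((OPTIND - 1))

  echo "verbose=$verbose output=$output rest=$*"
}

parse_args -v -o out.txt input.txt
# prints: verbose=1 output=out.txt rest=input.txt
```

Resetting OPTIND inside the function is one way to let the helper be called more than once in the same shell; a script that parses its own arguments only once would not need it.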

January 4, 2015 · 5 min · Kevin Sookocheff

Managing App Engine Dependencies Using pip

One unfortunate difficulty when working with App Engine is managing your local dependencies. You don’t have access to your Python environment so all libraries you wish to use must be vendored with your installation. That is, you need to copy all of your library code into a local folder to ship along with your app. ...

December 30, 2014 · 2 min · Kevin Sookocheff

App Engine MapReduce API - Part 7: Writing a Custom Output Writer

View all articles in the MapReduce API Series. The MapReduce library supports a number of default output writers. You can also write your own by implementing the output writer interface. This article examines how to write a custom output writer that pushes data from the App Engine datastore to an Elasticsearch cluster. A similar pattern can be followed to push the output from your MapReduce job to any number of places. ...

December 22, 2014 · 4 min · Kevin Sookocheff

The Bash String Operators

A common task in bash programming is to manipulate portions of a string and return the result. Bash provides rich support for these manipulations via string operators. The syntax is not always intuitive so I wanted to use this blog post to serve as a permanent reminder of the operators. The string operators are signified with the ${} notation. The operations can be grouped into a few classes. Each heading in this article describes a class of operation. ...
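To give a flavour of the ${} notation, here is a small sketch of a few common operators. The path and variable names are made up for illustration, and the selection is only a sample, not the full set of classes the article covers.

```shell
# A made-up path used only to demonstrate the operators.
path="/var/log/app/server.log.gz"

# Substring removal: "#" trims a matching prefix, "%" trims a matching suffix.
# A doubled operator ("##", "%%") removes the longest match instead of the shortest.
file="${path##*/}"    # server.log.gz
dir="${path%/*}"      # /var/log/app
stem="${path%%.*}"    # /var/log/app/server
ext="${path#*.}"      # log.gz

# Substitution: ":-" supplies a fallback when the variable is unset or empty.
unset name
greeting="hello, ${name:-anonymous}"   # hello, anonymous

echo "$file"
echo "$dir"
echo "$stem"
echo "$ext"
echo "$greeting"
```

The operators shown here are also defined by POSIX, so they should behave the same way in other sh-compatible shells.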

December 11, 2014 · 2 min · Kevin Sookocheff

App Engine MapReduce API - Part 6: Writing a Custom Input Reader

View all articles in the MapReduce API Series. One of the great things about the MapReduce library is the ability to write a custom InputReader to process data from any data source. In this post we will explore how to write an InputReader that leases tasks from an App Engine pull queue by implementing the InputReader interface. ...

December 4, 2014 · 7 min · Kevin Sookocheff

Installing MySQL-Python on OS X Yosemite

Installing the MySQL-Python package requires a few steps. In an effort to aid future Internet travellers, this post will document how to install the MySQL-Python package on OS X Yosemite. First, install MariaDB, the drop-in replacement for MySQL. I chose MacPorts for this task, though Homebrew would work just fine. Second, update your PATH to include the mariadb executables. Third, install the Python MySQL connector.

    sudo port install mariadb
    PATH=/opt/local/lib/mariadb/bin:$PATH
    pip install MySQL-Python

That’s it! You should be able to import MySQLdb in your Python code and interact with your MariaDB database. ...

November 18, 2014 · 1 min · Kevin Sookocheff