Durabledict for App Engine

tl;dr: DatastoreDict. What’s a durabledict? Good question. Durabledict is a Python implementation of a persistent dictionary. The dictionary values are cached locally and sync with the datastore whenever a value in the datastore changes. Disqus provides concrete implementations for Redis, Django, ZooKeeper and in memory. This blog post details an implementation using the App Engine datastore and memcache. Creating your own durabledict By following the guide in the durabledict README we can create our own implementation. We need to subclass durabledict.base.DurableDict and implement the following interface methods. Strictly speaking, _pop and _setdefault do not have to be implemented, but doing so makes your durabledict behave like a base dict in all cases. ...

April 29, 2015 · 5 min · Kevin Sookocheff

Continuous Delivery Distilled

What if you could deliver more value, with more speed and with more stability? What if you could triage bugs faster? What if you could fix bugs more easily and with less user-facing impact? You can, with continuous delivery. Terminology First, some terminology. What distinguishes continuous integration, continuous deployment and continuous delivery? Continuous integration revolves around the continuous automated testing of software whenever a change to the software is made. Continuous deployment is the practice of automatically deploying any change to the code. Continuous delivery implies that you can deploy any change to production but for any number of reasons you may choose not to. The focus of this article is on continuous delivery. ...

April 23, 2015 · 13 min · Kevin Sookocheff

Creating a BigQuery Table using the Java Client Library

I haven’t been able to find great documentation on creating a BigQuery TableSchema using the Java Client Library. This blog post hopes to rectify that :). You can use the BigQuery sample code for an idea of how to create a client connection to BigQuery. Assuming you have the connection set up you can start by creating a new TableSchema. The TableSchema provides a method for setting the list of fields that make up the columns of your BigQuery Table. Those columns are defined as an Array of TableFieldSchema objects. ...

March 23, 2015 · 2 min · Kevin Sookocheff

Deploying R Studio on Compute Engine

Sometimes you have a data analysis problem that is just too big for your desktop or laptop. The limiting factor here is generally RAM. Thankfully, services like Google Compute Engine allow you to lease servers with up to 208GB of RAM, large enough for a wide variety of intensive tasks. An ancillary benefit of using a service like Compute Engine is that it allows you to easily load your data from a Cloud Storage Bucket, meaning you don’t need to keep a copy of the large dataset locally at all times. ...

March 23, 2015 · 3 min · Kevin Sookocheff

Keeping App Engine Search Documents and Datastore Entities In Sync

At Vendasta the App Engine Datastore serves as the single point of truth for most operational data and the majority of interactions are against this single point of truth. However, a piece of required functionality in many of our products is to provide a searchable view of the data in the App Engine Datastore. Search is difficult using the Datastore and so we have moved to using the Search API as a managed solution for searching datastore entities. In this use case, every edit to an entity in the Datastore is reflected as a change to a Search Document. This article details an architecture for keeping Datastore entities and Search Documents in sync throughout failure and race conditions. ...

February 23, 2015 · 6 min · Kevin Sookocheff

Halting Python unittest Execution on First Error

We all know the importance of unit tests, especially in a dynamic language like Python. Occasionally you have a set of unit tests that are failing in a cascading fashion where the first error case causes subsequent tests to fail (these tests are likely no longer unit tests, but that’s a different discussion). To help isolate the offending test case in a sea of failures you can set the unittest.TestCase class to halt after the first error by overriding the run method as follows. ...
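The excerpt is cut off before the code, so what follows is only a minimal sketch of the idea and not necessarily the post's own implementation. It assumes a hypothetical base class named HaltingTestCase whose run method skips a test once the shared result already holds an error or failure. The helper run_example and its inner Example class are likewise made up for illustration.

```python
import io
import unittest


class HaltingTestCase(unittest.TestCase):
    """Hypothetical base class: skip remaining tests after the first error or failure."""

    def run(self, result=None):
        # The runner passes one shared result object to every test in the suite.
        # If an earlier test already errored or failed, do not run this one.
        if result is not None and (result.errors or result.failures):
            return result
        return super().run(result)


def run_example():
    """Run a two-test suite in which the first test errors; return the result."""

    class Example(HaltingTestCase):
        def test_a_breaks(self):
            raise RuntimeError("first error")

        def test_b_never_runs(self):
            self.fail("skipped because test_a_breaks already errored")

    # The loader orders test methods alphabetically, so test_a_breaks runs first.
    suite = unittest.TestLoader().loadTestsFromTestCase(Example)
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


if __name__ == "__main__":
    outcome = run_example()
    print(outcome.testsRun, len(outcome.errors), len(outcome.failures))
```

The standard library also offers a built-in alternative that needs no subclass: pass failfast=True to unittest.TextTestRunner or unittest.main, or use the -f / --failfast command-line flag.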

February 12, 2015 · 1 min · Kevin Sookocheff

Create a Google Cloud Dataflow Project with Gradle

I’ve been experimenting with the Google Cloud Dataflow Java SDK for running managed data processing pipelines. One of the first tasks is getting a build environment up and running. For this I chose Gradle. We start by declaring this a Java application and listing the configuration variables that declare the source compatibility level (which for now must be 1.7) and the main class to be executed by the run task to be defined later. ...

February 11, 2015 · 2 min · Kevin Sookocheff

A pypiserver Deployment Script

At Vendasta we’ve been slowly adopting pypi and pip for our internal code libraries and the time has come to deploy our own private pypi server. After evaluating a few options I settled on the simplistic pypiserver – a barebones implementation of the simple HTTP API to pypi. The deployment uses nginx as a front-end to pypiserver. pypiserver itself is run and monitored using supervisord. I created a bash script that creates a user and group to run pypiserver and installs and runs nginx, supervisord and pypiserver. I’ve been running this bash script through Vagrant to deploy a custom pypiserver for private use. I wanted to save this code for posterity and hopefully help someone else working on the same task. ...

February 1, 2015 · 2 min · Kevin Sookocheff

Downloading files from Google Cloud Storage with webapp2

I’ve been working on a simple App Engine application that offers upload and download functionality to and from Google Cloud Storage. When it came time to actually download the content I needed to write a webapp2 RequestHandler that will retrieve the file from Cloud Storage and return it to the client. ...

January 27, 2015 · 1 min · Kevin Sookocheff

Querying App Engine Logs with Elasticsearch

From a DevOps perspective, having a historical record of application logs can aid immensely in tracking down bugs, responding to customer questions, or finding out when and why that critical piece of data was updated to the wrong value. One of the biggest grievances with the built-in log handling of Google App Engine is that historical logs are only available for the previous three days. We wanted to do a little bit better and have logs available for a 30-day time period. This article outlines a method we’ve developed for pushing App Engine logs to an Elasticsearch cluster. ...

January 23, 2015 · 2 min · Kevin Sookocheff