Docker Step By Step: Containerizing Zookeeper

Follow along with this article as we take a guided tour of containerizing Zookeeper using Docker. This guide will show how to install Zookeeper to the container, how to configure the Zookeeper application, and how to share data volumes between the host and container. ...

December 4, 2015 · 6 min · Kevin Sookocheff

Why Java? Tales from a Python Convert

Whenever I tell people I’ve been working with Java I get the same reaction: “Yuck! Java? Why Java?” And, admittedly, I had the same reaction — at first. But over time, I’ve come to appreciate Java for its type safety, performance, and rock-solid tooling. I’ve also come to notice that this isn’t the Java I was used to — it’s been steadily improving over the last ten years. ...

November 20, 2015 · 10 min · Kevin Sookocheff

Including a local package as a Maven dependency

Lately I’ve been tasked with developing a Java library for internal use. For testing, it’s proved useful to package the library for local use. This article describes how to add a JAR file to a local Maven repository for use in your own testing and development. Create your local Maven repository: Your local Maven repository lives within the project you are developing for. Creating your local repository is as simple as making a new directory. ...

November 12, 2015 · 2 min · Kevin Sookocheff

Configuring an Upstream Remote

This is something I often do but rarely remember the steps for. This post is intended to serve as a reminder for me and anyone else having the same question: how to add an upstream remote git repository. Start by forking the repository you are contributing to and cloning that repository to your local file system. In this example, we will use the Elasticsearch repository and assume you have cloned it locally. ...

November 1, 2015 · 1 min · Kevin Sookocheff

Writing Repeated BigQuery records using the Java Client Library

I’ve recently been working with Java via the Google Cloud Dataflow SDK. One problem I’ve had is working with the BigQuery Java Client. It was never entirely clear how to create a repeated record. This article explains how it works and how you can accomplish the same thing. First, you need to create a new TableRow. For this example, let’s assume we are logging events using a guid and a timestamp. ...

October 27, 2015 · 2 min · Kevin Sookocheff

From JSON to a Google API Client Library object

I have been working on a Cloud Dataflow project that parses incoming App Engine logs to generate status statistics such as the number of errors in the last 10 minutes. The incoming data is a JSON representation of a LogEntry object. A LogEntry object is represented in Java as a LogEntry class. The task I faced was converting the JSON representation of a LogEntry into the Java object for downstream processing. ...

October 25, 2015 · 2 min · Kevin Sookocheff

A Maven runner for vim-test

If you haven’t tried vim-test yet, please do. It provides a consistent interface for running unit tests from within vim for a variety of languages and test runners. It also has support for different methods of dispatching your tests from vim to a terminal to retrieve the results. vim-test works out of the box for a number of languages and is easily extensible to support additional languages. In fact, I recently contributed a Maven test runner for the Java language. If you are developing a Maven project, you can now run your tests from within vim using vim-test. ...

October 16, 2015 · 1 min · Kevin Sookocheff

Deploying Kafka to Google Compute Engine

This article provides a startup script for deploying Kafka to a Google Compute Engine instance. This isn’t meant to be a production-ready system; it uses the Zookeeper instance embedded with Kafka and keeps most of the default settings. Instead, treat this as a quick and easy way to do Kafka development using a live server. This article uses Compute Engine startup scripts to install and run Kafka on instance startup. Startup scripts allow you to run arbitrary Bash commands whenever an instance is created or restarted. Since this script is run on every restart, we lead with a check that makes sure we have not already run the startup script and, if we have, we simply exit. ...

October 12, 2015 · 4 min · Kevin Sookocheff

Kafka Quick Start Guide

If you’ve read the previous article describing Kafka in a Nutshell you may be itching to write an application using Kafka as a data backend. This article will get you part of the way there by describing how to deploy Kafka locally using Docker and test it using kafkacat. Running Kafka Locally: First, if you haven’t already, download and install Docker. Once you have Docker installed, create a default virtual machine that will host your local Docker containers. ...

September 30, 2015 · 3 min · Kevin Sookocheff

Kafka in a Nutshell

Kafka is a messaging system. That’s it. So why all the hype? In reality, messaging is a hugely important piece of infrastructure for moving data between systems. To see why, let’s look at a data pipeline without a messaging system. This system starts with Hadoop for storage and data processing. Hadoop isn’t very useful without data, so the first stage in using Hadoop is getting data in. (Figure: Bringing Data in to Hadoop.) So far, not a big deal. Unfortunately, in the real world data exists on many systems in parallel, all of which need to interact with Hadoop and with each other. The situation quickly becomes more complex, ending with a system where multiple data systems are talking to one another over many channels. Each of these channels requires its own custom protocols and communication methods, and moving data between these systems becomes a full-time job for a team of developers. ...

September 25, 2015 · 11 min · Kevin Sookocheff