Deploying Kafka to Google Compute Engine

This article provides a startup script for deploying Kafka to a Google Compute Engine instance. This isn’t meant to be a production-ready system: it uses the Zookeeper instance embedded with Kafka and keeps most of the default settings. Instead, treat this as a quick and easy way to do Kafka development using a live server. This article uses Compute Engine startup scripts to install and run Kafka on instance startup. Startup scripts allow you to run arbitrary Bash commands whenever an instance is created or restarted. Since this script is run on every restart, we lead with a check that makes sure we have not already run the startup script and, if we have, we simply exit. ...
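
The run-once check described above can be sketched roughly as follows. This is a minimal sketch, not the article's actual script: the marker file path and the `run_once` function name are hypothetical, and the Kafka install steps are left out.

```shell
#!/usr/bin/env bash

# Hypothetical marker file; the path is an assumption, not taken from the article.
MARKER="${MARKER:-/var/tmp/kafka-startup.done}"

run_once() {
  # If an earlier boot already finished the setup, skip it.
  if [ -f "$MARKER" ]; then
    echo "startup script already ran; skipping"
    return 0
  fi

  # The Kafka install and launch commands would go here (omitted).
  echo "running first-time setup"

  # Record completion so later restarts take the early-exit path.
  touch "$MARKER"
}

run_once
```

Writing the marker only after the setup steps finish means an interrupted first boot is retried on the next restart instead of being skipped.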

October 12, 2015 · 4 min · Kevin Sookocheff

Kafka Quick Start Guide

If you’ve read the previous article describing Kafka in a Nutshell, you may be itching to write an application using Kafka as a data backend. This article will get you part of the way there by describing how to deploy Kafka locally using Docker and test it using kafkacat. Running Kafka Locally First, if you haven’t already, download and install Docker. Once you have Docker installed, create a default virtual machine that will host your local Docker containers. ...

September 30, 2015 · 3 min · Kevin Sookocheff

Kafka in a Nutshell

Kafka is a messaging system. That’s it. So why all the hype? In reality, messaging is a hugely important piece of infrastructure for moving data between systems. To see why, let’s look at a data pipeline without a messaging system. This system starts with Hadoop for storage and data processing. Hadoop isn’t very useful without data, so the first stage in using Hadoop is getting data in. Bringing Data in to Hadoop So far, not a big deal. Unfortunately, in the real world data exists on many systems in parallel, all of which need to interact with Hadoop and with each other. The situation quickly becomes more complex, ending with a system where multiple data systems are talking to one another over many channels. Each of these channels requires its own custom protocols and communication methods, and moving data between these systems becomes a full-time job for a team of developers. ...

September 25, 2015 · 11 min · Kevin Sookocheff