Deploying Kafka to Google Compute Engine

This article provides a startup script for deploying Kafka to a Google Compute Engine instance. This isn't meant to be a production-ready system: it uses the Zookeeper instance embedded with Kafka and keeps most of the default settings. Instead, treat it as a quick and easy way to do Kafka development against a live server.

This article uses Compute Engine startup scripts to install and run Kafka on instance startup. A startup script can run arbitrary Bash commands, and Compute Engine runs it whenever the instance is created or restarted. Because the script runs on every restart, it begins with a check for a marker file that records whether the script has already run; if the marker exists, the script simply exits.

#!/usr/bin/env bash

STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION

if [[ -f $STARTUP_MARK ]]; then
  exit 0
fi
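To see the guard in action without a real instance, here is a minimal sketch that simulates two boots. It assumes a temporary directory (MARK_DIR) in place of /var so it runs without root, and the boot function is a hypothetical stand-in for the whole startup script; the echo stands in for the real install steps.

```shell
#!/usr/bin/env bash
# Sketch: simulate two boots against the marker-file guard.
# MARK_DIR stands in for /var (assumption) so the demo runs without root.
set -o nounset

MARK_DIR=$(mktemp -d)
STARTUP_VERSION=1
STARTUP_MARK="$MARK_DIR/startup.script.$STARTUP_VERSION"
RUNS_LOG="$MARK_DIR/runs.log"

# Hypothetical stand-in for the startup script.
boot() {
  # Same check as the real script: skip everything if the marker exists.
  if [[ -f $STARTUP_MARK ]]; then
    echo "skipped"
    return 0
  fi
  echo "installing" >> "$RUNS_LOG"   # stands in for apt-get, wget, tar, ...
  touch "$STARTUP_MARK"              # the real script does this as its last step
  echo "installed"
}

FIRST=$(boot)
SECOND=$(boot)
echo "first boot: $FIRST, second boot: $SECOND"
# prints: first boot: installed, second boot: skipped
```

Because the version number is part of the marker's file name, bumping STARTUP_VERSION makes the guard pass again on the next boot, which is a convenient way to force a re-run after changing the script.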

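Then we set the Scala and Kafka version numbers used throughout the rest of the script. Kafka releases are built against a specific Scala version, so both numbers appear in the archive name and the install directory.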

SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
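As a quick sanity check, the sketch below prints what these variables expand to, using the versions from the complete script (Scala 2.11, Kafka 0.8.2.1). KAFKA_ARCHIVE and KAFKA_URL are hypothetical helper names introduced here; the startup script inlines the same strings.

```shell
#!/usr/bin/env bash
# Sketch: show what the version variables expand to.
# KAFKA_ARCHIVE and KAFKA_URL are hypothetical helpers, not used by the script.
set -o nounset

SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"

# Release archives are named kafka_<scala version>-<kafka version>.tgz
KAFKA_ARCHIVE=kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
KAFKA_URL=http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/"$KAFKA_ARCHIVE"

echo "$KAFKA_HOME"      # /opt/kafka_2.11-0.8.2.1
echo "$KAFKA_ARCHIVE"   # kafka_2.11-0.8.2.1.tgz
echo "$KAFKA_URL"
```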

Next, we install the prerequisites needed to run Kafka: wget, Supervisor, and Java.

sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre

Now we are ready to download Kafka. Using the version variables defined earlier, we fetch the release archive and extract it to /opt, which creates $KAFKA_HOME.

wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz

tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
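The reason KAFKA_HOME lines up with the extracted files is that a Kafka release archive unpacks into a single top-level directory named kafka_<scala version>-<kafka version>. The sketch below reproduces that with a small stand-in archive (an assumption for the demo; it contains only one empty file) and a temporary directory in place of /opt, then extracts it with the same tar flags the script uses.

```shell
#!/usr/bin/env bash
# Sketch: extract a stand-in archive laid out like a Kafka release.
# WORK/OPT replace /tmp and /opt so the demo runs without root (assumption).
set -o nounset
set -o errexit

SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
NAME=kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"

WORK=$(mktemp -d)
OPT="$WORK/opt"
mkdir -p "$WORK/src/$NAME/bin" "$OPT"
touch "$WORK/src/$NAME/bin/kafka-server-start.sh"

# Package the fake release with a single top-level directory, like the real one.
tar czf "$WORK/$NAME.tgz" -C "$WORK/src" "$NAME"

# Same flags as the startup script: x = extract, f = archive file, z = gunzip.
tar xfz "$WORK/$NAME.tgz" -C "$OPT"
rm "$WORK/$NAME.tgz"

KAFKA_HOME="$OPT/$NAME"
ls "$KAFKA_HOME/bin"
```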

We use Supervisor to run both Zookeeper and Kafka. Supervisor keeps the processes alive: it restarts them if they fail and starts them again after a system restart. Supervisor needs a configuration file for each service it monitors, so we create one for Zookeeper and one for Kafka.

cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF

cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF

sudo supervisorctl reread
sudo supervisorctl update
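One detail worth noticing in these heredocs is that the EOF delimiter is unquoted, so $KAFKA_HOME is expanded when the file is written and Supervisor sees absolute paths (quoting it as 'EOF' would write the literal text $KAFKA_HOME instead). The sketch below writes both files the same way, assuming the versions from the complete script, a temporary CONF_DIR in place of /etc/supervisor/conf.d, and a hypothetical write_program helper.

```shell
#!/usr/bin/env bash
# Sketch: generate the Supervisor program files and show the expanded paths.
# CONF_DIR stands in for /etc/supervisor/conf.d; write_program is hypothetical.
set -o nounset
set -o errexit

KAFKA_HOME=/opt/kafka_2.11-0.8.2.1
CONF_DIR=$(mktemp -d)

write_program() {
  local name=$1 command=$2
  # Unquoted EOF: $name and $command are expanded as the file is written.
  cat <<EOF > "$CONF_DIR/$name.conf"
[program:$name]
command=$command
autostart=true
autorestart=true
EOF
}

write_program zookeeper "$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties"
write_program kafka "$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties"

cat "$CONF_DIR/kafka.conf"
```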

Finally, we create a test topic we can use for development.

$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
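supervisorctl update returns without waiting for Zookeeper to finish starting. If topic creation races Zookeeper's startup on your instance, one option is to wait until port 2181 accepts connections first. This is a minimal sketch using Bash's /dev/tcp redirection; wait_for_port is a hypothetical helper, not part of Kafka or Supervisor.

```shell
#!/usr/bin/env bash
# Sketch: poll until a TCP port accepts connections.
# wait_for_port is a hypothetical helper; /dev/tcp is a Bash feature.
set -o nounset

wait_for_port() {
  local host=$1 port=$2 timeout=$3
  local waited=0
  # Redirecting to /dev/tcp/HOST/PORT succeeds only if the connection opens.
  until (: > "/dev/tcp/$host/$port") 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# In the startup script this would guard the topic creation, e.g.:
#   wait_for_port localhost 2181 60
#   $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 ...
```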

Our Kafka instance is now ready to use! One wrinkle with Compute Engine is that each instance has both an internal and an external IP address. Other VMs can reach an instance through its internal IP, provided they are on the same network and the appropriate firewall rules exist. By default, a Kafka broker advertises its own hostname to clients, and clients then connect to that hostname; on Compute Engine this is the instance's internal hostname. The configuration above therefore assumes that you will access Kafka from inside the network, so any Compute Engine instance on the same network can communicate with this Kafka instance.

If you need to connect from outside the network (e.g. from your laptop), you will need a firewall rule that lets your machine reach the broker's port (9092 by default), and you will need to edit your local /etc/hosts file to map the instance's internal hostname to its external IP address. Something like the following should do the trick; substitute the values for your own Compute Engine instance.

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1	localhost
255.255.255.255	broadcasthost
::1             localhost

104.154.46.158 kafka-0-8-2-1.c.myproject.internal kafka-0-8-2-1
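If you recreate instances often, the hosts entry can be generated rather than typed. The sketch below builds the line in the <instance>.c.<project>.internal form shown above and appends it only if it is not already present. add_kafka_host is a hypothetical helper, and it writes to a temporary file here; on your laptop you would point it at /etc/hosts (with sudo) and use your own IP, instance name, and project.

```shell
#!/usr/bin/env bash
# Sketch: idempotently add a hosts-file entry for the Kafka instance.
# add_kafka_host and the example values are hypothetical.
set -o nounset
set -o errexit

add_kafka_host() {
  local hosts_file=$1 external_ip=$2 instance=$3 project=$4
  # Internal name in the <instance>.c.<project>.internal form used above.
  local line="$external_ip $instance.c.$project.internal $instance"
  # -x: whole-line match, -F: fixed string (dots are not regex wildcards).
  if ! grep -qxF "$line" "$hosts_file"; then
    echo "$line" >> "$hosts_file"
  fi
}

HOSTS_FILE=$(mktemp)
add_kafka_host "$HOSTS_FILE" 104.154.46.158 kafka-0-8-2-1 myproject
add_kafka_host "$HOSTS_FILE" 104.154.46.158 kafka-0-8-2-1 myproject  # no duplicate
cat "$HOSTS_FILE"
```

The external IP can be looked up with `gcloud compute instances describe kafka-0-8-2-1 --format='get(networkInterfaces[0].accessConfigs[0].natIP)'` (add `--zone` if gcloud asks for one). Some projects use zonal internal names of the form <instance>.<zone>.c.<project>.internal, so running `hostname -f` on the instance is the surest way to get the exact name the broker advertises.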

Putting it all together

Below is the complete startup script. Save it as kafka_startup_script.sh and pass it to a new instance as startup-script metadata, as in the following gcloud command, and Compute Engine will install and run Kafka and Zookeeper when the instance boots.

gcloud compute instances create kafka-0-8-2-1 \
  --image debian-7-backports \
  --metadata-from-file startup-script=kafka_startup_script.sh

#!/usr/bin/env bash

STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION

# Exit if this script has already run
if [[ -f $STARTUP_MARK ]]; then
  exit 0
fi

set -o nounset
set -o pipefail
set -o errexit

SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"

# Install prerequisites
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre

# Download Kafka
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz

tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz

# Configure Supervisor
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF

cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF

# Run
sudo supervisorctl reread
sudo supervisorctl update

$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

touch $STARTUP_MARK
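Because the script sets errexit and creates the marker as its very last step, a failure anywhere earlier (a dead mirror, a failed tar) leaves no marker behind, so the whole script runs again on the next boot. The sketch below demonstrates this with two child Bash processes using the same shell options; `false` stands in for a failing step, and run_script and MARK_DIR are hypothetical names for the demo.

```shell
#!/usr/bin/env bash
# Sketch: with errexit, the final touch is skipped if an earlier step fails.
# run_script and MARK_DIR are hypothetical; `false` simulates a failed step.
set -o nounset

MARK_DIR=$(mktemp -d)

run_script() {
  # Child bash with the same options as the startup script; $2 is the "work".
  STARTUP_MARK="$MARK_DIR/$1" bash -c '
    set -o nounset
    set -o pipefail
    set -o errexit
    '"$2"'
    touch "$STARTUP_MARK"
  '
}

run_script ok.mark "true" || true
run_script failed.mark "false" || true

ls "$MARK_DIR"   # only ok.mark
```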
