This article provides a startup script for deploying Kafka to a Google Compute Engine instance. This isn’t meant to be a production-ready system: it uses the Zookeeper instance embedded with Kafka and keeps most of the default settings. Instead, treat this as a quick and easy way to do Kafka development against a live server.
This article uses Compute Engine startup scripts to install and run Kafka on instance startup. Startup scripts allow you to run arbitrary Bash commands whenever an instance is created or restarted. Since this script is run on every restart, we lead with a check that makes sure we have not already run the startup script and, if we have, we simply exit. The marker file that this check looks for is created at the very end of the script.
#!/usr/bin/env bash
STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION
if [[ -f $STARTUP_MARK ]]; then
exit 0
fi
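The guard can be tried out without touching /var. The following is a minimal sketch, assuming a hypothetical `run_once` helper and a temporary directory standing in for /var; neither is part of the original script.

```shell
#!/usr/bin/env bash
# Sketch of the marker-file guard. run_once and the temp directory are
# illustrative assumptions, not part of the original startup script.

# Run the given command only if the marker file does not exist yet,
# and create the marker once the command has succeeded.
run_once() {
  local mark="$1"
  shift
  if [[ -f "$mark" ]]; then
    return 0
  fi
  "$@" && touch "$mark"
}

MARK_DIR="$(mktemp -d)"
STARTUP_VERSION=1
STARTUP_MARK="$MARK_DIR/startup.script.$STARTUP_VERSION"

RUNS=0
provision() { RUNS=$((RUNS + 1)); }

run_once "$STARTUP_MARK" provision   # first boot: provision runs
run_once "$STARTUP_MARK" provision   # restart: marker exists, skipped
echo "provision ran $RUNS time(s)"
```

Because the version number is part of the marker path, bumping STARTUP_VERSION makes an edited script run again on the next restart.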
Then we configure our Kafka and Scala version numbers used in the rest of the script.
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
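Every later path in the script is built from these two version variables. As a rough sketch, using the values from the complete script at the end of this article, the derived names look like this. `KAFKA_DIST` and `KAFKA_TGZ` are hypothetical names introduced here to avoid repeating the pattern.

```shell
#!/usr/bin/env bash
# Sketch: derive every Kafka path from the two version variables.
# KAFKA_DIST and KAFKA_TGZ are hypothetical names introduced for illustration.
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1

KAFKA_DIST="kafka_${SCALA_VERSION}-${KAFKA_VERSION}"
KAFKA_HOME="/opt/${KAFKA_DIST}"
KAFKA_TGZ="/tmp/${KAFKA_DIST}.tgz"

echo "$KAFKA_HOME"
echo "$KAFKA_TGZ"
```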
Next, we install the prerequisites needed to run Kafka, namely supervisor and Java.
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre
Now we are ready to download and run Kafka. We use the version variables defined earlier to build the download URL, then extract the archive into /opt, which produces the $KAFKA_HOME directory.
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
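If the download is truncated, or the mirror returns something other than the archive, extraction fails in a way that is easy to miss in a startup log. One possible safeguard, sketched here with a hypothetical `extract_verified` helper, is to list the archive before extracting it. The demonstration uses a small local archive rather than a real download.

```shell
#!/usr/bin/env bash
# Sketch: extract an archive only if tar can list it.
# extract_verified is a hypothetical helper, not part of the original script.
extract_verified() {
  local archive="$1" dest="$2"
  # tar tzf reads the whole archive; a truncated or non-gzip file fails here.
  if ! tar tzf "$archive" > /dev/null 2>&1; then
    echo "archive $archive is missing or corrupt" >&2
    return 1
  fi
  mkdir -p "$dest"
  tar xzf "$archive" -C "$dest"
}

# Demonstration with a small local archive instead of a real download.
WORK="$(mktemp -d)"
mkdir -p "$WORK/src/kafka_demo/bin"
echo "placeholder" > "$WORK/src/kafka_demo/bin/kafka-server-start.sh"
tar czf "$WORK/good.tgz" -C "$WORK/src" kafka_demo
echo "not a tarball" > "$WORK/bad.tgz"

extract_verified "$WORK/good.tgz" "$WORK/opt" && GOOD_STATUS=0 || GOOD_STATUS=$?
extract_verified "$WORK/bad.tgz" "$WORK/opt" 2> /dev/null && BAD_STATUS=0 || BAD_STATUS=$?
echo "good=$GOOD_STATUS bad=$BAD_STATUS"
```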
We use supervisor to run both Zookeeper and Kafka. Supervisor keeps the processes alive, restarting them after a failure and starting them again after a system restart. Supervisor requires a configuration file for each service it monitors, so we create one for Zookeeper and one for Kafka.
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF
cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF
sudo supervisorctl reread
sudo supervisorctl update
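One detail in the configuration step above is easy to overlook: the heredoc delimiter EOF is unquoted, so the shell expands $KAFKA_HOME while writing each file and the configuration ends up with absolute paths. The sketch below contrasts this with a quoted delimiter. It writes to a temporary directory (an assumption for the demonstration) rather than /etc/supervisor/conf.d.

```shell
#!/usr/bin/env bash
# Sketch: unquoted vs quoted heredoc delimiters.
# The temp directory and file names are illustrative assumptions.
KAFKA_HOME=/opt/kafka_2.11-0.8.2.1
CONF_DIR="$(mktemp -d)"

# Unquoted EOF: $KAFKA_HOME is expanded while the file is written.
cat <<EOF > "$CONF_DIR/expanded.conf"
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
EOF

# Quoted 'EOF': the text is written literally, dollar signs included.
cat <<'EOF' > "$CONF_DIR/literal.conf"
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
EOF

grep command= "$CONF_DIR/expanded.conf"
grep command= "$CONF_DIR/literal.conf"
```

With the quoted form the file would contain the literal text $KAFKA_HOME, which is not what the startup script wants.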
Finally, we create a test topic we can use for development.
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
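Supervisor starts Zookeeper and Kafka in the background, so the services may not yet be accepting connections when the topic command runs. One way to make this step more forgiving is a small retry loop. This is a sketch under that assumption; the `retry` helper is hypothetical, and the demonstration uses a stand-in command instead of Kafka.

```shell
#!/usr/bin/env bash
# Sketch: retry a command a fixed number of times with a pause in between.
# retry is a hypothetical helper, not part of the original script.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Demonstration with a stand-in command that succeeds on its third call.
CALLS=0
flaky() {
  CALLS=$((CALLS + 1))
  ((CALLS >= 3))
}

retry 5 0 flaky 2> /dev/null && RESULT=ok || RESULT=failed
echo "$RESULT after $CALLS calls"
```

In the startup script, the topic creation could then be wrapped as `retry 10 3 "$KAFKA_HOME"/bin/kafka-topics.sh --create ...`, where the attempt count and delay are arbitrary values to tune.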
Our Kafka instance is now ready to use! One trick with Compute Engine is that each instance has both an internal and an external IP. Any VM running on Compute Engine can reach the internal IP, as long as it is on the same network and the appropriate firewall rules have been created. Because of the way Kafka advertises the hostname that clients use to connect to a broker, the configuration above assumes that you will be accessing Kafka through the internal IP. Therefore, any Compute Engine instance within the same network will be able to communicate with this Kafka instance. If you need to connect to the instance externally (e.g. from your laptop), you will need to modify your /etc/hosts file to map the instance's internal hostname to the external IP address of your Compute Engine instance. Something like the following should do the trick; you will have to modify it for the particular values of your Compute Engine instance.
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
104.154.46.158 kafka-0-8-2-1.c.myproject.internal kafka-0-8-2-1
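If you would rather script this change than edit the file by hand, the following sketch appends the entry only when it is not already present. The `add_hosts_entry` helper is hypothetical, and the demonstration writes to a temporary file; on a real machine the target would be /etc/hosts, which requires root to modify.

```shell
#!/usr/bin/env bash
# Sketch: add a hosts entry only if it is not already present.
# add_hosts_entry and the temp file are illustrative assumptions.
add_hosts_entry() {
  local hosts_file="$1" ip="$2" fqdn="$3" short="$4"
  local line="$ip $fqdn $short"
  # -x matches whole lines, -F treats the entry as a fixed string.
  if ! grep -qxF "$line" "$hosts_file"; then
    echo "$line" >> "$hosts_file"
  fi
}

HOSTS_FILE="$(mktemp)"
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"

# Values copied from the example above; substitute your own.
add_hosts_entry "$HOSTS_FILE" 104.154.46.158 kafka-0-8-2-1.c.myproject.internal kafka-0-8-2-1
add_hosts_entry "$HOSTS_FILE" 104.154.46.158 kafka-0-8-2-1.c.myproject.internal kafka-0-8-2-1
cat "$HOSTS_FILE"
```

Calling the helper twice leaves a single entry, so it is safe to rerun.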
Putting it all together
Below are the command that creates the instance and the complete startup script, saved as kafka_startup_script.sh. When the script is passed to the instance at creation, Compute Engine runs it on startup to install and run Kafka with Zookeeper.
gcloud compute instances create kafka-0-8-2-1 \
--image debian-7-backports \
--metadata-from-file startup-script=kafka_startup_script.sh
#!/usr/bin/env bash
STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION
# Exit if this script has already run
if [[ -f $STARTUP_MARK ]]; then
exit 0
fi
set -o nounset
set -o pipefail
set -o errexit
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
# Install prerequisites
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre
# Download Kafka
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
# Configure Supervisor
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF
cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF
# Run
sudo supervisorctl reread
sudo supervisorctl update
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
touch $STARTUP_MARK