Installing Multi-node Kafka Cluster
Transcript and Commands
Hello and welcome to Kafka Streams: Real-time Stream Processing at Learning Journal. This video complements my book on Kafka Streams. You can get more details about the Kafka Streams book here.
The book includes several code examples. If you want to follow along and try
those examples yourself, you will need to set up a small Kafka cluster and some
other tools, such as a development IDE and a build tool.
This video is based on the book’s Appendix A – Installing Kafka Cluster. It
provides detailed instructions to set up the exact environment that is used to
create and test the examples in the book.
In this video, we will create a three-node Kafka cluster in a cloud
environment. I will be using the Google Cloud Platform to create three Kafka nodes and
one Zookeeper server. So, you will need four Linux VMs to follow along. We will be
using the CentOS 7 operating system on all four VMs.
Great! Let’s start. We will follow a four-step process.
- Preparing the VMs for Kafka
- Configuring the Zookeeper
- Configuring Kafka Brokers
- Testing Kafka Installation
In the first step, we will create the VMs in the Google Cloud Platform and prepare them
to run Kafka processes. The overall process remains the same on physical machines
as well as on other cloud platforms; all you need is four CentOS 7 machines with
sudo privileges.
In the second step, we will configure and start the Zookeeper server on one
machine. We will also test the Zookeeper server process.
The third step is to configure and start Kafka brokers on three different
machines. We will also configure the VMs to make sure the brokers automatically
start after a reboot.
The fourth and final step is to restart all the machines and perform a sanity
check on all the services.
That’s all. Once this is done, you will have the exact cluster environment that
I will be using throughout the book to execute and test my examples.
Preparing the VMs for Kafka
Let’s start with the first step. Create four VMs in GCP. You can choose whatever
names you want; I am naming them Kafka-0, Kafka-1, Kafka-2, and zookeeper. You
can select the nearest zone. I am creating them in the Mumbai data centre.
Select your CPU and memory configuration. I think a single CPU core with 1.7 GB of
RAM is good enough to start with. You can increase these resources later without
reconfiguring your cluster. We want to use CentOS 7 as our base operating system.
Let’s allocate a 10 GB disk to each machine.
Repeat the same process and create four VMs.
The first thing that we need on these four VMs is JDK 1.8. I will install
OpenJDK for the sake of simplicity.
Let’s do that on all four machines. SSH to your VM and execute the yum command.
Repeat the same on all the other VMs.
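The yum command for this step could look like the following sketch; the package name assumes the standard CentOS 7 OpenJDK 1.8 build:

```shell
# Install OpenJDK 1.8 on CentOS 7 (run on all four VMs)
sudo yum -y install java-1.8.0-openjdk
# Verify that Java is available
java -version
```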
Great! Now we want to download the Apache Kafka binaries. We will need the wget tool
to download anything on the VMs, so let’s install wget first. Execute the yum command.
Simple, isn’t it?
Repeat the same on all four VMs.
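A minimal sketch of the wget installation:

```shell
# Install the wget download tool (run on all four VMs)
sudo yum -y install wget
```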
Now you are ready to download the Apache Kafka binaries. You can get the download
link from the Apache Kafka mirrors website.
Copy one of the mirror URLs and download the Kafka binaries using the wget command.
Done. Let’s uncompress the binaries. You can use the tar command.
Repeat the same on all four VMs.
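The download and extract steps could look like this; the URL and Kafka version below are examples, so substitute the mirror URL you copied from the downloads page:

```shell
# Download the Kafka binaries (example URL and version; use the
# mirror URL you copied from the Apache Kafka downloads page)
wget http://archive.apache.org/dist/kafka/2.0.0/kafka_2.12-2.0.0.tgz
# Uncompress the downloaded archive
tar -xzf kafka_2.12-2.0.0.tgz
```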
Let’s take a quick look at the uncompressed folder. There are two main
directories that we will be referring to throughout this video:
the bin directory and the config directory.
The bin folder holds all the executables, such as the various Kafka and Zookeeper tools.
The config directory holds two main configuration files.
- zookeeper.properties
- server.properties
We define all Zookeeper configurations in the zookeeper.properties file, and all
Kafka broker configurations are defined in the server.properties file.
Great! We will be executing many commands that reside in the bin directory. I
don’t want to include the directory name every time I run a Kafka or Zookeeper
command. So, let’s add the bin directory to our PATH environment variable.
Open the .bash_profile and add the Kafka bin directory to your PATH.
Repeat the same on all four VMs. Great! Your VMs are ready to start the actual
configuration.
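The PATH change in ~/.bash_profile could look like this; the kafka_2.12-2.0.0 directory name is an example, so use the folder your tar command produced:

```shell
# Add the Kafka bin directory to PATH (directory name is an example)
export PATH=$PATH:$HOME/kafka_2.12-2.0.0/bin
```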
Let’s move on to the next step. Configure and start the Zookeeper server.
Configuring the Zookeeper
Apache Kafka needs Zookeeper. In a production environment, you would want to
configure a Zookeeper cluster, known as a Zookeeper ensemble. However, for
development activities, you can set up a single Zookeeper instance. I plan to
keep Zookeeper on a separate node, as all my VMs are tiny machines with a single
CPU core and less than 2 GB of RAM.
So, SSH to your zookeeper machine. We do not need to download Zookeeper
separately; the Kafka download already includes a copy of Zookeeper.
The first thing is to check out the Zookeeper configuration file. Let’s open
the zookeeper.properties file.
The only configuration that I want to change is the data directory, which is specified as a key-value pair. If you want, you can keep the default location. However, I am going to change it to a more appropriate location.
That’s all. We don’t want to change or add any other configuration.
Save the file.
Let me create the Zookeeper data directory.
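A sketch of this step; the directory path is my example choice:

```shell
# Create the Zookeeper data directory (example path; it must match
# the dataDir value set in config/zookeeper.properties)
mkdir -p ~/zookeeper-data
```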
Great! We are ready to start the Zookeeper server. Starting Zookeeper is straightforward. All you need to do is execute zookeeper-server-start.sh and provide zookeeper.properties as an argument.
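Assuming the Kafka bin directory is on the PATH, the start command could look like this (the Kafka folder name is an example):

```shell
# Start the Zookeeper server in the foreground with the bundled script
zookeeper-server-start.sh ~/kafka_2.12-2.0.0/config/zookeeper.properties
```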
Great! My Zookeeper server is running. Press CTRL+C to terminate the process. Now I
am confident that the configurations are good and the server starts with no
issues. For day-to-day convenience, I want to place the Zookeeper start command in
the rc.local file and enable it via systemctl to ensure that Zookeeper
automatically starts whenever I start the VM. Let’s do that.
Open your /etc/rc.d/rc.local file and place the start command at the bottom of
the file. Make sure to specify the full path.
We also want to redirect the standard output and standard error to /dev/null and
run Zookeeper in the background.
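The rc.local entry could look like the following; the user and Kafka directory names are placeholders:

```shell
# Appended to /etc/rc.d/rc.local: start Zookeeper at boot,
# discard output, and run it in the background (paths are examples)
/home/<user>/kafka_2.12-2.0.0/bin/zookeeper-server-start.sh \
  /home/<user>/kafka_2.12-2.0.0/config/zookeeper.properties \
  > /dev/null 2>&1 &
```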
Great! Save the file and give execute permission to your rc.local.
You also need to bring the rc-local service under systemctl.
Finally, start your rc-local service.
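These steps could look like the following sketch; on CentOS 7 the rc-local service is a static unit that runs rc.local at boot once the file is executable:

```shell
# Give execute permission to rc.local
sudo chmod +x /etc/rc.d/rc.local
# Start the rc-local service now; with rc.local executable,
# CentOS 7 runs it again at every boot
sudo systemctl start rc-local
```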
Great! We are done. Do you want to test your Zookeeper server?
Let’s execute a Zookeeper shell command.
This command should report that the node does not exist. However, we know that the
Zookeeper server is responding. Once you start your Kafka brokers, the same command
will give you a list of active Kafka brokers.
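The shell check could look like this, assuming Zookeeper listens on its default port 2181 on the local machine:

```shell
# Ask Zookeeper for the list of registered broker IDs
zookeeper-shell.sh localhost:2181 ls /brokers/ids
```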
Great! We are done with step two. Do not perform the Zookeeper configuration on
any other node; we need Zookeeper on a single machine only. Right?
Great! The next step is to configure the Kafka brokers on the remaining three nodes.
Configuring Kafka Brokers
Unlike Zookeeper, we will be changing and adding quite a few configuration
properties for the Kafka brokers.
Let’s take a quick look at the main configuration properties that we want to
change in the server.properties file on each Kafka node.
I have prepared a table with the details. Let me quickly walk you through the
configurations.
The first property is the broker ID. Every Kafka broker needs a unique ID. We will
set this value to zero, one, and two for the three brokers.
The next item is the broker rack name. This property specifies the rack of the
broker, and it is used in rack-aware replica assignment for fault tolerance. We
want the first two brokers to be part of RACK1 and the third one to be part of
RACK2.
The next item is the log file directory location. This is the base directory
where the Kafka broker stores the partition replicas. You can keep the default
value or change it to some other appropriate directory. We want to change it to a
different location, and we also need to make sure that the directory already
exists. So, I will create this directory on all the broker machines.
The next one is the number of partitions for the offsets topic. Kafka internally
creates a topic to store offsets. This configuration controls the number of
partitions for that topic. The default value of 50 is quite high and doesn’t make
sense for a dev environment, so we want to change it to a lower value.
The next property is the replication factor for the offset topic. The default
value is three, and we want to bring it down to two.
The next one is the minimum number of replicas in the ISR list. I have talked
about all these configurations in my book. The default value is one, and we want to
change it to two.
The next one is the default replication factor for automatically created
topics. We want to set this value to two.
Finally, the most essential configuration: the Zookeeper connection details.
This property specifies the Zookeeper hostname or IP and the port number. We have
started Zookeeper on one of the VMs, and hence the value of this property should
be the host_ip:port of that machine.
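Putting the table together, the relevant server.properties entries for the first broker could look like the following; the lower offsets-partition count, the log directory path, and the Zookeeper address are my example values:

```properties
broker.id=0
broker.rack=RACK1
log.dirs=/home/<user>/kafka-logs
offsets.topic.num.partitions=3
offsets.topic.replication.factor=2
min.insync.replicas=2
default.replication.factor=2
zookeeper.connect=<zookeeper-ip>:2181
```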
Good. Let’s create the data directory.
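A sketch of this step; the path is my example choice:

```shell
# Create the Kafka log directory on each broker machine
# (example path; it must match log.dirs in server.properties)
mkdir -p ~/kafka-logs
```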
Now I can go ahead and modify the server.properties file for the first broker.
The broker ID is already zero. Let me add the broker rack here.
The next item is the log directory. Let me change it to the directory that I
just created.
Let me change and add all the topic defaults here.
Good. The last one is the Zookeeper connection details.
That’s all. We are done with the configurations.
I am ready to start the broker. Starting the broker is as simple as executing
kafka-server-start.sh and giving server.properties as an argument.
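Assuming the Kafka bin directory is on the PATH, the command could look like this (the folder name is an example):

```shell
# Start the Kafka broker in the foreground
kafka-server-start.sh ~/kafka_2.12-2.0.0/config/server.properties
```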
I don’t see any error messages, and my broker is running. Let me shut it down.
Press CTRL+C.
Now I want to place the broker start command in the rc.local file and configure
it to autostart, as we did for Zookeeper. Right?
Let’s do that. Open your rc.local file. Add the Kafka server start command at
the bottom of the file. Once again, make sure to specify the full path. Let me
redirect the standard output and standard error to /dev/null and run it as a
background process.
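The broker’s rc.local entry could look like the following; the user and Kafka directory names are placeholders:

```shell
# Appended to /etc/rc.d/rc.local: start the Kafka broker at boot,
# discard output, and run it in the background (paths are examples)
/home/<user>/kafka_2.12-2.0.0/bin/kafka-server-start.sh \
  /home/<user>/kafka_2.12-2.0.0/config/server.properties \
  > /dev/null 2>&1 &
```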
What’s next? You already know that, right? We did it for the Zookeeper.
Give execute permission to your rc.local file.
Add your rc-local service to systemctl.
Start your rc-local service.
Done. Repeat the same steps for all the other brokers. I am doing it for three
brokers; however, if you want, you can set up five or ten brokers by simply
following the same steps.
Once you finish configuring all the brokers, stop all the VMs.
Now, I start the Zookeeper VM first. Once that is up and running, I start all the
other broker VMs. This stop and restart helps me verify that all the server
processes start automatically.
Great! The final step. Test your cluster.
Testing Kafka Installation
SSH to one of the machines.
Execute Zookeeper shell and check the list of active broker IDs.
Easy, isn’t it? We have three broker IDs: zero, one, and two. All three are active.
Let’s create a new topic.
You can use kafka-topics.sh. The first option is the create option, then the
zookeeper details, the replication factor, the number of partitions, and finally
the topic name.
Now you can list the topics. Again, kafka-topics.sh: the first option is the list option, then the zookeeper coordinates.
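These two commands could look like the following; the topic name, the partition and replication values, and the Zookeeper address are example choices:

```shell
# Create a topic with two replicas and three partitions
kafka-topics.sh --create --zookeeper <zookeeper-ip>:2181 \
  --replication-factor 2 --partitions 3 --topic test-topic
# List all topics known to the cluster
kafka-topics.sh --list --zookeeper <zookeeper-ip>:2181
```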
Now, since we have a topic, let’s start a console producer and send some messages. We will use kafka-console-producer.sh, give at least one broker IP and port, and then the topic name to which we want to send the messages.
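The producer command could look like this; the broker address and topic name are examples:

```shell
# Start a console producer; each line typed becomes a message
kafka-console-producer.sh --broker-list <broker-ip>:9092 --topic test-topic
```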
Start typing some messages. Press CTRL+C to exit.
Now the last one. Start a console consumer and check the messages that we sent
from the producer. We will use kafka-console-consumer.sh, at least one broker
coordinate to bootstrap the consumer, the topic name, and the offset from where we
want to start reading the messages. This is the first time I am reading this topic,
so let’s start from the beginning.
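The consumer command could look like this; the broker address and topic name are examples:

```shell
# Read the topic from the earliest offset
kafka-console-consumer.sh --bootstrap-server <broker-ip>:9092 \
  --topic test-topic --from-beginning
```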
I can see all the messages. That’s all.
In this video, we created one Zookeeper server, three Kafka brokers, we
configured all of them to autostart, and we also tested all our services.
Thank you very much. Please visit www.learningjournal.guru for the latest
technology books and self-paced video training.
Keep learning and Keep growing.