Getting Started with Apache Kafka on Ubuntu using KRaft Mode

If you have ever needed to move large volumes of events between services without losing any of them, you have probably heard of Apache Kafka. It powers activity tracking, log aggregation, metrics pipelines, and event-driven systems at companies of every size. But many tutorials still start with ZooKeeper, which was deprecated in Kafka 3.5 and removed entirely in Kafka 4.0. In this guide we will do it the modern way using KRaft mode, where Kafka manages its own metadata and you no longer need a separate ZooKeeper process.

By the end of this tutorial you will have a working single-node Kafka broker on Ubuntu, you will create your first topic, send and read messages from the command line, and finally write a small producer and consumer in Node.js.

If you have read my earlier post on Message Queue vs Pub/Sub, this will feel familiar. Kafka sits firmly on the publish/subscribe side. Unlike a traditional message queue that pushes a message to one consumer and deletes it after acknowledgement, Kafka stores messages in a durable, append-only log that many independent consumers can re-read at their own pace.

Who This Tutorial Is For

This guide is aimed at developers, sysadmins, and DevOps engineers who are comfortable on the Linux command line but new to Kafka. You do not need any prior streaming experience. If you can SSH into an Ubuntu box, edit a config file, and run a command, you are ready.

Conceptual Overview

Before touching commands, let us get the vocabulary straight. Kafka has a few core ideas, and once they click, everything else makes sense.

A broker is a single Kafka server. A group of brokers forms a cluster. In this tutorial we run one broker, which is perfect for learning and local development.

A topic is a named stream of messages, similar to a table name or a channel. Producers write to a topic, consumers read from it.

Each topic is split into one or more partitions. A partition is an ordered, append-only log. Partitions are how Kafka scales, because different partitions can live on different brokers and be read in parallel. Every message inside a partition gets a sequential number called an offset.
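To make the idea of an append-only log with offsets concrete, here is a toy model in Node.js. It is only a sketch: the PartitionLog class and its method names are made up for illustration, and a real partition lives on disk inside the broker, not in an array.

```javascript
// Toy model of one partition: an append-only array where the index
// of each record plays the role of its offset.
class PartitionLog {
  constructor() {
    this.records = [];
  }

  // Appends a record and returns the offset it was assigned.
  append(value) {
    this.records.push(value);
    return this.records.length - 1;
  }

  // Returns every record at or after the given offset, without removing anything.
  readFrom(offset) {
    return this.records
      .slice(offset)
      .map((value, i) => ({ offset: offset + i, value }));
  }
}

const log = new PartitionLog();
log.append("order #1001 placed"); // offset 0
log.append("order #1002 placed"); // offset 1
log.append("order #1003 placed"); // offset 2

// Two readers at different positions share the same log; reading deletes nothing.
console.log(log.readFrom(0).length); // 3
console.log(log.readFrom(2).map((r) => r.offset)); // [ 2 ]
```

The point to notice is that readFrom never mutates the log, which is why several readers can each keep their own position.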

A producer is any program that writes messages to a topic. A consumer reads them. Consumers can join a consumer group, and Kafka splits the partitions of a topic among the members of that group so the work is shared.
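The way a consumer group shares partitions can be sketched in a few lines. This is a deliberately simplified round-robin split, not the strategy Kafka actually runs (its built-in assignors are more sophisticated), and assignPartitions is a hypothetical helper name.

```javascript
// Simplified round-robin assignment: partition p goes to member p % members.length.
// Kafka's real assignors do more than this; the sketch only shows the idea.
function assignPartitions(partitionCount, members) {
  const assignment = new Map(members.map((member) => [member, []]));
  for (let p = 0; p < partitionCount; p++) {
    assignment.get(members[p % members.length]).push(p);
  }
  return assignment;
}

// One consumer owns everything.
console.log(assignPartitions(3, ["consumer-a"]));

// Two consumers share the three partitions.
console.log(assignPartitions(3, ["consumer-a", "consumer-b"]));

// With more consumers than partitions, the extra member gets nothing to read.
console.log(assignPartitions(3, ["a", "b", "c", "d"]).get("d")); // []
```

The last case is worth remembering: a partition is read by at most one member of a group, so adding consumers beyond the partition count does not add throughput.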

Finally, KRaft (Kafka Raft) is the consensus protocol Kafka now uses to manage cluster metadata internally. Before KRaft, that job belonged to ZooKeeper. KRaft means fewer moving parts, faster startup, and one less service to babysit.

Prerequisites

Here is what you need before starting.

Software requirements:

Ubuntu 20.04, 22.04, or 24.04 (the steps work on all three)
Java 17 or newer, since Kafka runs on the JVM
curl and tar for downloading and extracting

System requirements:

At least 2 GB of RAM (4 GB is more comfortable)
A regular user account with sudo privileges
An internet connection to download Kafka

Skills needed:

Basic Linux command-line usage
Editing text files with nano or vim

Step 1: Install Java

Kafka is a Java application, so the first thing we do is install a Java runtime. We will use the OpenJDK 17 headless package, which is lightweight and has no GUI dependencies.

sudo apt update
sudo apt install -y openjdk-17-jdk-headless

Confirm the installation:

java -version

You should see output similar to this:

openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment (build 17.0.11+9-Ubuntu-1)
OpenJDK 64-Bit Server VM (build 17.0.11+9-Ubuntu-1, mixed mode, sharing)

If the version is 17 or higher, you are good to go.

Step 2: Create a Dedicated Kafka User

Running services as your own login account or as root is a bad habit. We will create a system user named kafka that owns the installation. This keeps permissions tidy and limits the damage if something goes wrong.

sudo useradd -r -m -d /opt/kafka -s /bin/bash kafka

The -r flag makes it a system account, -d sets its home directory to /opt/kafka, -m creates that directory, and -s gives it a shell so we can switch into it when needed.

Step 3: Download and Extract Kafka

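Now download Kafka. At the time of writing, 3.7.0 is a stable release that supports KRaft out of the box. Always check the official downloads page for the latest stable version. Keep in mind that downloads.apache.org only hosts current releases, so if the URL below fails, look for the same file under archive.apache.org/dist/kafka/. The -f flag makes curl stop with an error instead of quietly saving an error page.

cd /tmp
curl -fLO https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz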

Extract the archive straight into the kafka user’s home directory and fix ownership:

sudo tar -xzf kafka_2.13-3.7.0.tgz -C /opt/kafka --strip-components=1
sudo chown -R kafka:kafka /opt/kafka

The --strip-components=1 part removes the top-level folder from the archive so the files land directly in /opt/kafka instead of /opt/kafka/kafka_2.13-3.7.0. Your binaries now live in /opt/kafka/bin and your configs in /opt/kafka/config.

Step 4: Configure Kafka for KRaft Mode

KRaft uses a different config file than the old ZooKeeper setup. It lives at /opt/kafka/config/kraft/server.properties. Open it as the kafka user:

sudo -u kafka nano /opt/kafka/config/kraft/server.properties

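Most of the defaults are fine for a single-node setup, but it helps to understand the important lines. Look for these settings and make sure they match what is below. Pay particular attention to log.dirs: the shipped file points it at a directory under /tmp, which may be cleared on reboot.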

# This node is both the controller (metadata) and the broker (data)
process.roles=broker,controller

# A unique numeric id for this node
node.id=1

# Tells the node where the controller quorum lives
controller.quorum.voters=1@localhost:9093

# Where the broker listens for clients and for controller traffic
listeners=PLAINTEXT://:9092,CONTROLLER://:9093

# The address clients use to reach this broker
advertised.listeners=PLAINTEXT://localhost:9092

# Where Kafka stores its log segments
log.dirs=/opt/kafka/kraft-logs

A couple of these deserve a short explanation. The process.roles line is what makes this a combined node, acting as both the controller that holds cluster metadata and the broker that stores your messages. That combination is exactly what you want for a single machine. The advertised.listeners value is the address Kafka hands out to clients, so if you later connect from another machine, change localhost to the server’s IP or hostname.

Make sure the log directory exists and is owned by kafka:

sudo -u kafka mkdir -p /opt/kafka/kraft-logs
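If you would rather check the Step 4 settings with a script than by eye, here is a rough sketch. Both parseProperties and checkSingleNodeConfig are hypothetical helpers written for this post, and the checks cover only the handful of settings discussed above.

```javascript
// Parses Java-style .properties text into a plain object.
// Simplified: ignores line continuations and escapes, which these settings do not use.
function parseProperties(text) {
  const props = {};
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    props[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return props;
}

// Hypothetical sanity checks for a single-node KRaft setup.
function checkSingleNodeConfig(props) {
  const problems = [];
  const roles = (props["process.roles"] || "").split(",").map((r) => r.trim());
  if (!roles.includes("broker") || !roles.includes("controller")) {
    problems.push("process.roles should list both broker and controller");
  }
  const voterIds = (props["controller.quorum.voters"] || "")
    .split(",")
    .map((voter) => voter.split("@")[0].trim());
  if (!voterIds.includes(props["node.id"])) {
    problems.push("node.id is not listed in controller.quorum.voters");
  }
  if ((props["log.dirs"] || "").startsWith("/tmp")) {
    problems.push("log.dirs is under /tmp and may not survive a reboot");
  }
  return problems;
}

const sample = `
# trimmed copy of the settings from this step
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
log.dirs=/opt/kafka/kraft-logs
`;

console.log(checkSingleNodeConfig(parseProperties(sample))); // []
```

To run it against the real file, read /opt/kafka/config/kraft/server.properties with fs.readFileSync and pass the text to parseProperties.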

Step 5: Generate a Cluster ID and Format Storage

KRaft requires every cluster to have a unique ID, and the storage directory must be formatted with it before the broker starts. This is a one-time step.

Generate the ID:

sudo -u kafka /opt/kafka/bin/kafka-storage.sh random-uuid

You will get a string like Xj4kP9wQTf2bN1aLcD8eRg. Copy it, then format the storage using that value:

sudo -u kafka /opt/kafka/bin/kafka-storage.sh format \
  -t Xj4kP9wQTf2bN1aLcD8eRg \
  -c /opt/kafka/config/kraft/server.properties

A successful format prints:

Formatting /opt/kafka/kraft-logs with metadata.version 3.7-IV4.

If you skip this step, Kafka will refuse to start and complain that the storage is not formatted.
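A mangled copy and paste of the cluster ID is an easy mistake here. The IDs printed by random-uuid are 16 random bytes written as 22 characters of URL-safe base64, so a quick shape check is possible. The function name below is made up, and passing this check only means the string has the right shape, not that Kafka will accept it.

```javascript
// Returns true if the string has the shape of an ID printed by random-uuid:
// 22 URL-safe base64 characters that decode to exactly 16 bytes.
function looksLikeClusterId(id) {
  if (!/^[A-Za-z0-9_-]{22}$/.test(id)) return false;
  return Buffer.from(id, "base64url").length === 16;
}

console.log(looksLikeClusterId("Xj4kP9wQTf2bN1aLcD8eRg")); // true
console.log(looksLikeClusterId("Xj4kP9wQTf2bN1aLcD8e"));   // false, two characters were lost
```

The "base64url" encoding name needs a reasonably recent Node.js release.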

Step 6: Run Kafka as a systemd Service

You could start Kafka by hand, but it would stop the moment you close your terminal. A systemd service keeps it running in the background and restarts it after a reboot or crash.

Create the unit file:

sudo nano /etc/systemd/system/kafka.service

Paste in the following:

[Unit]
Description=Apache Kafka (KRaft mode)
After=network.target

[Service]
Type=simple
User=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

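The Restart=on-failure line means systemd will bring Kafka back if it dies unexpectedly. The JAVA_HOME path assumes a 64-bit x86 machine; on ARM the directory name ends in arm64 instead, so adjust it to whatever exists under /usr/lib/jvm. Reload systemd, then enable and start the service: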

sudo systemctl daemon-reload
sudo systemctl enable --now kafka

Check that it is healthy:

sudo systemctl status kafka

Look for active (running) in green. You can also tail the logs to watch the startup:

sudo journalctl -u kafka -f

When you see a line containing Kafka Server started, your broker is live and listening on port 9092.
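If you want a script to wait for the broker instead of watching the log, one option is to poll the port until it accepts a TCP connection. The sketch below uses only Node's built-in net module. waitForPort is a hypothetical helper, and an open port only shows that something is listening, which is a weaker signal than the Kafka Server started log line.

```javascript
const net = require("node:net");

// Hypothetical helper: resolves true once a TCP connection to host:port succeeds,
// or false if connections are still being refused when the deadline passes.
function waitForPort(port, host = "localhost", timeoutMs = 30000, retryMs = 500) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const attempt = () => {
      const socket = net.connect({ port, host });
      socket.once("connect", () => {
        socket.destroy();
        resolve(true);
      });
      socket.once("error", () => {
        socket.destroy();
        if (Date.now() >= deadline) {
          resolve(false);
        } else {
          setTimeout(attempt, retryMs);
        }
      });
    };
    attempt();
  });
}

// Example usage, left commented out so the file has no side effects:
// waitForPort(9092).then((up) => console.log(up ? "port 9092 is open" : "gave up waiting"));
```

This handles the refused-connection case you get while Kafka is starting on the same machine. A connection attempt that hangs, for example behind a firewall that silently drops packets, would need a socket timeout on top.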

Step 7: Create Your First Topic

Kafka ships with handy shell scripts in /opt/kafka/bin. Let us create a topic called orders with three partitions.

/opt/kafka/bin/kafka-topics.sh --create \
  --topic orders \
  --partitions 3 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092

We use --replication-factor 1 because we only have one broker. In production you would set this to 3 so each partition has copies on different brokers. List your topics to confirm:

/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

You should see orders in the output. To inspect the details, including how partitions are laid out, run:

/opt/kafka/bin/kafka-topics.sh --describe \
  --topic orders \
  --bootstrap-server localhost:9092

Step 8: Produce and Consume from the CLI

Before writing any code, it is worth proving the pipe works using the built-in console tools. Open a terminal and start a producer:

/opt/kafka/bin/kafka-console-producer.sh \
  --topic orders \
  --bootstrap-server localhost:9092

The prompt waits for input. Type a few messages, pressing Enter after each:

order #1001 placed
order #1002 placed
order #1003 placed

Now open a second terminal and start a consumer that reads everything from the beginning:

/opt/kafka/bin/kafka-console-consumer.sh \
  --topic orders \
  --from-beginning \
  --bootstrap-server localhost:9092

The three messages you typed appear right away, though not necessarily in the order you typed them: Kafka guarantees ordering only within a single partition, and orders has three. The --from-beginning flag tells Kafka to replay the whole log rather than only showing new messages. This is the superpower that sets Kafka apart from a traditional queue, because the data is still there to be read again. Press Ctrl+C in both terminals when you are done.

Step 9: Produce and Consume with Node.js

The CLI is great for testing, but real applications talk to Kafka through a client library. We will use kafkajs, a widely used pure JavaScript client. Set up a small project:

mkdir ~/kafka-demo && cd ~/kafka-demo
npm init -y
npm install kafkajs

Create a producer file named producer.js:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "orders-app",
  brokers: ["localhost:9092"],
});

const producer = kafka.producer();

async function run() {
  await producer.connect();

  for (let i = 1; i <= 5; i++) {
    await producer.send({
      topic: "orders",
      messages: [
        { key: `order-${i}`, value: `Order ${i} created at ${new Date().toISOString()}` },
      ],
    });
    console.log(`Sent order ${i}`);
  }

  await producer.disconnect();
}

run().catch(console.error);

The key matters more than it looks. The producer hashes the key to decide which partition a message lands in, so all messages with the same key go to the same partition for as long as the topic's partition count stays the same. That guarantees ordering for a given key, for example all events for one customer.
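Here is a small sketch of key-based partitioning. One loud caveat: the hash below is 32-bit FNV-1a, picked only because it is short. Kafka's Java client uses a murmur2 hash, so the partition numbers this prints will not match what a real producer chooses. Both function names are made up for the example.

```javascript
// Stand-in hash (32-bit FNV-1a). NOT the hash Kafka uses; real clients use murmur2,
// so treat the printed partition numbers as illustrative only.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (const byte of Buffer.from(str, "utf8")) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Maps a key to a partition number in the range 0..partitionCount-1.
function partitionFor(key, partitionCount) {
  return fnv1a(key) % partitionCount;
}

// The first and third lines print the same partition because the key is the same.
for (const key of ["order-1", "order-2", "order-1"]) {
  console.log(key, "->", partitionFor(key, 3));
}

// The result depends on the partition count, so the mapping can change if it grows.
console.log("order-1 with 4 partitions ->", partitionFor("order-1", 4));
```

Because the partition is a pure function of the key and the partition count, no coordination is needed to keep one key's messages together.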

Now create consumer.js:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "orders-app",
  brokers: ["localhost:9092"],
});

const consumer = kafka.consumer({ groupId: "orders-group" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "orders", fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log({
        partition,
        offset: message.offset,
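        // Messages typed into the console producer in Step 8 have no key, so key can be null
        key: message.key?.toString() ?? null,
        value: message.value?.toString() ?? null,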
      });
    },
  });
}

run().catch(console.error);

Run the consumer first so it is ready and listening:

node consumer.js

In another terminal, run the producer:

node producer.js

Back in the consumer terminal, you will see each message printed with its partition and offset. Notice the groupId in the consumer. If you start a second copy of consumer.js with the same group, Kafka automatically splits the three partitions between the two instances. That is how you scale consumers horizontally without writing any extra coordination code.

Common Mistakes and Troubleshooting

Kafka will not start and the log mentions storage not formatted. You skipped Step 5, or you changed log.dirs after formatting. Run the kafka-storage.sh format command again against the correct directory.

Connection refused on port 9092. The broker is not running, or it is still starting. Check sudo systemctl status kafka and tail the logs with journalctl -u kafka -f. Startup can take 10 to 20 seconds on a small machine.

A client on another machine cannot connect even though the broker is up. This is almost always the advertised.listeners setting. It must point to an address the client can actually reach, not localhost. Set it to the server’s real IP or hostname, then restart Kafka.

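The consumer prints nothing. Confirm the topic name matches exactly, and remember that without fromBeginning: true a brand new consumer group only receives messages produced after it connected. Also note that fromBeginning only applies to a group with no committed offsets. If you rerun consumer.js with the same groupId, it resumes where the group left off, so messages it has already read are not printed again.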
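The rule for where a consumer starts reading can be written as a tiny function. This is a simplified model of a single partition, and startingOffset with its parameter names is invented for illustration, not part of kafkajs.

```javascript
// Simplified model of where a consumer starts reading one partition.
// committedOffset is null when the group has never committed for this partition.
function startingOffset({ committedOffset, fromBeginning, earliest, latest }) {
  if (committedOffset !== null) {
    return committedOffset; // an existing group resumes where it left off
  }
  return fromBeginning ? earliest : latest;
}

// A partition holding offsets 0 through 7, so the next message will get offset 8.
const partitionState = { earliest: 0, latest: 8 };

// Brand new group with fromBeginning: true replays everything.
console.log(startingOffset({ ...partitionState, committedOffset: null, fromBeginning: true })); // 0

// Brand new group without it only sees messages produced from now on.
console.log(startingOffset({ ...partitionState, committedOffset: null, fromBeginning: false })); // 8

// A group that already read everything resumes at 8, whatever fromBeginning says.
console.log(startingOffset({ ...partitionState, committedOffset: 8, fromBeginning: true })); // 8
```

If you want to replay the topic from scratch while experimenting, the simplest trick is to use a new groupId.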

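Java version errors at startup. This guide assumes Java 17. Kafka 3.7 can still run on Java 8 or 11, but both are deprecated, and Kafka 4.0 and later require Java 17 for the broker. Run java -version and install the JDK from Step 1 if the number is lower.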

Best Practices

For anything beyond a learning box, keep these points in mind.

Use a replication factor of at least 3 in production so the loss of one broker does not lose data. That requires at least three brokers in the cluster.

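Do not over-partition. More partitions give more parallelism but also more overhead. Start with a number that matches your expected consumer count and grow later, since you can add partitions but never remove them. Be aware that adding partitions changes which partition a given key maps to, so per-key ordering only holds for messages written after the change.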

Secure the broker. The PLAINTEXT listener we used has no authentication or encryption, which is fine on a private subnet but never on the open internet. For real deployments, enable TLS and SASL authentication, and put the broker behind a firewall so only trusted services reach port 9092.

Set retention deliberately. By default Kafka keeps messages for seven days. Tune the broker-wide defaults log.retention.hours and log.retention.bytes, or override them per topic with retention.ms and retention.bytes, based on how long consumers might need to replay data, balanced against disk space.

Monitor consumer lag. Lag is the gap between the latest offset and where a consumer group has read. Growing lag means consumers cannot keep up. Track it with kafka-consumer-groups.sh --describe or a monitoring tool like Prometheus.
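The lag arithmetic itself is simple, as this sketch shows. The offsets are made-up numbers for the three partitions of the orders topic, and computeLag is a hypothetical helper. In practice you would read the real values from kafka-consumer-groups.sh or your monitoring stack.

```javascript
// Lag per partition = log end offset - committed offset.
function computeLag(endOffsets, committedOffsets) {
  const perPartition = {};
  let total = 0;
  for (const [partition, end] of Object.entries(endOffsets)) {
    // Assumption: a partition with no committed offset yet counts as unread from offset 0.
    const committed = committedOffsets[partition] ?? 0;
    const lag = Math.max(0, end - committed);
    perPartition[partition] = lag;
    total += lag;
  }
  return { perPartition, total };
}

// Made-up numbers: partition 0 is caught up, partition 2 is falling behind.
const endOffsets = { 0: 120, 1: 95, 2: 110 };
const committedOffsets = { 0: 120, 1: 90, 2: 60 };

const lag = computeLag(endOffsets, committedOffsets);
console.log(lag.perPartition[2]); // 50
console.log(lag.total); // 55
```

A single total hides a lot, so it is usually worth watching lag per partition: here one slow partition accounts for almost all of it.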

Conclusion

You now have a working Apache Kafka broker running in KRaft mode on Ubuntu, with no ZooKeeper in sight. Along the way you installed Java, set up a dedicated user, formatted KRaft storage, ran Kafka as a managed systemd service, created a partitioned topic, and moved messages both from the command line and from a real Node.js producer and consumer.

From here, the natural next steps are to spin up a three-broker cluster to see replication in action, explore Kafka Connect for piping data in and out of databases, and look at Kafka Streams or a framework like Faust for processing events as they arrive. Once you understand topics, partitions, offsets, and consumer groups, the rest of the Kafka ecosystem becomes a lot easier to navigate. Happy streaming.