KRaft in Kafka 4.0: Goodbye ZooKeeper, Quorum Controller and Migration
Kafka 4.0 (March 2025) permanently removed ZooKeeper after three years of coexistence with KRaft. This in-depth guide explains how the new Raft-based quorum controller works, the mandatory migration path for those coming from Kafka 3.x, the real operational benefits, and the pitfalls to avoid when moving to production.
The Problem with ZooKeeper
For nearly ten years, every Apache Kafka cluster required a separate Apache ZooKeeper ensemble to manage metadata: which brokers were active, which broker was leader of which partition, topic and ACL metadata. ZooKeeper is a robust and reliable distributed coordination system, but it introduced a number of significant operational problems:
- Double operational complexity: Each team managing Kafka also had to manage a separate ZooKeeper cluster (typically 3 or 5 nodes), with its own monitoring, upgrade cycle, and distinct configuration.
- Limited metadata scalability: ZooKeeper showed performance degradation beyond ~200,000 partitions per cluster, because each partition's metadata was written as separate ZooKeeper nodes.
- Slow controller election: when the Kafka controller broker failed, the new controller had to read the entire cluster state from ZooKeeper before it could operate, a process that could take tens of seconds for large clusters.
- Difficult disaster recovery: recovering a Kafka cluster after data loss on ZooKeeper was a complex and risky manual process.
KRaft Timeline
- KIP-500 (2020): Original proposal to remove ZooKeeper from Kafka
- Kafka 2.8 (April 2021): first version with KRaft in early access (for testing only)
- Kafka 3.3 (October 2022): KRaft declared production-ready for new clusters
- Kafka 3.5 (June 2023): ZooKeeper to KRaft migration tool available
- Kafka 3.7 (March 2024): ZooKeeper mode deprecated
- Kafka 4.0 (March 2025): ZooKeeper mode permanently removed
How KRaft Works: The Raft Consensus Log
The Concept of Metadata Log
The solution adopted in KRaft (Kafka Raft) is elegant: instead of depending on an external system for metadata, Kafka manages its metadata as an internal Kafka topic called __cluster_metadata. This topic is replicated via the Raft protocol among the controller nodes.
In KRaft, cluster brokers take on one of two roles (or both, in small clusters):
- Controllers: manage cluster metadata. In a production cluster, a quorum of 3 controllers is recommended. The active controller (Raft leader) processes all metadata changes and replicates them to the other controllers.
- Brokers: manage partition logs and serve producers and consumers. Brokers keep a cached copy of the metadata received from the controller, updated continuously.
The Raft Protocol in Kafka
Raft is a distributed consensus algorithm designed to be understandable (unlike Paxos). In short: among all quorum nodes, one is elected leader. The leader receives all writes, propagates them to the followers, and considers a write committed once a majority of the nodes have acknowledged it.
In KRaft, this translates like this:
- A metadata operation (create topic, assign partition leader, etc.) arrives at the active controller
- The active controller writes the operation to the metadata log as a serialized event
- The event is replicated to the follower controllers via the FETCH protocol (leveraging existing Kafka replication code)
- When a majority of controllers (the quorum) have acknowledged it, the operation is committed
- Brokers fetch committed metadata updates from the active controller
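The commit flow above can be sketched as a toy model in Python. This is purely illustrative (the class and function names are invented for this sketch, and real KRaft adds terms, epochs, and log reconciliation); it only shows the majority-commit rule:

```python
# Toy sketch (NOT Kafka's implementation) of Raft's commit rule:
# a write is committed once a majority of the quorum has acknowledged it.

def is_committed(acks: int, quorum_size: int) -> bool:
    """A write is committed when a strict majority has acknowledged it."""
    return acks >= quorum_size // 2 + 1

class ToyRaftLeader:
    def __init__(self, voter_ids):
        self.voter_ids = voter_ids       # all quorum members, leader included
        self.log = []                    # list of (entry, set_of_ackers)
        self.high_watermark = 0          # offset below which entries are committed

    def append(self, entry, leader_id):
        # the leader's own write counts as one acknowledgement
        self.log.append((entry, {leader_id}))

    def ack(self, offset, follower_id):
        entry, ackers = self.log[offset]
        ackers.add(follower_id)
        # advance the high watermark over every majority-acked prefix
        while self.high_watermark < len(self.log):
            _, a = self.log[self.high_watermark]
            if not is_committed(len(a), len(self.voter_ids)):
                break
            self.high_watermark += 1

# 3-node quorum: the leader (node 1) plus one follower ack is a majority
leader = ToyRaftLeader([1, 2, 3])
leader.append("CreateTopic(ordini-effettuati)", leader_id=1)
leader.ack(0, follower_id=2)
assert leader.high_watermark == 1    # committed with 2 of 3 acks
```

This mirrors why a 3-controller quorum tolerates one failure: two surviving nodes still form a majority and can keep committing metadata.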
# Structure of a KRaft data directory (combined broker+controller)
/var/lib/kafka/data/
  meta.properties                # cluster.id, node.id, version
  __cluster_metadata-0/          # the metadata log (partition 0)
    00000000000000000000.log
    00000000000000000000.index
    00000000000000000000.timeindex
    leader-epoch-checkpoint
  ordini-effettuati-0/           # log of a regular partition
  ordini-effettuati-1/
  ...
# meta.properties example:
node.id=1
version=1
cluster.id=MkU3OEVBNTcwNTJENDM2Qk
Quorum Controller: Sizing
The controller quorum follows the usual consensus rules: to tolerate f failures, 2f+1 nodes are required.
- 3 controllers: tolerates 1 failure (minimum configuration for production)
- 5 controllers: tolerates 2 simultaneous failures (recommended for critical clusters)
- 1 controller: For local development/testing only, no fault tolerance
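The 2f+1 rule above is simple arithmetic, sketched here as two illustrative helper functions (the names are invented for this example):

```python
# Quorum sizing rule from the section above: to tolerate f failures you need
# 2f + 1 controllers; conversely, n controllers tolerate floor((n - 1) / 2).

def controllers_needed(f: int) -> int:
    """Controllers required to tolerate f simultaneous failures."""
    return 2 * f + 1

def failures_tolerated(n: int) -> int:
    """Simultaneous failures an n-controller quorum survives."""
    return (n - 1) // 2

assert controllers_needed(1) == 3    # minimum production setup
assert controllers_needed(2) == 5    # recommended for critical clusters
assert failures_tolerated(1) == 0    # dev only: no fault tolerance
assert failures_tolerated(4) == 1    # even sizes waste a node: same as 3
```

The last assertion is why controller quorums are always sized with odd numbers: a fourth controller adds load without adding fault tolerance.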
Controllers can be dedicated (controller role only, hosting no user partitions) or combined (the same machines also act as brokers). For small clusters (< 10 brokers), combined controllers are fine. For large or high-throughput clusters, dedicated controllers isolate the metadata management load from the partition I/O load.
Configuring a KRaft Cluster from Scratch
# server.properties for a combined controller+broker node (single-node dev cluster)
# ─── Identity ─────────────────────────────────────────────────────────────────
# In KRaft each node has a node.id unique within the cluster (replaces broker.id)
node.id=1
# Roles: "broker" | "controller" | "broker,controller"
process.roles=broker,controller
# Controller quorum address: format node.id@host:port
controller.quorum.voters=1@localhost:9093
# ─── Listeners ────────────────────────────────────────────────────────────────
# KAFKA: listener for producers/consumers
# CONTROLLER: listener for internal KRaft communication
listeners=KAFKA://localhost:9092,CONTROLLER://localhost:9093
advertised.listeners=KAFKA://localhost:9092
listener.security.protocol.map=KAFKA:PLAINTEXT,CONTROLLER:PLAINTEXT
inter.broker.listener.name=KAFKA
controller.listener.names=CONTROLLER
# ─── Storage ──────────────────────────────────────────────────────────────────
log.dirs=/var/lib/kafka/data
# ─── Replication defaults ─────────────────────────────────────────────────────
default.replication.factor=1     # 1 for dev, 3 for production
min.insync.replicas=1            # 1 for dev, 2 for production
offsets.topic.replication.factor=1
# ─── Retention ────────────────────────────────────────────────────────────────
log.retention.hours=168          # 7 days
log.segment.bytes=1073741824     # 1GB per segment
# Initialize the KRaft cluster (one-time operation)
# Step 1: generate a unique cluster UUID
KAFKA_CLUSTER_ID=$(kafka-storage.sh random-uuid)
echo "Cluster ID: $KAFKA_CLUSTER_ID"
# Step 2: format the storage directory with the cluster ID
kafka-storage.sh format \
--config /etc/kafka/server.properties \
--cluster-id "$KAFKA_CLUSTER_ID"
# Output:
# Formatting /var/lib/kafka/data with metadata.version 4.0-IV3.
# Step 3: start the broker
kafka-server-start.sh /etc/kafka/server.properties
Important: The Cluster ID is Immutable
The cluster.id generated at format time is written to each node's meta.properties file and into the metadata log. It cannot be changed after initialization. If you lose this file and want to add a node to the existing cluster, you must use the appropriate bootstrap procedure. Store the cluster ID in a secrets management system.
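Since meta.properties is a plain key=value file, a quick consistency check across nodes can be scripted. The sketch below is illustrative (the function names and in-memory file contents are invented); in practice you would read each node's meta.properties from its log.dirs:

```python
# Minimal sketch: parse meta.properties content and verify all nodes share
# one cluster.id. The inputs here are in-memory strings for illustration.

def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def check_cluster_ids(files: dict) -> str:
    """files maps node name -> meta.properties content; returns the shared ID."""
    ids = {node: parse_properties(text)["cluster.id"] for node, text in files.items()}
    if len(set(ids.values())) != 1:
        raise ValueError(f"cluster.id mismatch: {ids}")
    return next(iter(ids.values()))

sample = "node.id=1\nversion=1\ncluster.id=MkU3OEVBNTcwNTJENDM2Qk\n"
assert check_cluster_ids({"kafka1": sample, "kafka2": sample}) == "MkU3OEVBNTcwNTJENDM2Qk"
```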
Docker Compose: KRaft Cluster for Local Development
# docker-compose.yml for a Kafka 4.0 KRaft cluster (3 brokers)
# Image: apache/kafka:4.0.0 (official Apache image, not Confluent)
version: "3.9"
services:
kafka1:
image: apache/kafka:4.0.0
container_name: kafka1
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: "broker,controller"
KAFKA_LISTENERS: "PLAINTEXT://kafka1:9092,CONTROLLER://kafka1:9093"
KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka1:9092"
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT"
KAFKA_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
KAFKA_CONTROLLER_QUORUM_VOTERS: "1@kafka1:9093,2@kafka2:9093,3@kafka3:9093"
KAFKA_INTER_BROKER_LISTENER_NAME: "PLAINTEXT"
KAFKA_DEFAULT_REPLICATION_FACTOR: 3
KAFKA_MIN_INSYNC_REPLICAS: 2
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
volumes:
- kafka1-data:/var/lib/kafka/data
kafka2:
image: apache/kafka:4.0.0
container_name: kafka2
environment:
KAFKA_NODE_ID: 2
KAFKA_PROCESS_ROLES: "broker,controller"
KAFKA_LISTENERS: "PLAINTEXT://kafka2:9092,CONTROLLER://kafka2:9093"
KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka2:9092"
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT"
KAFKA_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
KAFKA_CONTROLLER_QUORUM_VOTERS: "1@kafka1:9093,2@kafka2:9093,3@kafka3:9093"
KAFKA_INTER_BROKER_LISTENER_NAME: "PLAINTEXT"
KAFKA_DEFAULT_REPLICATION_FACTOR: 3
KAFKA_MIN_INSYNC_REPLICAS: 2
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
volumes:
- kafka2-data:/var/lib/kafka/data
kafka3:
image: apache/kafka:4.0.0
container_name: kafka3
environment:
KAFKA_NODE_ID: 3
KAFKA_PROCESS_ROLES: "broker,controller"
KAFKA_LISTENERS: "PLAINTEXT://kafka3:9092,CONTROLLER://kafka3:9093"
KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka3:9092"
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT"
KAFKA_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
KAFKA_CONTROLLER_QUORUM_VOTERS: "1@kafka1:9093,2@kafka2:9093,3@kafka3:9093"
KAFKA_INTER_BROKER_LISTENER_NAME: "PLAINTEXT"
KAFKA_DEFAULT_REPLICATION_FACTOR: 3
KAFKA_MIN_INSYNC_REPLICAS: 2
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
volumes:
- kafka3-data:/var/lib/kafka/data
volumes:
kafka1-data:
kafka2-data:
kafka3-data:
Migration from Kafka 3.x with ZooKeeper to KRaft
If you are managing a Kafka 3.x cluster in ZooKeeper mode and need to migrate to KRaft (required to use Kafka 4.0), the process is called KRaft migration and has been officially supported since version 3.5. The good news: the migration happens without downtime for producers and consumers.
Phases of Migration
The official process is divided into 6 phases:
- Check prerequisites: upgrade to Kafka 3.7 (the last version with ZooKeeper+KRaft dual-write support) and verify that all brokers have an aligned metadata.version.
- KRaft controller deployment: start the KRaft controller nodes (3 new nodes, or existing brokers with an additional role). The controllers obtain the initial metadata from ZooKeeper via the migration tool.
- Dual-write mode: Brokers write metadata to both ZooKeeper and the KRaft metadata log. During this phase the system is fully operational.
- Migration completed: all brokers migrate, ZooKeeper becomes read-only for Kafka. Producers and consumers do not perceive any interruption.
- ZooKeeper finalizer: Run the finalizer that cleans Kafka metadata from ZooKeeper.
- Shutdown ZooKeeper: Decommission the ZooKeeper ensemble. Fully KRaft cluster.
# Step 1: Check the cluster's current metadata.version
# (run with Kafka 3.7)
kafka-features.sh --bootstrap-server kafka1:9092 describe
# Output:
# Feature: metadata.version
# SupportedMinVersion: 3.0-IV1
# SupportedMaxVersion: 3.7-IV4
# FinalizedVersion: 3.7-IV4
# Step 2: Start the KRaft controllers with the special migration config
# In the KRaft controllers' server.properties:
process.roles=controller
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181   # still needed during the migration
controller.quorum.voters=10@kc1:9093,11@kc2:9093,12@kc3:9093
# Step 3: Start the migration (run once the KRaft controllers are up)
# Edit the server.properties of EVERY existing Kafka broker,
# adding the parameters:
zookeeper.metadata.migration.enable=true
controller.quorum.voters=10@kc1:9093,11@kc2:9093,12@kc3:9093
# Restart the brokers one at a time (rolling restart, zero downtime)
# The brokers enter migration mode automatically
# Step 4: Monitor the migration status
kafka-metadata-shell.sh \
--snapshot /var/lib/kafka/data/__cluster_metadata-0/00000000000000000000.snapshot
# Step 5: Finalize (after all brokers have migrated)
kafka-features.sh --bootstrap-server kafka1:9092 upgrade \
--metadata 3.7-IV4   # or the target version
# Step 6: Remove zookeeper.connect from server.properties and restart the brokers
Important Notices for Migration
- Rollback is not easy: once the KRaft migration is finalized and ZooKeeper is removed, rolling back is very complex. Test the migration first in a staging environment identical to production.
- ACLs and configurations: ACLs and dynamic configurations managed via ZooKeeper are migrated automatically into the metadata log, but verify that they are present after the migration.
- Kafka Connect connectors: connectors that use the Kafka cluster as a backend for state (group.id, offsets) continue to work unchanged.
- MirrorMaker 2: If you use MM2 for geo-replication, update remote clusters in the same maintenance window to avoid version incompatibilities.
KRaft with Advanced Configuration: Dedicated Controllers
For clusters with high throughput or a large number of partitions (>50,000), it is advisable to separate controllers from brokers (dedicated controllers). This way, metadata operations (topic creation, leader election, config changes) do not compete with partition log I/O on the same disks.
# server.properties for a DEDICATED CONTROLLER (hosts no user partitions)
node.id=10
process.roles=controller
controller.quorum.voters=10@kc1:9093,11@kc2:9093,12@kc3:9093
listeners=CONTROLLER://kc1:9093
listener.security.protocol.map=CONTROLLER:PLAINTEXT
controller.listener.names=CONTROLLER
log.dirs=/var/lib/kafka/metadata
# server.properties for a PURE BROKER (not a controller)
node.id=1
process.roles=broker
controller.quorum.voters=10@kc1:9093,11@kc2:9093,12@kc3:9093
listeners=KAFKA://kafka1:9092
advertised.listeners=KAFKA://kafka1:9092
listener.security.protocol.map=KAFKA:PLAINTEXT,CONTROLLER:PLAINTEXT
inter.broker.listener.name=KAFKA
controller.listener.names=CONTROLLER
log.dirs=/var/lib/kafka/data
# With this configuration:
# - 3 dedicated controller machines (lightweight: little RAM, little CPU)
# - N pure brokers (optimized for disk I/O)
# - No resource contention between metadata ops and partition I/O
In Confluent Cloud and in managed environments such as Amazon MSK (which has adopted KRaft since version 3.6), the controller/broker separation occurs automatically and is transparent to the user.
Operational Benefits of KRaft
Faster Startup and Recovery
With ZooKeeper, when the Kafka controller broker restarted, it had to read the entire cluster state from ZooKeeper before it could operate. For clusters with 100,000+ partitions, this could mean 30-90 seconds of controller unavailability.
With KRaft, the leader controller keeps the metadata log already in memory and on local disk. A controller failover typically takes less than 5 seconds, even for large clusters. A case study from a fintech company (Confluent Engineering Blog, 2025) documents a 40% reduction in startup time after migrating to KRaft.
Metadata Scalability
ZooKeeper had a practical limit of around 200,000 partitions per cluster, beyond which the performance of metadata operations degraded significantly. KRaft handles the metadata log like a normal compacted Kafka log and has been tested with millions of partitions per cluster.
Operational Simplicity
Removing ZooKeeper means:
- One system to monitor instead of two
- One upgrade cycle instead of two (often ZooKeeper and Kafka had complex version constraints)
- Easier deployment on Kubernetes (fewer StatefulSets, fewer PVCs)
- Easier disaster recovery (cluster state is in the metadata log, not distributed between Kafka and ZooKeeper)
KRaft on Kubernetes with Strimzi
Strimzi is the most popular Kubernetes operator for managing Kafka. From version 0.38, Strimzi natively supports KRaft:
# Kafka cluster in KRaft mode with the Strimzi Operator (Kubernetes)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: my-cluster
namespace: kafka
annotations:
# Enable KRaft mode (requires Strimzi 0.38+)
strimzi.io/kraft: enabled
spec:
kafka:
version: 4.0.0
replicas: 3
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
config:
# KRaft-specific
default.replication.factor: 3
min.insync.replicas: 2
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
# Retention
log.retention.hours: 168
log.segment.bytes: 1073741824
storage:
type: persistent-claim
size: 100Gi
class: fast-ssd
# Separate controllers (production: dedicated controllers)
# Omit this section for combined controllers (default)
# entityOperator manages topics and users via CRDs
entityOperator:
topicOperator: {}
userOperator: {}
# Create a topic with the Strimzi CRD (instead of kafka-topics.sh)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: ordini-effettuati
namespace: kafka
labels:
strimzi.io/cluster: my-cluster
spec:
partitions: 6
replicas: 3
config:
retention.ms: "604800000"
min.insync.replicas: "2"
compression.type: snappy
Checking the status of the KRaft Cluster
# Check who the current controller leader is
kafka-metadata-quorum.sh \
--bootstrap-server kafka1:9092 \
describe --status
# Output:
# ClusterId: MkU3OEVBNTcwNTJENDM2Qk
# LeaderId: 1
# LeaderEpoch: 42
# HighWatermark: 156789
# MaxFollowerLag: 0
# MaxFollowerLagTimeMs: 12
# CurrentVoters: [{"nodeId":1,"logEndOffset":156789,"lag":0},
# {"nodeId":2,"logEndOffset":156789,"lag":0},
# {"nodeId":3,"logEndOffset":156789,"lag":0}]
# CurrentObservers: []
# Check the quorum details
kafka-metadata-quorum.sh \
--bootstrap-server kafka1:9092 \
describe --replication
# Read the metadata log (for debugging)
kafka-dump-log.sh \
--files /var/lib/kafka/data/__cluster_metadata-0/00000000000000000000.log \
--cluster-metadata
Configuration Differences: ZooKeeper vs KRaft
For those coming from a ZooKeeper cluster, here are the main configuration differences to know:
| Configuration | ZooKeeper mode | KRaft mode |
|---|---|---|
| Cluster connection | zookeeper.connect | controller.quorum.voters |
| Node ID | broker.id | node.id |
| Roles | always broker | process.roles |
| Controller listener | N/A | controller.listener.names |
| Initialization | automatic (ZK handles it) | kafka-storage.sh format |
| ACL storage | ZooKeeper znodes | metadata log |
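The key renames in the table lend themselves to a mechanical first pass. The following sketch is illustrative only (the RENAMES and REMOVED sets are deliberately minimal, and the function name is invented); it does not produce a complete KRaft config, since process.roles, the quorum voters, the controller listener, and the kafka-storage.sh format step still have to be added by hand:

```python
# Illustrative sketch: mechanically rewrite ZooKeeper-mode keys into their
# KRaft equivalents per the table above. NOT a complete migration tool.

RENAMES = {"broker.id": "node.id"}
REMOVED = {"zookeeper.connect", "zookeeper.connection.timeout.ms"}

def to_kraft(lines):
    """Rename/drop ZooKeeper-era keys in a list of key=value config lines."""
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key in REMOVED:
            continue                          # ZooKeeper settings disappear
        if key in RENAMES:
            line = line.replace(key, RENAMES[key], 1)
        out.append(line)
    return out

zk_config = ["broker.id=1", "zookeeper.connect=zk1:2181", "log.dirs=/var/lib/kafka/data"]
assert to_kraft(zk_config) == ["node.id=1", "log.dirs=/var/lib/kafka/data"]
```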
Metadata Version and Feature Flags in KRaft
With KRaft, Kafka introduces the concept of metadata.version: a versioned format for the cluster's metadata. This allows rolling upgrades of a cluster without downtime, one node at a time. The metadata version is bumped only when all brokers in the cluster support the new version.
# Check the current metadata.version and the supported versions
kafka-features.sh \
--bootstrap-server kafka1:9092 \
describe
# Typical output with Kafka 4.0:
# Feature: metadata.version
# SupportedMinVersion: 3.0-IV1
# SupportedMaxVersion: 4.0-IV3
# FinalizedVersion: 4.0-IV3
# List all available feature flags
kafka-features.sh \
--bootstrap-server kafka1:9092 \
describe --all
# Upgrade the metadata.version after a cluster upgrade
# (run AFTER all brokers have been upgraded to the new version)
kafka-features.sh \
--bootstrap-server kafka1:9092 \
upgrade --metadata 4.0-IV3
The version 4.0-IV3 (Kafka 4.0, Incremental Version 3) is the latest available in the Kafka 4.0 release of March 2025. Each version bump enables new features and protocol optimizations.
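Version strings of the major.minor-IVn shape sort first by release, then by incremental version. A small helper (illustrative, assuming exactly the shape shown in this article, and with an invented function name) makes the ordering explicit:

```python
# Illustrative helper: order metadata.version strings of the major.minor-IVn
# shape shown in this article (e.g. "3.7-IV4" < "4.0-IV3").

def parse_metadata_version(v: str):
    """Turn '4.0-IV3' into the sortable tuple (4, 0, 3)."""
    release, iv = v.split("-IV")
    major, minor = release.split(".")
    return (int(major), int(minor), int(iv))

assert parse_metadata_version("4.0-IV3") == (4, 0, 3)
assert parse_metadata_version("3.7-IV4") < parse_metadata_version("4.0-IV3")
assert max(["3.0-IV1", "3.7-IV4", "4.0-IV3"], key=parse_metadata_version) == "4.0-IV3"
```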
Troubleshooting KRaft: Common Problems
The Cluster Does Not Start: “No voters found in quorum”
This error indicates that controller nodes cannot find other quorum voters. Common causes:
- Misconfigured controller.quorum.voters: verify that the format is correct (nodeId@hostname:port) and that the hostnames are resolvable by all nodes.
- CONTROLLER listener unreachable: verify that the firewall allows communication on the controller listener port (default: 9093) between controller nodes.
- Cluster ID mismatch: if you re-ran kafka-storage.sh format on one of the nodes without using the correct cluster ID, the nodes will not join the cluster.
# Check the cluster ID on each node
cat /var/lib/kafka/data/meta.properties
# node.id=1
# version=1
# cluster.id=MkU3OEVBNTcwNTJENDM2Qk <-- must be identical on all nodes
# Check that a controller leader has been elected
kafka-metadata-quorum.sh \
--bootstrap-server kafka1:9092 \
describe --status | grep LeaderId
# If LeaderId=-1, no leader has been elected (quorum not reached)
# Check the broker logs for KRaft errors
grep -E "WARN|ERROR" /var/log/kafka/kafka.log | grep -i "kraft\|quorum\|controller"
Broker Not Added to the Cluster
When you add a new broker to an existing KRaft cluster, the broker must be formatted with the same cluster ID as the existing cluster:
# Retrieve the cluster ID from the existing cluster
CLUSTER_ID=$(kafka-metadata-quorum.sh \
--bootstrap-server kafka1:9092 \
describe --status | grep ClusterId | awk '{print $2}')
echo "Cluster ID: $CLUSTER_ID"
# Format the new broker with the same cluster ID
kafka-storage.sh format \
--config /etc/kafka/server.properties \
--cluster-id "$CLUSTER_ID"
# Start the new broker
kafka-server-start.sh /etc/kafka/server.properties
# Check that the new broker is visible in the cluster
kafka-broker-api-versions.sh \
--bootstrap-server kafka1:9092 | grep "id:"
Next Steps in the Series
With KRaft covered, you are ready to tackle more advanced aspects of Kafka configuration:
- Article 3 – Advanced Producer and Consumer: detailed configuration of acks, the idempotent producer, and retry strategies to ensure durability without duplicates.
- Article 4 – Exactly-Once Semantics: Kafka transactions for atomic writes across multiple topics, with the new transaction coordinator implemented on the KRaft metadata log.
- Article 11 – Kafka in Production: KRaft cluster sizing, configuration of controller replicas, disaster recovery and metadata log backup.
Link with Other Series
- Advanced Kubernetes: deployment of Kafka on Kubernetes with Strimzi operator, persistent storage management and consumer group autoscaling.
- Observability: KRaft quorum monitoring with the JMX Exporter, critical metrics such as kafka.controller:type=KafkaController,name=ActiveControllerCount, and alerting on leader elections.