Haven’t we all, those of us who have had to work with Apache Zookeeper and Kafka, dreamed that one day Kafka would run without Zookeeper? Don’t get me wrong, Zookeeper is a fantastic piece of tech, but it has been getting rusty with the passing of time, and it can cause trouble.
With the introduction of KIP-500, Apache Kafka started on its way towards running without Zookeeper.
Where are we coming from
In a world with Apache Kafka and Zookeeper, a typical small deployment might look as you can see in figure 1.
Keep in mind this is a very simplified deployment, with only three Kafka and three Zookeeper nodes. If you are following good practices for your production deployments, I’m sure you are using at least five Zookeepers.
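The preference for five nodes comes down to majority-quorum arithmetic: the ensemble keeps working only while a strict majority of its members is up. Here is a minimal sketch of that arithmetic; the function names are mine, picked purely for illustration.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of nodes that can fail while a majority still remains."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 5):
    print(
        f"{size} nodes: majority={majority(size)}, "
        f"tolerated failures={tolerated_failures(size)}"
    )
# 3 nodes: majority=2, tolerated failures=1
# 5 nodes: majority=3, tolerated failures=2
```

With five nodes you can take one down for maintenance and still survive an unexpected failure. With three, a single planned restart leaves no margin at all.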
An experienced reader can already see a bunch of non-trivial problems that could arise. What about:
The latency introduced in communicating Broker nodes and Zookeepers?
Deployments now have two different components to take care of, each one with different characteristics. What can go wrong?
….
So what is KIP-500 trying to bring? In a nutshell, it replaces Apache Zookeeper with internal controller processes, which can run separately from the brokers, responsible for managing the cluster metadata using Raft and an internal Kafka topic.
Where is Apache Kafka going
What does the future of Apache Kafka look like? Figure 2 and figure 3 give the reader a glimpse of what the new setups will look like.
Note that even though we have represented three nodes for each role, nothing prevents you from deploying a single node that takes both roles, broker (data) and controller (Raft).
With the introduction of KRaft, Apache Kafka gets a MetadataAPI that will be responsible for managing metadata gathering for brokers and clients. New Controller APIs (KIP-631) are introduced as well.
All of this is certainly wonderful, right? But what is the timeline for the change to become a reality? In a nutshell, how long is ZK still going to be available in the AK artefacts?
KIP-833 was introduced recently, and it makes the timeline clear. Assuming a cadence of time-based releases every four months:
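To make the combined role a bit more tangible, here is a minimal sketch that renders the properties for a single node acting as both broker and controller. The property keys are standard KRaft settings; the host, the ports (9092 and 9093), the log directory and the helper function names are assumptions chosen for illustration, not a recommended production layout.

```python
def combined_node_properties(node_id: int, host: str = "localhost") -> dict[str, str]:
    """Settings for one node that is both broker and controller (assumed ports and paths)."""
    return {
        # Both roles on the same process: data (broker) and Raft (controller).
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        # A single voter: this node is the whole Raft quorum.
        "controller.quorum.voters": f"{node_id}@{host}:9093",
        "listeners": f"PLAINTEXT://{host}:9092,CONTROLLER://{host}:9093",
        "controller.listener.names": "CONTROLLER",
        "inter.broker.listener.name": "PLAINTEXT",
        "listener.security.protocol.map": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
        # Hypothetical location, fine for experiments only.
        "log.dirs": "/tmp/kraft-combined-logs",
    }


def render_properties(props: dict[str, str]) -> str:
    """Render the settings in the key=value format of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


print(render_properties(combined_node_properties(1)))
```

A setup like this is handy for local experiments. For anything serious you would most likely go back to dedicated controller nodes, as in the figures.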
2022/08: KRaft mode declared production-ready in Kafka 3.3
2022/12: Upgrade from ZK mode supported in Kafka 3.4 as early access.
2023/04: Kafka 3.5 released with both KRaft and ZK support. Upgrade from ZK goes production. ZooKeeper mode deprecated.
2023: Additional 3.x releases
2024/xx: Kafka 4.0 released with only KRaft mode supported.
So there is still one more year before KRaft becomes the only supported coordination method in Apache Kafka.
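The planned months follow directly from the cadence. As a small sketch, projecting a four-month cadence forward from the 3.3 release gives the planned months for the next two releases. The helper names are mine, and these are planned dates only; actual releases can slip.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that is `months` after `start`."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def planned_releases(
    first: date, versions: list[str], cadence_months: int = 4
) -> list[tuple[str, date]]:
    """Pair each version with its planned month, one cadence step apart."""
    return [
        (version, add_months(first, step * cadence_months))
        for step, version in enumerate(versions)
    ]


for version, when in planned_releases(date(2022, 8, 1), ["3.3", "3.4", "3.5"]):
    print(f"Kafka {version}: {when:%Y/%m}")
# Kafka 3.3: 2022/08
# Kafka 3.4: 2022/12
# Kafka 3.5: 2023/04
```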
A takeover plan
Disclaimer: the comments here do not represent official plans or guidance from my current employer. They are my personal thoughts on how I would approach this story, based on my background and past experience.
As noted in the previous list, KRaft was already declared production-ready in August 2022, so it will have been baking for around two years before it becomes the only supported mode. Even so, common sense and past experience suggest caution.
First and foremost, if you are running an in-house Apache Kafka deployment, you should be making yourself and your team familiar with the upcoming change. That means having an answer for:
The migration plan? How, and most importantly, when, do you plan to execute the migration?
Learning, learning and practice, more practice. Your team should become familiar with the new monitoring metrics and the operational situations that might happen with the change.
Thankfully, this change will be transparent for the clients, so initially we should not worry much about them. However, depending on your current setup and your migration plan, you might have to think about how it could impact them.
But when, tell me, when should I be doing that? If I were you, I would:
Start by migrating the less important systems first, introducing the changes in small bites that allow your team to move from learning into practice.
For sure, introducing a new coordination algorithm, even one that has been baking for a long time, is a scary and vital thing. Who does not remember the split brains in Elastic in the early (and not-so-early) days? They were a pain in… So take it easy with moving production, especially if your system has a high impact.
Your organisation, team and product setups will guide you to devise the exact timelines for sure.
If you would like to start practising already, I would suggest the KRaft playbooks I have built. With them you can easily start different setups of Kafka with KRaft and experiment. More about that in future instalments of this series.