Migrating 2000 microservices to Multi Cluster Managed Kafka with 0 Downtime

Build & Deploy

How do you migrate your Kafka cluster while still allowing full user traffic?

Recently, Wix migrated its self-hosted Kafka clusters, which handle 60B events per day, to a managed Kafka platform.

The classic approach would be to perform this transition only after all incoming traffic has been drained from the data center.

But draining an entire data center for an undetermined period of time, until all 2000 services completed the switch, was too risky for us.

This talk is about how we gradually migrated all of our Kafka consumers and producers with 0 downtime while they continued to handle regular traffic. You will learn practical steps you can take to greatly reduce the risks and speed up the migration timeline.

Natan Silnitsky

Natan Silnitsky is a backend-infra team lead.

He leads the data streaming team, which is in charge of building event-driven libraries and tools on top of Kafka.

Before that he was part of a task force that was responsible for building the next generation CI system at Wix on top of Google's Bazel build tool.

He has many years of experience as a developer of large-scale web services, first in .NET and later in Scala.

Natan's passions include clean and functional code, dev velocity and great software design.