Kafka Community Spotlight #7
1. Personal
Please tell us about yourself, and where you are from.
Hi everyone, my name is Nikolay and I come from Bulgaria, where I grew up, completed my studies and gained my first professional experience. I currently work at CERN as a Lead Data Platforms Engineer; I joined the organisation 13 years ago.
How do you spend your free time? What are your hobbies?
Most of my free time goes to my family and that is what I enjoy the most outside work. I am also a huge motorsports fan, so I try to make time to watch my favourite F1 and MotoGP races. We like travelling and sometimes we do it while attending some of the races, so it’s quite fun and I enjoy it a lot. I also ride a motorbike myself, but not a fast one unfortunately. During the winter months my favourite places are in the Alps, as I love skiing.
What does your ideal weekend look like?
Well, during the summer it would be spending free time with my family, playing with the kids outside and having an F1 race to watch in the evening to relax even further. During winter I would spend a full day skiing and having snowball fights with the kids.
A book you’d recommend to readers?
This is a good question, as I was thinking about it recently: I noticed that I read only technical books, and it would be good to spend some time outside the technical domain and read more literature. I recently enjoyed one that sits on the borderline; it was not purely technical but still related, leaning more towards design and perspective: “Learning Systems Thinking” by Diana Montalion.
Best type of music, best song?
I like music that mixes techno and classic beats; I am not sure of the exact name of the style. More recently I have been listening to DJs like Kavinsky and Quixotic. I really enjoy the song Outside by Kavinsky, and when I need a boost of energy: Palms by Quixotic. The latter has a really nice beat, and while listening to it you start to feel the summer spirit, like you are on the beach.
Favorite food? Best cuisine you’d recommend?
I have been living in France and Switzerland for the last 13 years so I really enjoy the food here, but I would still go for traditional Bulgarian food, maybe because I miss it too much. I grew up eating Lyutenitsa, which is a mix of tomato and pepper paste, and I love it, but it is not something you can find easily outside Bulgaria. So yes, I recommend everyone visit Bulgaria to try our traditional food.
What is the best advice you ever got?
It is not a piece of advice directed at me personally, but something we say in Bulgaria: “wear your new clothes”. It is an expression meaning to live your life today and not save it for later, as you never know when later will come, if it ever does.
What is the best advice you would give?
It is related to what I mentioned in my previous answer. Enjoy life today and try to find the right work-life balance. I would also add: set specific mid and long-term goals and focus on achieving them, of course without compromising the life experience, which should always come first.
2. Kafka
How did you get into Kafka?
I started using Kafka for the first time back in 2015 and I have never stopped using it since. It was when we were designing the Next CERN Accelerator Logging Service as a lambda data pipeline.
What version of Kafka did you start with?
I think it was 0.9 if not 0.8, but it was way before version 1.0.
Stan: When Nikolay says “way before”, it’s worth emphasizing that 0.8, 0.9, and 1.0 were each roughly two years apart:
- Kafka 0.8.0: Dec 2013
- Kafka 0.9.0.0: Nov 2015
- Kafka 1.0.0: Nov 2017
When do you think one ought to use Kafka?
Always! I’m joking obviously, but yes: every time we have data moving between components or systems and we want to prevent losing it in case any of them becomes unavailable, we should have a durable buffer somewhere. Of course it depends on the use case and the requirements whether to choose Kafka or a traditional message queue, but Kafka really shines when you need durable, replayable, high-throughput event streaming with multiple independent consumers, use cases where a traditional queue like RabbitMQ would be over-constrained.
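That distinction can be made concrete with a toy sketch (plain Python; the class names are mine and nothing here is Kafka’s actual API): a queue hands each message to exactly one consumer and then forgets it, while a log retains records and lets every consumer track its own read position independently.

```python
from collections import deque

class ToyQueue:
    """Point-to-point: a message is delivered once, then it is gone."""
    def __init__(self):
        self._items = deque()
    def put(self, msg):
        self._items.append(msg)
    def get(self):
        return self._items.popleft() if self._items else None

class ToyLog:
    """Kafka-style: records are retained; each consumer keeps its own offset."""
    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read
    def append(self, msg):
        self._records.append(msg)
    def poll(self, consumer):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._records):
            return None
        self._offsets[consumer] = offset + 1
        return self._records[offset]

log = ToyLog()
for event in ["click", "purchase"]:
    log.append(event)

# Two independent consumers each see the full stream from the beginning,
# something a queue cannot offer once a message has been consumed.
assert log.poll("analytics") == "click"
assert log.poll("billing") == "click"
```

Replay follows directly from this model: resetting a consumer’s offset to zero re-delivers everything still within retention, which is what makes recovery after an outage possible.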
What’s your favorite quality of Kafka and log-based messaging?
The best quality of Kafka for me is that it is a reliable component, and that is the most important thing when you run highly available critical systems. Kafka is one of the components in our data pipelines that allows me to sleep soundly knowing that the system is running fine and coping with the high throughput, and even if something goes wrong, I will always be able to recover the data the next day or after the weekend. We have configured 3 days of retention, so that even an incident that goes unnoticed over a weekend does not cost us data before the next working day. Everything else is a matter of technology and requirements that depend on the system and the use cases, but relying on the maturity and stability of the component is the main quality for me.
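For reference, retention like this is a per-topic setting; a 3-day window corresponds to a topic configuration along these lines (the value is simply 3 × 24 × 60 × 60 × 1000 ms):

```properties
# topic-level setting: keep records for 3 days
retention.ms=259200000
```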
Do you think Kafka has a high entry barrier?
Not at all. You do not need much time to bring up a simple standalone Kafka broker for a test or development environment, or even Kafka Connect, and start playing with it. The concepts are clear, well explained, documented and straightforward. I believe it is relatively easy for any engineer to start using the Kafka ecosystem. Going from getting started to mastering fine-tuning, high throughput and production deployment does require experience and attention to specifics, but I find the entry barrier quite low.
What’s the most annoying thing in Kafka you can think of?
Probably the operational sensitivity to broker-level issues, particularly disk I/O saturation or unstable network on a single broker. In a busy cluster, one slow broker can disproportionately impact overall throughput and latency because producers may still be assigned to it as a leader. It requires careful monitoring and tuning to catch these situations before they cascade.
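A back-of-the-envelope model illustrates that disproportionate impact (the numbers are invented and this is not Kafka code, just arithmetic about a producer that round-robins across partition leaders and waits for each ack in turn):

```python
# Toy model: a producer sends one message at a time to each of three
# partition leaders and waits for the ack before the next send, so the
# slowest leader throttles the whole pipeline.
ack_latency_ms = {"broker-1": 2, "broker-2": 2, "broker-3": 2}

def throughput_msgs_per_sec(latencies):
    round_trip_ms = sum(latencies.values())  # one message per leader, in turn
    return 1000 * len(latencies) / round_trip_ms

healthy = throughput_msgs_per_sec(ack_latency_ms)   # 3 msgs / 6 ms = 500/s

ack_latency_ms["broker-3"] = 40                      # one I/O-saturated broker
degraded = throughput_msgs_per_sec(ack_latency_ms)   # 3 msgs / 44 ms ≈ 68/s

# One broker out of three is slow, yet throughput drops far more than 1/3.
assert healthy == 500.0
assert degraded < healthy / 7
```

In a real cluster the effect is softened by batching and in-flight requests, but the qualitative point stands: latency on a single leader propagates to every producer assigned to it.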
If you had a magic wand and could instantly and frictionlessly contribute/fix one thing to Kafka, what would it be?
This builds on what I just mentioned about Kafka’s sensitivity to unstable broker network connections and high I/O wait. This is understandable and by design, but we have experienced cases where a single broker’s issues affected the entire cluster’s performance, when it would have been better if that broker had been completely excluded from the cluster. So yes, I would love to see some work in this area: being able to identify when a broker is struggling too much and it would be better to exclude it from the cluster rather than continue routing data through it and assigning it as a leader.
Stan: Agreed. I think we built something like this in Confluent Cloud, if I remember correctly.
How has Kafka changed over the years from your point of view? Good and bad.
Kafka has evolved a lot and in particular the tiered storage support and the contributions around having data stored long-term and making it available for analysis through Iceberg are quite interesting. People are already doing this outside Kafka, but it reduces the operational burden and improves the integration. So from that perspective it has evolved beyond the traditional pub/sub system, and I think it is for the better as it makes it a much more flexible and complete system without compromising the fundamentals. The replacement of ZooKeeper with the internal KRaft consensus mechanism is also a step in the right direction in my opinion.
What are your most contrarian opinions on stream processing? What are the most mainstream ones?
I have experience running streaming jobs in production and I am quite happy with them for simple enrichment. When I tried to achieve high-throughput aggregation by combining several data streams on the fly and applying more complex windowing operations with Spark, running batch jobs at short intervals actually performed better, though admittedly the acceptable latency for that use case also made batch a viable option.
That said, I have not tried Flink, and for low-latency stateful processing at scale it is worth evaluating.
3. Business/Work
What does CERN even do? How does data play into the Large Hadron Collider?
CERN is the European Organisation for Nuclear Research, home to the particle accelerator complex, with the LHC being the largest machine and the one hosting the most exciting experiments. When the complex is running, petabytes of data are collected every single day. The challenges around such volumes are significant: first to collect and store the data, but also to analyse it afterwards and make it available to scientists all around the world. That is where the WLCG (Worldwide LHC Computing Grid) comes into play, and it is the largest computing grid in the world. So data is really what CERN is about, as the science is based on the experiments’ output and simulations, and without the data infrastructure none of it happens.
What is your day-to-day like as a lead data platform engineer at CERN?
All the projects I have been involved in at CERN have been operations-related rather than physics analysis: designing, developing and operating data pipelines for accelerator device monitoring and observability of the IT Data Centre and WLCG resources and services. Day-to-day this means wearing many hats: architecture and design, hands-on development, operating and troubleshooting production systems, and working closely with stakeholders across CERN to understand their needs and translate them into data solutions. We are a small team so everyone takes ownership of the full lifecycle of a system, from initial proposal through to production and long-term maintenance. On top of that we offer Kafka clusters and streaming expertise as a service to other teams at CERN, so there is also a platform engineering and support dimension to the role.
In your experience, is Kafka used frequently with IoT?
Yes, actually part of our data streaming offering includes integration with IoT devices here at CERN, so in our case we have clients serving IoT data. The data is received from an intermediate MQTT broker, as MQTT is not natively supported in Kafka, but the real-time message decoding and delivery to storage happens through Kafka streaming and connectors. My colleagues presented this part of our infrastructure at Kafka Summit London 2024.
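The shape of such a bridge can be sketched with stand-in classes (every name here is hypothetical; a real deployment would use an MQTT client such as paho-mqtt on one side and a Kafka producer or a source connector on the other):

```python
import json

# Hypothetical stand-ins for real clients. The point is the shape of the
# bridge (consume raw MQTT payloads, decode, forward to a Kafka topic),
# not any specific library API.
class FakeMqttSource:
    def __init__(self, payloads):
        self._payloads = list(payloads)
    def messages(self):
        yield from self._payloads

class FakeKafkaProducer:
    def __init__(self):
        self.sent = []
    def send(self, topic, key, value):
        self.sent.append((topic, key, value))

def bridge(source, producer, topic):
    """Decode each raw MQTT payload and forward it to a Kafka topic."""
    for raw in source.messages():
        record = json.loads(raw)  # decoding step; real payloads may be binary
        producer.send(topic, key=record["device"], value=record)

source = FakeMqttSource(['{"device": "sensor-1", "temp": 21.5}'])
producer = FakeKafkaProducer()
bridge(source, producer, "iot.telemetry")
assert producer.sent[0][0] == "iot.telemetry"
```

Keying by device ID (as sketched here) is one plausible choice, since it keeps each device’s readings ordered within a partition.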
What’s your take on on-prem versus cloud, with relation to Kafka?
I don’t have direct experience with Kafka on cloud as we deploy on-prem in our data centres at CERN, but the first thing that comes to mind when thinking about Kafka in any environment is network. Latency, stability and bandwidth are what Kafka is most sensitive to, and as long as those are under control I would not draw hard borderlines between on-prem and cloud. Cost is the other dimension worth watching closely in cloud deployments given the data volumes Kafka typically handles.
How is on-prem going at CERN?
At CERN we have two data centres that are physically on different sites and can be considered like different regions if we compare to a public cloud: one in Switzerland and one in France, with multiple availability zones per data centre. This allows us to deploy high-availability Kafka clusters, both cross-zone and cross-region. The brokers run on VMs provided by CERN’s OpenStack service, while the other components like Kafka Connect and streaming jobs run on Kubernetes.
Any fun experiences you’d like to share from CERN?
My most memorable moments are when I had the opportunity to visit the LHC tunnel for the first time and stood next to the detectors. Seeing the scale of the machine in person gave me a very clear sense of why the data work we do actually matters. We are not just moving bytes around, we are helping operate this machine and delivering the raw material of some of the most important physics experiments in the world.
4. General/Parting
Do you use AI in your day-to-day? Any favorite tools?
Yes, I have started using it more and more and it is useful for increasing productivity. I personally use an AI assistant outside of work and it seems to be gaining more and more popularity across the IT community. The productivity gain is real but I think the bigger benefit is the ability to quickly get an informed starting point on topics outside your immediate expertise, which matters a lot when you manage a broad stack. I think we are still in the early stages of how deeply it will integrate into engineering workflows.
How many Kafka Summits have you been to? How has the conference changed over the years?
I have attended Kafka Summit London in person once, but I always follow the content online when I cannot be there. From what I have seen over the years, the conference has grown significantly both in scale and in the depth of the topics covered. In the early days it was much more focused on the core broker and basic use cases, whereas now you see talks on tiered storage, Iceberg integration, AI/ML pipelines, and serious production stories from companies operating at very large scale. It reflects well how the ecosystem has matured.
What do you think about queues? Do they have a place in Kafka?
Yes, definitely. Queues and Kafka serve overlapping but not identical use cases: a traditional queue delivers a message once and discards it, while Kafka retains and replays, which is a fundamentally different model. But there are many scenarios where you just need simple point-to-point delivery without the full event streaming model, and it would be great if Kafka could accommodate those natively rather than forcing teams to run a separate broker. On the protocol side, broader support for MQTT and AMQP would also go a long way, especially for IoT use cases which we actually deal with at CERN.
How do you see the future of Kafka usage and development, 5 years out?
It will keep growing. It has been growing for the last 10 years and I see no reason for that to slow down. All of AI is driven by data and the explosion of AI and ML pipelines means more real-time feature engineering, model serving, and data ingestion at scale, all of which play to Kafka’s strengths. And beyond just the demand side, Kafka itself has also started growing beyond core message processing and delivery. In five years I would not be surprised to see Kafka described less as a message broker and more as a general-purpose real-time data platform.
Do you think we’ve innovated in the messaging space in the last 10 years? How have you seen the space change?
The fundamentals have not changed and the core Kafka business is still the same as in version 0.9, and I think that is actually a sign of good design rather than stagnation. What has changed significantly is everything around it, the ecosystem, the tooling, the integrations, and the operational maturity. When I started with Kafka in 2015 you were largely on your own building producers and consumers, and figuring out schema management. Today the ecosystem around Kafka is dramatically richer: Kafka Connect, Schema Registry, tiered storage, KRaft, and the growing Iceberg integration are all making it a much more complete platform. So yes, I would call that innovative. Not a revolution in the fundamentals, but a sustained and meaningful evolution in what you can actually do with it.
Do you see Kafka fading away after 5 years?
Not at all, quite the opposite actually. With the demand for real-time data infrastructure only accelerating, and Kafka continuing to expand beyond its core into a more complete platform, I see its role growing rather than shrinking. If anything, the next five years look more interesting than the last ten.
What other tech besides messaging do you have interest in?
Data transformation and processing is where I spend a lot of time beyond messaging. Apache Spark is my favourite tool there and we have a significant number of Spark jobs running as part of the workloads we manage at CERN. Beyond that, a big part of my work involves managing a multi-backend observability platform spanning several different storage technologies: Grafana Mimir for metrics, OpenSearch for logs, TimescaleDB and InfluxDB for time-series, and HDFS for long-term archiving and analysis. On top of that I manage a central Grafana instance serving more than 5,000 users, which brings its own interesting challenges around scale, governance and performance.
Anything else you’d like to add?
Big thanks to Stanislav for what you are doing and gathering the Kafka community together! If there is one thing I would say to anyone reading this who is hesitating about adopting Kafka or going deeper with it: just start. The ecosystem is mature, the community is strong, and the problems it solves are only becoming more relevant. I have been relying on Kafka for over ten years and I have never once wished I had chosen something else.
Any Social Media channels of yours we should be aware of?
You can find me on LinkedIn. I am always happy to connect with people from the data and Kafka community.