Kafka Community Spotlight #1
1. Personal
Please tell us about yourself, and where you are from.
I grew up on a dairy and crop farm in rural Minnesota (USA) and now live in the Twin Cities. For 7 years I lived in another “Twin Cities”, Urbana-Champaign, IL, where I got my Masters in Computer Science and started my first job. I met my wife in Illinois, we married in Illinois, and we moved (me back) to Minnesota shortly after my first child was born. I joined a consulting company and now co-own one, Kinetic Edge.
How do you spend your free time? What are your hobbies?
My favorite hobby is running, and my favorite sport to watch is football (soccer). I have run 3 marathons, but the last one was 9 years ago; nowadays it’s just short to medium runs.
Any Social Media channels of yours we should be aware of?
I am mostly active on LinkedIn. I also do a fair amount of sharing on GitHub (also nbuesing), but most of that content is part of my company’s GitHub.
What does your ideal weekend look like?
Watching Arsenal, Nottingham Forest, and Wrexham games, while my wife watches Tottenham and Brighton & Hove Albion (living in a world of streaming makes this possible). The beauty of watching UK football is that you can watch a match when you wake up on a Saturday or Sunday.
Last book you read? Or a book you want to recommend to readers?
Well, I wouldn’t be me without recommending a Kafka Streams book, which is Bill Bejeck’s “Kafka Streams in Action”, 2nd Edition. Now, a book I’m reading for fun (although I need to get back to it) is “The Thursday Murder Club” by Richard Osman. Over the past year I have been hooked on a game show from the UK called “Pointless”, where Richard Osman was one of the hosts, and I was fascinated to learn he is a writer. He is a funny guy with interesting stories.
Best type of music, best song?
I like to stay current, so I have listened to new music most of my life. Being a kid of the ’80s does mean I grew up in the best era of music of all time. One of my favorite albums to tell people to listen to as a complete album is “Sam’s Town” by The Killers. The favorite album I typically reference is “Our Time in Eden” by 10,000 Maniacs.
Favorite food? Best cuisine you’d recommend?
Sushi - I highly recommend it, but it seems to be the one food where a recommendation really cannot change someone’s mind.
What is the best advice you ever got?
Don’t tell people what they need to know or how to do things; present to them what works for you and what helps you. Let them decide if/how they incorporate that into their life and routine.
This was not advice I got with regard to programming. It was from a friend talking about how people should help each other in life. We need to be a society that wants to help people, not judge them. Improve people’s lives by offering what works for you, not by telling them what they should do. I believe this mindset works in development as well. I’ve been told, “I’ve never seen anyone more passionate about a technology than Neil is about Kafka Streams” – I try to show people my passion and why I’m passionate about it, and then let them decide.
Have you studied at a university? If yes, was it related to computers? Did your study help you with your current job?
My undergraduate degrees are Bachelor of Arts degrees in Computer Science and Mathematics from the University of Minnesota, Morris (a great liberal arts college with no graduate program, so I was always taught by a professor). I then went to the University of Illinois, Urbana-Champaign for my Masters in Computer Science. I have known what I wanted to do since I was 6 years old, and it hasn’t wavered.
I did an undergraduate research project at Morris on neural networks (1990-91, using them to predict stock markets) – if only I had stuck with all of that. It was such a great undergraduate program, where I got to know professors well and worked with them on research.
2. Kafka
How did you get into Kafka?
Working for a consulting company, you tend to go where the industry is going. Around 2017 I was fortunate enough to take over the lead of the real-time data practice, and I have really been hooked ever since.
What Version of Kafka did you start with?
My involvement started in the late 0.11 era but mostly in the 1.0 days when it came to working with customers and their Kafka deployments.
When do you think one ought to use Kafka?
This is one of those things that has a lot of nuance to it. I think using and developing with Kafka (or similar technologies) brings a great perspective to developers on how they go about building software. So I think exploring/trying out event-driven technologies is very helpful.
Now, that being said, there are many companies that dive into using Kafka way too early or when it’s not necessary. The statement that comes up many times is “Just use Postgres”, and there is a lot of truth to that. However, most enterprise companies (where we do most of our consulting work) have tried that and reached the limits of non-event-driven systems - and they are the ones that need a technology like Apache Kafka.
Do you think Kafka has a high entry barrier?
I don’t know if there is a high barrier, but there is a journey that is necessary for everyone. The subtle mindset shift from command-driven systems (“tell me what to do”) to event-driven systems (“tell me what you did”) is quite important. I don’t think you can rush that. You can get into writing Kafka producers and consumers in no time, but that really is just the start.
So while I do not think Kafka has a high entry barrier, it isn’t a quick one either.
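The command-driven vs. event-driven distinction above can be made concrete with a minimal sketch (plain Python, no Kafka required; the record names are hypothetical):

```python
from dataclasses import dataclass

# Command-driven: the caller tells the service what to do.
@dataclass
class DebitAccount:          # "tell me what to do"
    account_id: str
    amount: int

# Event-driven: the producer records what already happened;
# any number of downstream consumers each decide how to react.
@dataclass
class AccountDebited:        # "tell me what you did"
    account_id: str
    amount: int

def handle(event: AccountDebited, balances: dict) -> dict:
    """One of possibly many independent consumers of the event."""
    balances[event.account_id] = balances.get(event.account_id, 0) - event.amount
    return balances

balances = handle(AccountDebited("a-1", 25), {"a-1": 100})
print(balances["a-1"])  # 75
```

The structural difference is small; the mindset difference (who decides what happens next) is the journey the answer describes.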
What’s the most annoying thing in Kafka you can think of?
My frustration isn’t with Apache Kafka itself, but with the desire to hide Kafka behind abstractions. I believe frameworks that generalize Kafka into non-event-log abstractions (like queue patterns) confuse developers and lead to non-performant applications.
If you had a magic wand and could instantly contribute/fix one thing to Kafka, what would it be?
I wish Apache Kafka Connect’s Struct API were more complete, with physical types for decimals, datetimes, timestamps, and more. Their absence has led to countless hours of frustration with Connect integrations.
In addition, when Confluent brought Avro into Schema Registry, that was prior to Avro 1.8 (which added logical types), so Confluent had to add its own logical type constructions that are not the same as Avro’s – and the two then had to be merged.
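To illustrate the pain point, here is a plain-Python sketch of the encoding style Connect’s Decimal logical type uses: the unscaled value travels as a byte array while the scale lives only in schema metadata, so there is no first-class decimal on the wire (the helper functions are illustrative, not Connect’s actual API):

```python
from decimal import Decimal

def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Connect-style decimal: big-endian two's-complement unscaled integer."""
    unscaled = int(value.scaleb(scale))  # e.g. 12.34 with scale 2 -> 1234
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return unscaled.to_bytes(length, byteorder="big", signed=True)

def decode_decimal(data: bytes, scale: int) -> Decimal:
    unscaled = int.from_bytes(data, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

raw = encode_decimal(Decimal("12.34"), scale=2)
print(decode_decimal(raw, scale=2))  # 12.34
# Lose the schema (and its scale parameter) and the bytes are ambiguous,
# which is the kind of integration headache the answer above describes.
```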
How has Kafka changed over the years from your point of view?
I just find the journey of Kafka fascinating. Things like adding replication, adding security, adding exactly-once semantics, removing ZooKeeper, and so much more… Its journey from a project built to solve a problem to an enterprise-grade option is remarkable.
The engineers and management achieved this while remaining forwards and backwards compatible – wow. The dedication and attention to detail truly amaze me.
What have you built around Kafka? There are a few projects of yours over on https://github.com/kineticedge.
I have built many projects over the years, driven by my desire to teach people about Apache Kafka and Kafka Streams. These projects, as you mention, are at https://github.com/kineticedge. The longest-running and most active is ‘kafka-streams-dashboards’, where I have every dashboard metric I can think of for Kafka Streams, along with dozens of Grafana/Prometheus dashboards for brokers, clients, and more. I even showcase KIP-714 (client metrics sent to the broker). It continues to grow.
I recently started ‘koffset’ as my way to get consumer lag metrics into dashboards. It started as an exploration of what I could do around some projects that are no longer active, and it isn’t a lot of code, so I feel I will be able to continue to support it well. I am also exploring the use of GraalVM for it, to keep the distribution even smaller.
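At its core, consumer lag is simple arithmetic over two offset snapshots: the partition’s log-end offset minus the group’s committed offset. A minimal sketch (plain Python, hypothetical topic and offsets) of the metric a tool like koffset surfaces:

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag; a partition with no committed offset is fully behind."""
    return {
        tp: end - committed_offsets.get(tp, 0)
        for tp, end in log_end_offsets.items()
    }

# Hypothetical snapshot for topic "orders", partitions 0-2.
end = {("orders", 0): 1500, ("orders", 1): 980, ("orders", 2): 2100}
committed = {("orders", 0): 1500, ("orders", 1): 950}

print(consumer_lag(end, committed))
# {('orders', 0): 0, ('orders', 1): 30, ('orders', 2): 2100}
```

The real work in such a tool is fetching these two snapshots reliably from the cluster and exporting them as metrics, not the subtraction itself.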
Have you considered contributing ktools directly to the project?
For those who know me, I’m all about the CLI. Hence, ‘ktools’ is my attempt to provide CLI features that I find missing in Kafka. Yes, this would be a perfect area to contribute back to Apache Kafka, and I should consider writing a KIP for each of them to see if there is a desire in the community to bring them in. You want to contribute something you believe is of value and would be easy to maintain - so starting as a separate project makes sense to see “what sticks”, but if others see value, this is a great idea.
3. Kafka Streams
How applicable do you think Kafka Streams, and streaming frameworks are, in general? (say, versus a regular producer/consumer client)
I think they are very applicable, but very different. I have seen people look at Kafka Streams as a more robust consumer, and that is 100% the wrong approach.
If you need stateful processing, you build your own, leverage a stateful compute engine (Flink), or leverage a library such as Kafka Streams. However, if you are integrating with an external system, Kafka Streams is not the right technology.
What’s the most complex stream processor you’ve built?
There are a couple. Change Data Capture (CDC) is a powerful tool. It is a great way to get at a legacy system when you have no other path to do so. However, it is a painful process to build a stream processor around, since you have to replicate all of the business logic the legacy system has around those tables, and you have to ensure that the events are available to join that data together. The key to any project like this is a well-defined canonical model for the end-state data you are creating. This is where leveraging Schema Registry is quite important, to ensure the data contracts of your canonical model and the raw CDC data do not diverge.
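The joining problem described above can be sketched in plain Python (hypothetical table names and fields): keep the latest row from each CDC stream in a keyed store and emit the canonical record once both sides have arrived, roughly what a Kafka Streams table-table join does with its state stores:

```python
class CdcJoin:
    """Toy two-sided join over CDC streams from tables A and B."""

    def __init__(self):
        self.left = {}   # latest row per key from table A
        self.right = {}  # latest row per key from table B

    def on_left(self, key, row):
        self.left[key] = row
        return self._emit(key)

    def on_right(self, key, row):
        self.right[key] = row
        return self._emit(key)

    def _emit(self, key):
        if key in self.left and key in self.right:
            # Canonical model: a merged view of both source tables.
            return {"key": key, **self.left[key], **self.right[key]}
        return None  # other side not seen yet; the store holds what we have

join = CdcJoin()
print(join.on_left("cust-1", {"name": "Ada"}))                 # None
print(join.on_right("cust-1", {"city": "Minneapolis"}))
# {'key': 'cust-1', 'name': 'Ada', 'city': 'Minneapolis'}
```

A real system must also handle deletes, out-of-order updates, and the legacy business logic around each table, which is where the pain the answer mentions comes from.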
The other system is a high-throughput yet low-latency data-enrichment system. It is a highly orchestrated microservice system. It leverages Kafka Streams for stateful processing, but then plain consumers for integration into storage systems.
Most of my stories are really my client’s stories to tell. I’m fortunate enough to be able to work in so many verticals to where we at Kinetic Edge are in position to help many with their real-time streaming needs. For me personally, it is very rewarding.
What do you think about Apache Flink?
I think Flink is a great product, and the tool of choice for building SaaS-based services around event streaming. We recommend and use it where needed, and I have done demos and presentations on it as well. For clients that are self-hosted or already well into their journey of microservice-based deployments, writing applications using the client libraries, or stateful applications using Kafka Streams, gives the team more flexibility in integrating with the other components of their microservice deployments.
Do you have opinions on alternative stream processors? Have you looked into any?
I have used Apache Beam and really enjoy it. It is a great option when building a system for 2-phase processing: 1) real-time data for observability, and 2) reprocessing with late-arriving data for billing and accounting. It gives you a single pipeline that can pull in data from real-time sources for that “I need it now” observability, but also reprocess from the object store for accurate accounting. This is perfect for click-based advertising and other use cases that need both.
I think any stream processor can be a powerful tool. I promote Kafka Streams because I know it well, and I know when not to use it. I am able to get rather complex systems built in a short period of time using it, but I think the biggest benefit is rethinking problem solving: joining real-time data for immediate processing vs. delayed query-based inspection of that data. Any stream processor can provide engineers that way of thinking and the tools to help them.
When do you think stream processing is overkill?
To use a current buzzword, “Shift Left” is only meaningful if there is business value in shifting something left. Sometimes just pushing your data into a database and doing your analysis there is just fine.
To be honest, many companies’ first project with Kafka is overkill; but you have to start somewhere. What any company should do after that “getting their feet wet” moment is figure out how Kafka could fit into their ecosystem, and reevaluate whether it is necessary.
That being said, there is a lot of power in what someone can do with a simple Kafka Streams application. If a company is already doing microservices, already using Kafka, and develops in the JVM ecosystem, adding Kafka Streams is, in my opinion, the easiest way to bring some stream processing into your applications. That is all due to the architecture and development done by the Kafka Streams committers. Just avoid the “if we use Kafka Streams, then everything must now be Kafka Streams” mindset. Also, avoid complexity: you need to know what Kafka Streams is doing, or you will end up with a system that is difficult to maintain.
4. General/Parting
How many Kafka Summits have you been to? How has the conference changed over the years?
I have been to Kafka Summit/Current every year since 2018. I have presented many times since 2019.
The conference has changed from engineers wanting to learn Kafka and figure out how to leverage it to one where the focus is more on solutions.
What do you think about queues?
Prior to KIP-932 (Queues for Kafka), any comparison of Kafka to queues would frustrate me. I look at KIP-932 as a way to say: when you need queues, such as for a command pattern, you can now do it in Kafka with confidence. However, it is still best to think of Kafka as a log to which you write immutable events, since you are reporting on what has been observed.
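The semantic difference drawn above can be simulated in a few lines of plain Python (a toy model, not Kafka’s actual protocol): with log (consumer-group) semantics every group reads every record, while with queue (share-group-style) semantics each record is delivered to exactly one consumer:

```python
def log_semantics(records, groups):
    """Log: each consumer group independently reads the whole log."""
    return {g: list(records) for g in groups}

def queue_semantics(records, consumers):
    """Queue: each record goes to exactly one consumer (round-robin here)."""
    out = {c: [] for c in consumers}
    for i, rec in enumerate(records):
        out[consumers[i % len(consumers)]].append(rec)
    return out

log = ["e1", "e2", "e3", "e4"]
print(log_semantics(log, ["analytics", "billing"]))
# {'analytics': ['e1', 'e2', 'e3', 'e4'], 'billing': ['e1', 'e2', 'e3', 'e4']}
print(queue_semantics(log, ["w1", "w2"]))
# {'w1': ['e1', 'e3'], 'w2': ['e2', 'e4']}
```

The log form is what makes immutable-event designs work (everyone can replay what “has” been observed); the queue form fits a command pattern where each command should be handled once.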
How do you see the future of Kafka usage and development, 5 years out?
I am not convinced that stream processing can be abstracted into a SQL syntax, so I think work on more ways to simplify writing stateful processing applications will continue. I think the use of AI in development will bring more developers into stream processing, making the SQL abstraction less important.
Companies that find a way to provide SaaS-based solutions beyond SQL will win out.
Stream pipelines that can leverage AI models quickly and effectively will also be well funded, and I’m sure they will lead to some exciting innovation.
Do you think we’ve innovated in the messaging space in the last 10 years? How have you seen the space change?
Kafka allowed for a better decoupling of services, well beyond what a queue could do. It also opened up self-service use cases within organizations, leading to more innovation and development. It is the next logical design step after RESTful APIs.