Enterprises by and large know that their use – and understanding – of data should play a central role in how they operate. We’ve heard the saying time and time again that ‘data is the new oil of the 21st Century’. But too often organizations rely on historical data that can paint an inaccurate picture – a picture of a moment in time, rather than in real time – and they struggle with the scale and complexity of becoming a ‘data-driven organization’.
Confluent, a vendor born out of the open-source Apache Kafka movement, is attempting to solve some of these problems by shifting organizations’ thinking towards ‘data in motion’. Confluent argues that data streaming needs to be the default mode of data operations for modern businesses, because traditional batch processing can no longer keep pace with the number of use cases that rely on real-time updates across an expansive set of data sources.
Confluent has seen success in this area and has a growing list of enterprise customers that are organizing their businesses around this concept of data streams and data in motion. However, it would be fair to say that there is still a level of complexity inherent in building out an organization that operates on real-time data streams.
Confluent has solved some of this already, particularly with regard to its Confluent Cloud offering and how it has sought to improve manageability for customers. However, at its user event this week in Austin, Texas, the company took this even further with the release of Stream Designer, a visual interface that allows developers to build and deploy streaming data pipelines in a “matter of minutes”.
Not only this, but it gives Confluent customers the opportunity for business users and technical developers to work more closely and collaboratively on data-driven use cases.
diginomica got the opportunity to speak with Chad Verbowski, Confluent’s Senior VP of Engineering, ahead of the announcement, where he explained why the use of Stream Designer is allowing businesses to expand their user profile when working with Kafka – in an attempt to democratize data. Verbowski said:
We are actually going up the pyramid. We hope that everybody will be able to do this someday. The next step ‘up’ from a developer is somebody who is really familiar with SQL, because with the SQL language you are expressing more declaratively ‘what I want to happen’, not necessarily all the syntax of how it needs to happen – threads, machines, nodes, etc.
So what we are announcing here is two things. One is a visual layer, which we think business users could do. But if you really want to, and I think this is really important, you can expand and write the SQL part and it will integrate in and become a visual element as well. So that people that are more technical can easily extend. And I think that ability for people to collaborate – those that are less technical doing the breadth – and then pulling in some help where they can go deeper, is super valuable.
I think that the second piece is that working with streams is actually less friction than working with the data warehouse. Imagine that as you collect your data, you’ve curated it, you’ve inserted it into a database or a data warehouse, it fits a nice schema, things look beautiful – but the problem is that it takes a lot of people to make that happen.
One of the big friction points is just inserting unstructured data into a structured system. The beauty of Kafka is that it takes unstructured data, we allow you to apply that schema to it, and then when it doesn’t fit we give you the tools like Stream Designer to fix it on the fly and work with it.
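The schema-on-read pattern Verbowski describes can be pictured with a short sketch. To be clear, this is not Confluent’s API – it is a minimal, hypothetical illustration of the idea: raw, unstructured JSON events have a declared schema applied to them as they arrive, and records that don’t fit are routed aside for repair rather than blocking the pipeline.

```python
import json

# Hypothetical schema for a user-signup event: field name -> expected type.
SIGNUP_SCHEMA = {"user_id": int, "email": str, "signed_up_at": str}

def apply_schema(raw_message: str, schema: dict):
    """Validate one raw JSON message against the schema.

    Returns (record, None) when the message fits, or (None, error)
    when it doesn't -- mismatches can then be fixed 'on the fly'
    downstream instead of failing a nightly batch load.
    """
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError as exc:
        return None, f"unparseable: {exc}"
    for field, expected_type in schema.items():
        if field not in record:
            return None, f"missing field: {field}"
        if not isinstance(record[field], expected_type):
            return None, f"bad type for {field}"
    return record, None

good, bad = [], []
for msg in [
    '{"user_id": 1, "email": "a@example.com", "signed_up_at": "2022-10-04"}',
    '{"user_id": "oops", "email": "b@example.com"}',
]:
    record, err = apply_schema(msg, SIGNUP_SCHEMA)
    (good if err is None else bad).append(record if err is None else err)
```

In a real deployment this validation would be handled by Schema Registry and the pipeline tooling itself; the point of the sketch is only that schema checking moves into the stream, where bad records can be caught and repaired early.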
Looking at the announcement, Confluent said that there are three core components to Stream Designer. These include:
Developer productivity – instead of spending days or months managing individual components on open source Kafka, Confluent says that developers can now build pipelines with the complete Kafka ecosystem accessible in one visual interface. The company added that they can build, iterate and test before deploying into production in a modular fashion, in keeping with popular agile development methodologies. Users no longer have to work across multiple discrete components, like Kafka Streams and Kafka Connect, that each require their own boilerplate code.
A unified end-to-end view – after building a pipeline, the next challenge is maintaining and updating it over its lifecycle as business requirements change and tech stacks evolve. Confluent says that Stream Designer provides a unified, end-to-end view to observe, edit, and manage pipelines and keep them up to date.
Accelerated development of real-time applications – pipelines built on Stream Designer can be exported as SQL source code for sharing with other teams, deploying to another environment, or fitting into existing CI/CD workflows. Stream Designer allows multiple users to edit and work on the same pipeline live, enabling seamless collaboration and knowledge transfer.
Speaking to the benefits of working with Stream Designer, Verbowski said:
One large company we are working with had a need to recommend other people they may know on their platform as new users signed up. And what they were doing is that they used Kafka to collect all the information from their clients, they’d send it all off to their data warehouse, they’d run their machine learning job and recommend those connections. But that would take about a day. Because they uploaded their data by 10am – and it just wasn’t that impactful to recommend something a day later. You’ve got to get them to re-engage.
So what they did is that they extended Kafka. The same data that’s going to the data warehouse, they just added that logic directly to those streams as a new user event was coming in. They joined against the historical data and they were able to turn around that recommendation within seconds and thus really drive that business value.
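The pattern in that anecdote – joining each incoming event against historical data directly on the stream, instead of waiting for a daily batch job – can be sketched in plain Python. All names and data here are hypothetical; a real system would use Kafka Streams or ksqlDB rather than an in-memory dictionary, but the shape of the stream–table join is the same.

```python
# "Table" side: historical data, keyed here by email domain.
existing_users_by_domain = {
    "example.com": ["alice", "bob"],
    "other.org": ["carol"],
}

def recommend_on_signup(event: dict) -> dict:
    """Join one incoming signup event against the historical table."""
    domain = event["email"].split("@", 1)[1]
    suggestions = [u for u in existing_users_by_domain.get(domain, [])
                   if u != event["user"]]
    # Update the table so later events in the stream see this user too.
    existing_users_by_domain.setdefault(domain, []).append(event["user"])
    return {"user": event["user"], "people_you_may_know": suggestions}

# "Stream" side: events are processed one at a time, as they arrive,
# so each recommendation is produced in seconds, not a day later.
stream = [
    {"user": "dave", "email": "dave@example.com"},
    {"user": "erin", "email": "erin@example.com"},
]
results = [recommend_on_signup(e) for e in stream]
```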
I think that’s the core of what we’re announcing here with Stream Designer – you can build some of these things directly on the stream that you already have in cases where real-time is valuable.
And of course we think everything should be real-time.
In addition to Stream Designer, which is aimed at making working with data streams easier, Confluent also announced a new tier of capabilities for its Stream Governance product, which was announced earlier this year and is pitched as a fully managed governance suite for Apache Kafka. The idea is that whilst working with data needs to be made easier, governance and security – essentially understanding how that data is being used – remain critical.
The key capabilities announced include:
Point-in-time playbacks for Stream Lineage – Confluent says that customers can now understand where, when, and how data streams have changed over time. Point-in-time lineage provides a look back into a data stream’s history over a 24-hour period or within any one-hour window over a seven-day range. Teams can now see what happened on, for example, Thursday at 5pm, when support tickets started coming in. Paired with the new ability to search across lineage graphs for specific objects such as client IDs or topics, point-in-time lineage aims to help identify and resolve issues in order to keep mission-critical services up for customers and new projects on track for deployment.
Business metadata for Stream Catalog – users can now also build more contextual, detail-rich catalogs of data streams. Alongside previously available tagging of objects, business metadata gives individual customers the ability to add custom, open-form details represented as key-value pairs to entities they create such as topics. Confluent says that these details, from users who know the platform best, are critical to enabling self-service access to data for the larger organization. While tagging has allowed users to flag a topic as “sensitive,” business metadata allows that user to add more context, such as which team owns the topic, how it is being used, who to contact with questions about the data, or any other details necessary.
Globally available Schema Registry for Stream Quality – Confluent has also more than doubled the global availability of Schema Registry to 28 regions, meaning teams should have more flexibility to manage schemas directly alongside their Kafka clusters in order to maintain compliance requirements and data sovereignty.
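The distinction between tags and business metadata is easiest to see as a data shape. The following is a hypothetical illustration only – it is not Confluent’s Stream Catalog API – showing a topic entity carrying both a simple tag list and the open-form key-value pairs the announcement describes.

```python
# Hypothetical catalog entry for a topic -- tags are simple flags,
# business metadata is open-form key-value context added by the
# people who know the platform best.
catalog_entry = {
    "entity": "topic:customer-signups",
    "tags": ["sensitive", "pii"],
    "business_metadata": {
        "owning_team": "growth-engineering",
        "contact": "data-questions@example.com",
        "usage": "feeds the people-you-may-know recommender",
    },
}

def describe(entry: dict) -> str:
    """Render a human-readable summary for self-service discovery."""
    meta = entry["business_metadata"]
    return (f"{entry['entity']} (tags: {', '.join(entry['tags'])}) "
            f"is owned by {meta['owning_team']}; contact {meta['contact']}.")
```

A tag alone says the topic is “sensitive”; the metadata says who owns it, what it feeds, and who to ask – the context Confluent argues is needed for self-service access.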
In terms of how these features will benefit users, Verbowski said:
We are giving you a catalog where you can annotate and label things in a way that lets you come up to speed with that context that you need to be able to write those queries. So then you don’t have to go through that whole process of cleaning it to get it done, but doing it earlier in the cycle is really helpful.
Also, this business metadata that we are talking about in the catalog – it’s one thing to know that you’ve got a table and it has fields X, Y and Z. It’s another thing to know what X, Y and Z mean. Or, who owns that table, or where do they come from, or should I trust this, or was it cleaned up in another version. That kind of context makes it easy to discover and makes it easy to understand what’s going on without having to ask your friend.
Verbowski also pointed to the use of Stream Lineage to help organizations to solve data problems in real-time. He said:
With Stream Designer, what we are doing is building a pipeline, so that you can see all of the data that you’ve produced. So at any time you’re working with some data, visually you can drill in and see all of the things that produced it.
For example, you’re working with some data, you see the number 5 when you expected the number 7 – well, how did that happen? Was it the processing? Was it the raw source? When you look at other products, especially other pipeline products, they will talk about what I call the ‘happy path’ – when everything works it’s great, but quite often when you build things there’s a subtle mistake here and there.
The real advantage of this is being able to use that lineage to go back and iterate and produce the right result. And I think that’s pretty powerful.
Confluent – and its customers – know the benefit of being able to work with data in real time. However, they also understand that working with Apache Kafka requires an extensive rethink of operating models and an investment in skills to manage the tooling. What Confluent is announcing this week seeks to make this transition easier, making its tooling more intuitive and productive, whilst also extending it to new user profiles. This will be welcome news for customers. The company also understands that if more users are going to be interacting with data in real time, then governance features are key. Getting control over your data streams early on in the process could save a lot of headaches down the line.
We will be on the ground at the event this week, where we will be picking up the latest news and talking to customers about their use of Confluent. More to come on diginomica.