As the world rushed to hype and hail GPT-4 this week – the latest, multimodal iteration of OpenAI’s large-language AI – one company appears to be giving GPT technology a deeper, more useful enterprise spin.
Data analytics specialist ThoughtSpot is aiming to create real enterprise value around the GPT system that, until recently, seemed designed to put creative people out of work.
This week, the 2012-founded company announced that even non-expert employees within client enterprises can now search for deep insights from their data, using natural language prompts. A hoard of insight from a horde of data points, perhaps; the facility is available to preview now.
Amit Prakash is co-founder and CTO of Mountain View, CA-headquartered ThoughtSpot. He tells me:
It’s been a passion of mine to make this a reality. And definitely, the availability of large-language models is helping that vision. So, what we are announcing are new capabilities. One is for users to be able to search for data insights using freeform natural language.
When they search, it searches both the existing analytics catalogue as well as creates a brand-new answer from the data. The idea is that, 90% of the time when someone asks a question, somebody else in the enterprise has previously asked a similar one.
That means they have probably thought through many aspects of it. Not just the wording and what the question means, but also what’s the best way to present it in a chart, and what labels make sense.
So, if we find content like that, which matches the prompt’s intent, the tool will surface it. And, we will interpret the question and turn it into a generated answer.
On the face of it, this would be a useful function for any enterprise that runs on data, but finds much of it sitting inert in databases and data lakes, with little expertise in how to extract value from it. At least, when the analytics team is busy.
Prakash continues:
The second thing we’re adding is AI-generated narratives. So, when you’re looking at the chart, that means helping somebody understand what’s going on. What does the data say? Are there any interesting features in it?
What was the purpose of this added functionality?
Traditionally, when people have done natural language generation, it’s been mechanical and templated: just picking a template and filling in the values. But with large language models, you can generate much more natural language around it.
And the third thing is making our modelling process even better. Modelling is a critical part of what we offer, because when you give people unprecedented access to data, to be able to ask any question, they WILL ask any question.
You want those answers to be well governed, so they’re using the right definition, metrics, and attributes. They’re using the right data set and security is in place. And the right person is able to access the right piece of data, and nothing more.
Accuracy and trust
When Prakash co-founded the company, was it in the belief that these capabilities would come onstream one day in the future, as generative and large language tools evolved? Yes and no, he says:
In the sense that we knew we would crack this problem one way or another with advances.
But the exact form in which large language models have begun showing a level of intelligence, both in terms of understanding real-world nuances in a sentence and their code generation ability, that has been a surprise to everyone. Including the people who invented them!
When we started, our vision was that everybody who has anything to do with data should be able to ask their own questions of it. And without requiring another human being in the loop who is, quote unquote, a data expert.
Surely this raises a red flag, however. Especially for those commentators who see the public rush to adopt generative AI as implicitly devaluing human expertise, skills, and experience. Isn’t there a risk that ThoughtSpot is now widening that circle of redundancy and commoditized processes to include data analysts, of the sort who currently come at a premium?
He says:
Data experts should be able to impart their knowledge in the way they model the data. And then end users should be able to ask whatever questions come to mind.
But we realized that accuracy and trust are important in this domain. If we built a system that people couldn’t rely on to get reliable answers, then we had no future. Plus, it would be disastrous for whoever is using it.
So, we always took that as a hard constraint. That when we answer questions, it should be in a way that people’s answers to the first implementation did.
There’s no way we can deal with all the ambiguity and missing information in natural language. But we could bring it as close as possible to natural language, so that business users will have the ability to get the answers they need. But they will have to work through the system.
Even so, might data scientists and analysts see ThoughtSpot as parking its tanks on their lawn, just when their skills are most valuable? And doesn’t GPT rely on historic human expertise and skill? So, don’t future risks lurk in devaluing those skills today, leaving more and more people reliant on pushing a button for yesterday’s answers? Prakash says:
I can certainly see that, in different industries, this kind of technology can have a negative impact on professionals. But I’m not worried about that happening in the data space.
If you look at the current state of the industry, there’s still a lot more demand for data professionals, despite the economic downturn, than there is supply. So, most companies are not able to take full advantage of the data they have, because they can’t bring in that data in a shape that the business can benefit from. So, they make decisions based on their gut.
What we’re trying to do is create the conditions where someone says, ‘I can get the answer right away, I don’t need to wait for a week.’ And that makes data a lot more valuable to the company.
Plus, when customers deploy ThoughtSpot, it ends up getting the person in the data team who’s doing all this work promoted. So now they’re more of a strategic adviser to the business.
So, the data needs to be supremely well organized, described, and tagged by data professionals first, before non-experts can question it?
No, and you’ve hit on one of my favourite topics. There’s a misconception that you can only take the cleanest, most well-formed data to start doing analytics on top of it.
One of the best ways to get your data cleaned is, actually, to shine a 10,000-watt spotlight on it. Because then you start discovering the problems that really matter, as opposed to the ones you only think matter. If you have concerns about data quality, one of the best things you can do is put an analytics tool on top of it.
Give people fair warning, but let them play with it.
Worried, but not too worried
So, does Prakash have any concerns at all about the momentum of generative AI – this wave of popular adoption? As previously reported by diginomica, the likes of ChatGPT not only implicitly devalue many human skills and talent, but also risk spreading disinformation – false data that has been given a veneer of AI-generated veracity. He says:
As a technologist and human being, it definitely concerns me. But as far as this product and my company is concerned, it’s less of a concern.
Let me explain what I mean. There’s the wonderful discovery that we landed on accidentally, that we can train larger and larger transformer neural networks with lots of text data. And they are beginning to show a level of pattern-matching and intelligence that hasn’t been seen before. So, we are excited about this as a broad tool.
But the particular incarnation of this technology that is ChatGPT is also a social phenomenon. All of a sudden, it’s not just engineers and computer scientists who are excited about it, it’s also a pop culture thing; it has this playfulness.
But also, implications for intellectual rights. So, I don’t know where all that will land. But as a society we have to grapple with, and come to terms with, what utility it provides. And what harm it creates, and how we put guardrails around it.
Plus, GPT is not the only game in town. There are seven or eight other players who are building these large foundation models with different applications in mind. But as far as ThoughtSpot is concerned, we’re mostly interested in the code generation capability and the common-sense reasoning.
So, how long has ThoughtSpot had sight of what OpenAI has been doing with GPT? And with Microsoft? And, at what point did it become a strategic imperative to partner with them? He says:
We’ve been tracking these developments since 2019.
I wrote a blog last year talking about how AI is not coming for the analyst’s job yet. And I mentioned how GPT falls short in building a solution for this. But over the last four or five months, we started seeing this capability becoming increasingly promising. So, we started accelerating our work.
Until GPT-2, I don’t think we could have built the system we’ve built today. But it really took the accuracy level that we were seeing in GPT-3.5 for us to be super excited about building a product around it.
Does Prakash worry that OpenAI’s big investor and partner, Microsoft, might look at what start-ups like ThoughtSpot are doing and think, ‘We can do that’? After all, Microsoft could simply throw money at building something similar into Bing, and into cloud tools for the enterprise?
He says:
When you’re building a start-up, that’s always possible – that everything you’re building a large company will do. But it doesn’t usually happen that way.
That’s because a lot of it is about having a very focused, opinionated view of how the product should be, and how the market should be. And you acquire customers that believe in that. Then you learn from it, and you iterate, and over time you build something that’s hard to replicate.
Look at everything we’ve done in terms of being able to index every domain value, and understand and parse when that value is being referred to. Like if someone asks, ‘How much revenue am I getting from red shoes?’ we immediately know that ‘red shoes’ is a subcategory in the product table, and revenue is a column in the fact table, and these things need to be brought together in the way our data model expresses.
Having built the system that way, it gives us the ability to take this question and simplify the world for GPT, so that it doesn’t see all the real-world complexity.
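To make the ‘red shoes’ example concrete, here is a minimal, purely illustrative sketch of the idea Prakash describes: look up phrases from a question in an index of column names and domain values, then hand the model a simplified summary rather than the raw schema. Every name in it (the tables, the columns, the `resolve` and `describe` helpers) is a hypothetical stand-in, not ThoughtSpot’s actual implementation or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Match:
    phrase: str            # text found in the question
    kind: str              # "measure" or "filter"
    table: str
    column: str
    value: Optional[str] = None


# Toy indexes over an assumed schema; a real system would build these
# from the data model and the values stored in each column.
COLUMN_INDEX = {"revenue": ("sales_fact", "revenue")}
VALUE_INDEX = {"red shoes": ("product", "subcategory")}


def resolve(question: str) -> List[Match]:
    """Map phrases in the question to known columns and domain values."""
    text = question.lower()
    matches = []
    for phrase, (table, column) in COLUMN_INDEX.items():
        if phrase in text:
            matches.append(Match(phrase, "measure", table, column))
    for phrase, (table, column) in VALUE_INDEX.items():
        if phrase in text:
            matches.append(Match(phrase, "filter", table, column, phrase))
    return matches


def describe(matches: List[Match]) -> str:
    """Render the matches as a compact context a language model could use."""
    lines = []
    for m in matches:
        if m.kind == "measure":
            lines.append(f"measure: {m.table}.{m.column}")
        else:
            lines.append(f"filter: {m.table}.{m.column} = '{m.value}'")
    return "\n".join(lines)


if __name__ == "__main__":
    found = resolve("How much revenue am I getting from red shoes?")
    print(describe(found))
```

The sketch uses plain substring matching for brevity; the point is only that the question reaches the language model already tied to specific tables and columns, so the model does not have to guess at the schema.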
Then he adds:
That’s a very long-winded way of saying, as a start-up, I’ll always be worried about what everybody else is doing. But not too worried, because it will take a lot for someone to build what we have built.
My take
A promising technology from an interesting start-up – plus a refreshingly candid CTO with his eye on enterprise value. One thing is certain: Microsoft will be watching, one way or another.