Shortly after my article on AI hallucinations came out, it emerged that an American law firm had been caught filing a brief that cited numerous non-existent court rulings. Eventually, the judge threw out the case and fined the lawyers $5,000 after a stern warning. Numerous newspapers held up the story as an example of the dangers of relying too blindly on ChatGPT.
While that is certainly a valid concern, the real issue goes much deeper, and it’s not just about generative AI models like Bard, ChatGPT, and Bing Chat. They may be fun for asking about widely agreed-upon cultural lore, but they do not scale to the vast trove of sometimes intentionally messy data from different sources.
Just last week, UK scientists questioned the findings of a systematic review on the dangers of ultra-processed foods, citing concerns about the use of different definitions of what constitutes ultra-processed food. Some argued that the food industry’s sticky fingers were all over this reinterpretation of what the data meant.
Scaling fact-checking
The deeper issue is that we need to catch up on tracking the provenance of our data sources and analysis processes, and treat it as a primary concern. Statements from questionable sources, and those derived from them, must be called out in big red fonts at every step of the data processing lifecycle.
Recent news might suggest that Donald Trump’s or Boris Johnson’s assertions don’t always align with others’ experiences of the same events. This has spawned a recent journalistic practice of adding line-by-line fact-checking to each utterance in their reporting.
We need to apply the same rigorous process to tracking the provenance of data gathered in different ways, processed by various algorithms, and then stored across disparate databases. This is no easy task without a thoughtful foundation for managing provenance.
A small team of journalists can annotate and fact-check a single speech in an afternoon. This does not automate or scale well across terabytes of enterprise data or petabytes used to train new AI models. Data analysts have developed several complementary approaches for automating parts of the problem.
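To make the “big red fonts at every step” idea concrete, here is a minimal sketch, assuming each value carries a list of the sources and processing steps that produced it. The names `Record`, `ingest`, `derive` and `render`, the source labels, and the numbers are all hypothetical and chosen only for illustration; the point is that a single untrusted ancestor flags everything derived from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A value plus the chain of sources and steps that produced it (hypothetical type)."""
    value: object
    lineage: tuple  # ordered (source_or_step_name, trusted) pairs

    @property
    def questionable(self) -> bool:
        # One untrusted ancestor is enough to flag everything derived from it.
        return any(not trusted for _, trusted in self.lineage)


def ingest(value, source, trusted):
    """Wrap a raw value with the source it came from."""
    return Record(value, ((source, trusted),))


def derive(step, fn, *inputs):
    """Apply fn to the input values, merging their lineages and appending this step."""
    lineage = tuple(entry for r in inputs for entry in r.lineage) + ((step, True),)
    return Record(fn(*(r.value for r in inputs)), lineage)


def render(record):
    """Print-friendly form: questionable results are flagged up front."""
    flag = "[QUESTIONABLE] " if record.questionable else ""
    chain = " -> ".join(name for name, _ in record.lineage)
    return f"{flag}{record.value} (via {chain})"


# Hypothetical example values, not real figures.
audited = ingest(120, "election_authority_report", trusted=True)
claimed = ingest(150, "campaign_speech", trusted=False)
gap = derive("difference", lambda a, b: b - a, audited, claimed)
print(render(gap))
```

The derived `gap` is flagged even though the subtraction step itself is trusted, because the flag follows the lineage rather than the last operation.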
Argument mining
In 2011, IBM’s Watson demonstrated the ability to answer factual questions more accurately than human champions Brad Rutter and Ken Jennings. However, this approach on its own did not scale efficiently to other problems in finance and healthcare. In 2019, IBM’s Project Debater, built on the same argument-mining technology, lost out to a human debater. IBM finally sold the tech in 2022 after years of struggling with profitability.
Over the last decade, a small community of argument-mining researchers has explored various approaches for combining argument mining, a more symbolic approach to AI, with LLMs and other more statistical approaches.
A traditional LLM might start by correlating the relatedness of individual words. New argument mining schemes instead start by correlating the relatedness of arguments. This makes it possible to copy the line of reasoning that demonstrated success in past court cases to a new case. This is comparable to how generative AI can apply the style of artists, the logic of coders, and the plots of writers to generate new content.
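As a rough illustration of reasoning over relations between arguments rather than between words, the sketch below uses Dung-style abstract argumentation with grounded semantics, a standard symbolic formalism from the argumentation literature. It is a swapped-in technique, not IBM’s or any particular research group’s method, and it assumes the hard part (extracting claims and “attacks” relations from text) has already been done. The argument labels are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Return the grounded extension of an abstract argumentation framework.

    arguments: set of argument labels
    attacks:   set of (attacker, target) pairs
    """
    attackers = {a: {x for x, y in attacks if y == a} for a in arguments}
    accepted, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments - accepted - defeated:
            # Accept an argument once every one of its attackers has been defeated.
            if attackers[a] <= defeated:
                accepted.add(a)
                changed = True
        for a in arguments - accepted - defeated:
            # Defeat any argument that is attacked by an accepted argument.
            if attackers[a] & accepted:
                defeated.add(a)
                changed = True
    return accepted


# Hypothetical line of legal reasoning reused in a new case.
args = {"precedent_applies", "facts_differ", "difference_immaterial"}
attacks = {
    ("facts_differ", "precedent_applies"),
    ("difference_immaterial", "facts_differ"),
}
print(sorted(grounded_extension(args, attacks)))
```

Because “difference_immaterial” is unchallenged, it defeats “facts_differ”, which in turn reinstates “precedent_applies”. Reusing a line of reasoning then amounts to reusing this graph of attacks with new case facts plugged in.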
The data foundation
Previous ideas like data schema and taxonomy governance also have a role to play. I spoke with Dave McComb, president of business consultancy Semantic Arts, a few years ago. He had been exploring how a strong foundation for data schema governance could dramatically reduce the integration costs for new apps.
He touches on these foundations in his various books, including The Data-Centric Revolution and Software Wasteland. He observed that merely shifting from an application-centric to a data-centric approach can dramatically lower the costs of developing new apps. However, this requires the discipline and executive commitment to build a strong foundation for managing data schemas across all use cases.
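A toy way to picture the data-centric idea: instead of each app defining its own version of “customer” and maintaining point-to-point mappings, every app validates against one centrally governed definition. This is a minimal sketch under that assumption, not McComb’s methodology; `SHARED_SCHEMA`, `validate`, and the field names are hypothetical.

```python
# Hypothetical enterprise-wide definitions, governed once and reused by every app.
SHARED_SCHEMA = {
    "Customer": {"customer_id": str, "legal_name": str, "country": str},
}


def validate(entity, record):
    """Check a record against the shared definition; return a list of problems."""
    spec = SHARED_SCHEMA[entity]
    problems = [f"missing {k}" for k in spec if k not in record]
    problems += [f"unexpected {k}" for k in record if k not in spec]
    problems += [
        f"{k} should be {t.__name__}"
        for k, t in spec.items()
        if k in record and not isinstance(record[k], t)
    ]
    return problems


# Two different apps reuse the same definition instead of inventing their own.
crm_record = {"customer_id": "C-001", "legal_name": "Acme Ltd", "country": "UK"}
billing_record = {"customer_id": "C-001", "legal_name": "Acme Ltd", "country": 44}
print(validate("Customer", crm_record))
print(validate("Customer", billing_record))
```

The billing app’s mismatch surfaces at the point of entry rather than later, during an integration project.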
Fluree CEO Brian Platz, who has worked with McComb, explains:
By focusing on data-centric security and verifiable data provenance/origin, organizations can maintain the reliability and trustworthiness of their AI systems. This approach helps prevent the propagation of hallucinated or poisoned data into algorithms, enabling decision-makers to have confidence in the insights generated by AI. By ensuring the accuracy and integrity of data, businesses can minimize the risks associated with AI hallucinations and build more robust and dependable AI systems for a wide range of applications.
Implementing robust data-centric security measures involves encryption, access controls, and monitoring to safeguard data throughout its lifecycle. Verifiable data provenance establishes a transparent and traceable record of data sources, transformations, and manipulations, enabling the identification and mitigation of potential issues.
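Platz doesn’t say how Fluree makes provenance verifiable, so the sketch below is not Fluree’s API. It shows one generic technique for a tamper-evident record of transformations: hash chaining, where each log entry commits to the hash of the entry before it. `ProvenanceLog` and the file and operation names are hypothetical.

```python
import hashlib
import json


def _digest(entry, prev_hash):
    """Hash an entry together with the previous entry's hash."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    def __init__(self):
        self.entries = []  # list of (entry_dict, hash) tuples

    def append(self, source, operation):
        prev = self.entries[-1][1] if self.entries else "genesis"
        entry = {"source": source, "operation": operation}
        self.entries.append((entry, _digest(entry, prev)))

    def verify(self):
        """Recompute every hash; editing any earlier entry breaks the chain."""
        prev = "genesis"
        for entry, stored in self.entries:
            if _digest(entry, prev) != stored:
                return False
            prev = stored
        return True


log = ProvenanceLog()
log.append("crm_export.csv", "ingest")
log.append("crm_export.csv", "deduplicate")
log.append("crm_export.csv", "join with billing")
print(log.verify())
log.entries[1][0]["operation"] = "no-op"  # quietly rewrite history
print(log.verify())
```

One caveat on the design: someone who can rewrite the whole chain can also recompute every hash, so in practice the latest hash needs to be published or signed somewhere the writer can’t alter.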
Active metadata management tools for data meshes and data fabrics take complementary approaches to automating some of these ideas. Data meshes approach the problem from the point of creation and ingestion. Each data source is curated by knowledgeable subject matter experts with a solid understanding of its quality and suitability for different use cases. This approach also lays a foundation for sharing that provenance with all potential users.
Data fabrics approach the problem from the other direction. A data fabric suggests ways to create and manage a semantic layer that automatically translates existing data for various use cases. This could be useful when trying to line up the meaning and utility of existing data.
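Data mesh is an organizational pattern rather than a library, but the idea of a domain team publishing provenance and fitness-for-use alongside its data can be sketched as a simple descriptor. Everything here (`DataProduct`, its fields, and the example values) is hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside its dataset."""
    name: str
    owner: str                   # accountable subject matter expert or team
    upstream_sources: List[str]  # where the data came from
    known_limitations: List[str] = field(default_factory=list)
    approved_uses: Set[str] = field(default_factory=set)
    last_reviewed: Optional[date] = None

    def fit_for(self, use_case: str) -> bool:
        """Only use cases the owning team has signed off on count as fit."""
        return use_case in self.approved_uses


orders = DataProduct(
    name="orders_v2",
    owner="sales-ops team",
    upstream_sources=["pos_system", "web_checkout"],
    known_limitations=["refunds may arrive late"],
    approved_uses={"revenue_reporting"},
    last_reviewed=date(2023, 6, 1),
)
print(orders.fit_for("revenue_reporting"))
print(orders.fit_for("fraud_model_training"))
```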
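The semantic-layer idea can be pictured as a mapping from each source’s local field names and units onto shared business terms. This is a minimal sketch assuming a hand-maintained mapping; real data fabric products generate and manage this metadata in their own ways. `SEMANTIC_LAYER`, `CONVERSIONS`, `translate`, and all field names are hypothetical.

```python
# Hypothetical semantic layer: each source's local field names -> shared business terms.
SEMANTIC_LAYER = {
    "crm": {"cust_no": "customer_id", "rev_usd": "annual_revenue_usd"},
    "billing": {"account": "customer_id", "revenue_cents": "annual_revenue_usd"},
}

# Unit conversions needed so values line up once fields are renamed.
CONVERSIONS = {("billing", "revenue_cents"): lambda cents: cents / 100}


def translate(source, row):
    """Rename a source row's fields into shared terms, converting units where needed."""
    mapping = SEMANTIC_LAYER[source]
    out = {}
    for field_name, value in row.items():
        if field_name not in mapping:
            continue  # unmapped fields are dropped rather than guessed at
        convert = CONVERSIONS.get((source, field_name), lambda v: v)
        out[mapping[field_name]] = convert(value)
    return out


print(translate("crm", {"cust_no": "C-001", "rev_usd": 1200.0}))
print(translate("billing", {"account": "C-001", "revenue_cents": 120000, "notes": "n/a"}))
```

Both rows come out as the same customer with the same revenue, even though neither source system was changed.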
Pitting algorithms against each other
These various data governance approaches might be great for traditional enterprise data. But they fail pitifully when trying to align aspirational utterances from sources like Donald Trump with more literal assertions from election authorities. Anthropic’s notion of constitutional AI, baked into its Claude chatbot, suggests one automated approach for navigating these discrepancies.
The core idea is to create a collection of various types of algorithms that are each good at different tasks. For example, one might be good at synthesizing the rough meaning of thousands of pages of text. Another might be better at fact-checking or ensuring safety. In the case of a law firm, they might still use something like ChatGPT to inform a basic outline of arguments to include in a filing. Then another more traditional search engine would double-check each citation for accuracy before sending it to a human for review.
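The law-firm example can be sketched as a two-stage pipeline: whatever the drafting model produced is treated as unverified, each citation is looked up in an index of known rulings, and everything goes to a human with the misses flagged first. This is a sketch under stated assumptions: the drafting model is not called here, the in-memory set stands in for a real legal search engine, and all names and case titles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckedCitation:
    citation: str
    verified: bool


def normalize(citation):
    """Collapse whitespace and case so trivial formatting differences don't cause misses."""
    return " ".join(citation.split()).lower()


def check_citations(draft_citations, verified_index):
    """Second stage: look up every citation the drafting model produced."""
    known = {normalize(c) for c in verified_index}
    return [CheckedCitation(c, normalize(c) in known) for c in draft_citations]


def review_queue(checked):
    """Everything goes to a human reviewer, with unverified citations listed first."""
    ordered = sorted(checked, key=lambda c: c.verified)  # False sorts before True
    return [("OK: " if c.verified else "NOT FOUND: ") + c.citation for c in ordered]


# Hypothetical stand-in for a legal research database.
index = {"Example Corp. v. Sample LLC"}
draft = ["Example  Corp. v. Sample LLC", "Invented v. Nonexistent"]
for line in review_queue(check_citations(draft, index)):
    print(line)
```

Exact matching after normalization is deliberately strict: a citation that can’t be found is surfaced for a human rather than guessed at.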
The actual constitutional approach is more elaborate than that, with multiple tiers of algorithmic experts proficient in one domain or another. When one algorithm seems off base, the others explain why it may be wrong, improving both the human-judged performance and the transparency of AI decision-making. Anthropic’s researchers argue:
These methods make it possible to control AI behavior more precisely and with far fewer human labels.
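In the supervised stage of Anthropic’s constitutional AI paper, a model critiques its own draft against a written set of principles and then revises it. The sketch below shows only that control flow. As a loud assumption, the language-model calls are replaced by hand-written rules, so `PRINCIPLES`, `critique_and_revise`, and the example text are hypothetical toys, not Anthropic’s implementation.

```python
# Toy stand-ins: in the real method a language model writes both the critique
# and the revision; here each principle is a hand-written (name, critique, revise) rule.
PRINCIPLES = [
    ("cite_only_verified",
     lambda text: "unverified citation" if "[unverified]" in text else None,
     lambda text: text.replace("[unverified]", "[citation removed pending review]")),
    ("no_overclaiming",
     lambda text: "overclaims certainty" if "guaranteed" in text else None,
     lambda text: text.replace("guaranteed", "likely")),
]


def critique_and_revise(draft, principles, max_rounds=3):
    """Critique the draft against each principle, revise where a critique fires, repeat."""
    history = []
    for _ in range(max_rounds):
        fired = []
        for name, critique, revise in principles:
            message = critique(draft)
            if message:
                fired.append((name, message, revise))
        if not fired:
            break  # no principle objects any more
        for name, message, revise in fired:
            draft = revise(draft)
            history.append(f"{name}: {message}")
    return draft, history


text = "Victory is guaranteed under Example v. Sample [unverified]."
revised, notes = critique_and_revise(text, PRINCIPLES)
print(revised)
print(notes)
```

The `history` list is the transparency piece: it records which principle objected and why, alongside the revised output.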
My Take
I recently purchased the new MacBook Air and decided to look into the feasibility of migrating off the trusted app I use to manage notes from different sources efficiently. The PC-based program, EccoPro, was last updated in 1997. It automates the process of putting each person’s name at the top of relevant bits in different sections of an outline. Over the last 25 years, I have been unable to find an alternative that can replicate this automated chain of provenance.
Sönke Ahrens’s 2017 book, How to Take Smart Notes, inspired an explosion of new personal knowledge management (PKM) tools like Roam, Obsidian, Notion, and Amplenote. All the modern PKM tools provide ways to capture elaborate details about sources. But I have to manually copy over the reference information for the source every time I find a useful nugget.
One of the inspirations for this movement was the prodigious output of sociologist Niklas Luhmann, who managed to write more than 70 books and 400 scholarly articles. The story goes that he had mastered the Zettelkasten note-taking method, named after the German word for a note card filing cabinet. However, like me, many hopeful adopters gave up building a Zettelkasten after struggling with the extra work.
Perhaps it’s time to take notes from the accounting industry. Accountants track the provenance and reuse of numbers with tools like double-entry accounting for spotting errors, Generally Accepted Accounting Principles for defining terms, and finance apps for automating processes. This might not be as straightforward as it sounds. Trump’s accountants apparently struggled to reconcile well-established practices with some liberal interpretations of the numbers.
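The error-spotting property of double-entry accounting comes from one invariant: every transaction’s debits and credits must sum to zero. A minimal sketch, assuming a sign convention of debits positive and credits negative; `Ledger` and the account names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


class Ledger:
    """Minimal double-entry ledger: every posted transaction must balance."""

    def __init__(self):
        self.balances = defaultdict(Decimal)  # account -> running balance
        self.journal = []                     # accepted transactions, in order

    def post(self, description, lines):
        """lines: list of (account, amount); debits positive, credits negative."""
        total = sum((amount for _, amount in lines), Decimal("0"))
        if total != 0:
            raise ValueError(f"unbalanced entry {description!r}: off by {total}")
        for account, amount in lines:
            self.balances[account] += amount
        self.journal.append((description, lines))

    def trial_balance(self):
        """Sum of all balances; always zero if every entry balanced."""
        return sum(self.balances.values(), Decimal("0"))


ledger = Ledger()
ledger.post("buy laptop", [("equipment", Decimal("1200")), ("cash", Decimal("-1200"))])
try:
    # A mistyped amount on one side is rejected instead of silently recorded.
    ledger.post("typo", [("equipment", Decimal("1200")), ("cash", Decimal("-120"))])
except ValueError as err:
    print(err)
print(ledger.trial_balance())
```

The invariant catches slips, not interpretation: an entry that balances but books the wrong thing passes straight through, which is where definitions like GAAP have to do the work.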