The Open Worldwide Application Security Project (OWASP) has started a new list detailing the most significant Large Language Models (LLM) vulnerabilities underpinning generative AI apps. The group cut its teeth with a list of ten web vulnerabilities in 2003 on securing websites and then later added APIs in 2019. The new OWASP Top 10 for Large Language Model Applications chronicles new risks that emerge when deploying LLMs.
The list identifies the most common critical vulnerabilities in LLM. It also provides insight into their potential impact, ease of exploitation, and prevalence. It considers some of the top new threats to be prompt injection, data leakage, inadequate sandboxing, unauthorized code execution, and hallucinations. The group hopes to raise awareness of emerging threats and suggests remediations and new best practices for security teams, data scientists and data engineers.
The group is moving fast. Steve Wilson, Chief Product Officer at Contrast Security, helped launch the new effort last May and enrolled nearly five hundred AI security experts to get the ball rolling. At the launch of the new list, Wilson wrote on LinkedIn:
The creation of this resource involved exhaustive brainstorming, careful voting, and thoughtful refinement. It represents the practical application of our team’s diverse expertise. While reaching this milestone is an achievement, our work doesn’t stop here. We recognize that as the field of LLMs continues to evolve, this resource will need to keep pace. We remain committed to learning, improving, and updating our guide to ensure it stays relevant and useful.
Prompts as the new API
One of the more interesting aspects is the role that prompts into LLMs can play in launching new kinds of attacks. These attacks involve embedding code into a prompt that deceives LLMs and the plugins or applications they are tied to into running unauthorized code. A careful perusal of the list reveals that prompt injection isn’t just a vulnerability on its own. It is also a launch point for many other vulnerabilities on the list.
For example, a few lines of specially crafted code of white-on-white text in one-point font could be easily missed by a human. But they could cause the LLMs that process them to go haywire, kicking off a series of events in plug-ins connected to the browser or the LLMs. They may also give unscrupulous job candidates an unfair advantage when their resume is summarized or analyzed. Perhaps unsurprisingly, Contrast Security announced plans to extend its application security testing platform to support detecting and mitigating prompt injection vulnerabilities.
Many of the other vulnerabilities will not be as simple to protect against. Mitigation will require a careful analysis of the data and code supply chain. Greater vigilance will also be needed when automating things connected to or driven by LLMs. You might want to proceed cautiously in automating your email, trading portfolio, social media feed, or even refund processes with AutoGPT.
The Cliff Notes version
Many of the new threats are non-obvious on the first read. So here is a summary of the term and how it might cause havoc. You can find the full details here.
- Prompt Injection: Crafty inputs could overwrite system prompts or manipulate inputs from other sources. They could steal sensitive information from the user, automate unauthorized purchases through a plug-in, or automate a spam campaign on your behalf.
- Insecure Output Handling: The output of an LLM contains specially crafted code that compromises back-end systems or your browser.
- Training Data Poisoning: Blackhats deliberately embed bad data on websites, Wikipedia, enterprise servers, and even books to bias the models trained on them. Unscrupulous brands might plant data that confuses LLMs into saying bad things about competitive products.
- Model Denial of Service: Attackers directly send a chatbot a series of queries known to have a high cost on back-end systems. They could also plant text on websites known to be accessed by LLMs for answering queries. A botnet of compromised browsers could scale these attacks to flood target chatbots.
- Supply Chain Vulnerabilities: Blackhats plant malicious code, poisoned data, compromised models, and insecure plug-ins into the AI supply chain. These could trick developers into downloading compromised packages, steal data, create backdoors, or send users to bogus scam sites.
- Sensitive Information Disclosure: Hackers trick a model into revealing sensitive information on which it was trained, or that was shared by other users.
- Insecure Plugin Design: This refers to the plugins connected to the LLM for accessing the web, pulling up the weather, retrieving stock quotes, or storing enterprise data. Hackers could redirect the LLM to sites with malicious code, edit data on back-end systems, stage an SQL attack, or transfer code repository ownership to plant new vulnerabilities in a software supply chain.
- Excessive Agency: Hackers induce an LLM to send spam or post on social media on a user’s behalf. They may also trick LLMs designed to automate low-value refunds into processing thousands of bogus ones.
- Over-reliance: Users trust the hallucinations from LLMs without double-checking the bad citations, inaccurate information, or dangerous advice.
- Model Theft: Hackers or disgruntled employees steal models through various back doors into enterprise systems. They could also launch a series of API queries to develop a shadow model.
Takeaway
One of the biggest takeaways from the new list is that we need to think more deeply about how AI systems automate various processes. Increased human oversight is a must. Daryan Dehghanpisheh, President and Co-Founder of Protect AI, which is developing new AI security tools, says:
The most significant thing that OWASP is trying to achieve with this is to ensure that there’s more human in the loop components in the development of LLM applications.
He finds it useful to break down the new OWASP LLM list into two categories. The first concerns how to prevent social engineering from being scaled in the realm of bots and automated AI agents. The other is around securing the software and data supply chain assets that come together to create LLM applications.
He compares the human oversight required to minding a toddler around risks like electrical plugs, knives, and matches. In the case of LLMs, the new dangers include prompt injections, output handling, and training data. Dehghanpisheh explains:
We don’t let kids just play with knives without teaching them how to use them. We don’t just give them a sharp object and say, ‘Go nuts and run with it.’ We’re inspecting what tools are there and what is around them and thinking through things like denial of service, the plugins, excessive agency, and overreliance, and that’s like letting a toddler run wild.
He believes that one area of confusion that can arise from the current formulation is that supply chain vulnerabilities are listed as a separate category. But in practice, plugins, training data, and sensitive data disclosures and processes are all part of the supply chain. He believes the supply chain component could have been more prescriptive and thought out. He suggests a simpler formulation:
I would say the OWASP Top 10 can be simplified into three basic categories. First, you need more humans in the loop. Second, you need new tools, particularly those focused on visibility, auditability and security of all artificial intelligence applications. And third, you need new processes to ensure that LLM applications themselves are hardened, safe and secure.
My take
The original OWASP Top 10 ushered in a new era of web application firewalls. The later API top ten brought attention to new vulnerabilities that were outside the scope of these tools. This drove the creation of a new market for API security platforms based on different principles from companies like Noname Traceable and Salt Security. It also encouraged existing application performance management and observability to extend their offerings into the new domain.
In the long run, a new LLM list will likely have a similar effect. It’s unclear how some of these could be neatly baked into a new AI application firewall type of appliance. However, the list also comes with best practices for mitigating the new issues, which is a good start.