Getting real on AI in application security

As the AI feeding frenzy continues, there is a lot of speculation and scaremongering out there, especially in terms of security. This post sets the record straight on some of the crucial ways that AI relates to application security, based on the points raised by Invicti’s Frank Catucci on Application Security Weekly #234.

AI is definitely the hot topic right now, and a lot of people are throwing around or downright parroting information and opinions. Invicti’s CTO and Head of Security Research, Frank Catucci, spoke to Mike Shema on episode #234 of the Application Security Weekly cybersecurity podcast to discuss what, realistically, AI means for application security today and in the near future. Read on to get an overview of AI as it currently relates to application security – and to learn about the brand-new art of hallucination squatting.

Faster, easier to use, and rife with risk

For all the hype around large language models (LLMs) and generative AI in recent months, the underlying technologies have been around for years, with the tipping point brought about by relatively minor tweaks that have made AI more accessible and useful. While nothing has fundamentally changed on the technical side, the big realization is that AI is here to stay and set to develop even faster, so we really need to understand it and think through all the implications and use cases. In fact, industry leaders recently signed an open letter calling for a 6-month pause in developing models more powerful than GPT-4 until the risks are better understood.

As AI continues to evolve and get used far more often and in more fields, considerations like responsible usage, privacy, and security become extremely important if we’re to understand the risks and plan for them ahead of time rather than scrambling to deal with incidents after the fact. Hardly a day goes by without another controversy related to ChatGPT data privacy, whether it’s the bot leaking user information or being fed proprietary data in queries with no clear indication of how that information is processed and who might see it. These concerns are compounded by the growing awareness that the bot is trained on publicly accessible web data, so despite intense administrative efforts, you can never be sure what could be revealed.

Attacking the bots: Prompt injection and more

With conversational AI such as ChatGPT, prompts entered by users are the main inputs to the application – and in cybersecurity, when we see “input,” we think “attack surface.” Unsurprisingly, prompt injection attacks are the latest hot area in security research. There are at least two main directions to explore: crafting prompts that extract data the bot was not supposed to expose and applying existing injection attacks to AI prompts.

The first area is about bypassing or modifying guardrails and rules defined by the developers and administrators of a conversational AI. In this context, prompt injection is all about crafting queries that will cause the bot to work in ways it was not intended to. Invicti’s own Sven Morgenroth has created a dedicated prompt injection playground for testing and developing such attacks in a controlled, isolated environment.

The second type of prompt injection involves treating prompts like any other user input to inject attack payloads. If an application doesn’t sanitize AI prompts before processing, it could be vulnerable to cross-site scripting (XSS) and other well-known attacks. Considering that ChatGPT is also commonly asked about (and for) application code, input sanitization is particularly difficult. If successful, such attacks could be far more dangerous than prompts to extract sensitive data, as they could compromise the system the bot runs on.
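To make the second type more concrete, here is a minimal sketch of one possible mitigation: treating both the prompt and the bot's reply as untrusted text and HTML-escaping them at the point where they are rendered into a page. The function name and the sample payload are hypothetical and only serve as an illustration. Output encoding alone is not a complete defense, and a real application would apply it alongside its other input handling.

```python
import html


def render_chat_message(text: str) -> str:
    """Return an HTML fragment with the untrusted text safely escaped.

    Hypothetical helper: the text may be a user prompt or a model reply,
    and both are treated as untrusted input.
    """
    # html.escape converts <, >, & and (by default) quotes to HTML entities,
    # so any markup in the text is displayed instead of executed.
    return f'<div class="chat-message">{html.escape(text)}</div>'


# Hypothetical prompt carrying a classic XSS payload.
malicious_prompt = "Explain this code: <script>alert(document.cookie)</script>"

print(render_chat_message(malicious_prompt))
```

With escaping in place, the script tag ends up in the page as visible text rather than as executable markup. Because users legitimately ask the bot about code, escaping at output time is usually a better fit than stripping tags from the input.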

The many caveats of AI-generated application code

AI-generated code is a whole separate can of worms, with tools such as GitHub Copilot now capable not only of autocompletion but of writing entire code blocks that save developers time and effort. Among the many caveats is security, with Invicti’s own research on insecure Copilot suggestions showing that the generated code often cannot be implemented as-is without exposing critical vulnerabilities. This makes routine security testing with tools like DAST and SAST even more important, as it’s extremely likely that such code will make its way into projects sooner or later.

Again, this is not a completely new risk, since pasting and adapting code snippets from Stack Overflow and similar sites has been a common part of development for years. The difference is the speed, ease of use, and sheer scale of AI suggestions. With a snippet found somewhere online, you would need to understand it and modify it to your specific situation, typically working with only a few lines of code. But with an AI-generated suggestion, you could be getting hundreds of lines of code that (superficially at least) seems to work, making it much harder to get familiar with what you’re getting – and often removing the need to do so. The efficiency gains can be huge, so the pressure to use that code is there and will only grow, at the cost of knowing less and less of what goes on under the hood.

Vulnerabilities are only one risk associated with machine-generated code, and possibly not even the most impactful. With the renewed focus in 2022 on securing and controlling software supply chains, the realization that some of your first-party code might actually come from an AI trained on someone else’s code will be a cold shower for many. What about license compliance if your commercial project is found to include AI-generated code that is identical to an open-source library? Will that need attribution? Or open-sourcing your own library? Do you even have copyright if your code was machine-generated? Will we need separate software bills of materials (SBOMs) detailing AI-generated code? Existing tools and processes for software composition analysis (SCA) and checking license compliance might not be ready to deal with all that.

Hallucination squatting is a thing (or will be)

Everyone keeps experimenting with ChatGPT, but at Invicti, we’re always keeping our eyes open for unusual and exploitable behaviors. In the discussion, Frank Catucci recounts a fascinating story that illustrates this. One of our team members was looking for an existing Python library to do some very specific JSON operations and decided to ask ChatGPT rather than a search engine. The bot very helpfully suggested three libraries that seemed perfect for the job – until it turned out that none of them really existed, and all were invented (or hallucinated, as Mike Shema put it) by the AI.

That got the researchers thinking: If the bot is recommending non-existent libraries to us, then other people are likely to get the same recommendations and go looking. To check this, they took one of the fabricated library names, created an actual open-source project under that name (without putting any code in it), and monitored the repository. Sure enough, within days, the project was getting some visits, hinting at the future risk of AI suggestions leading users to malicious code. By analogy to typosquatting (where malicious sites are set up under domains corresponding to the mistyped domain names of high-traffic sites), this could be called hallucination squatting: deliberately creating open-source projects to imitate non-existent packages suggested by an AI.

And if you think that’s just a curiosity with an amusing name (which it is), imagine Copilot or a similar code generator actually importing such hallucinated libraries in its code suggestions. If the library doesn’t exist, the code won’t work – but if a malicious actor is squatting on that name, you could be importing malicious code into your business application without even knowing it.
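One way to reduce that risk is to check what a suggested snippet imports before installing or running anything. The sketch below is only an illustration under stated assumptions: it parses code with Python's standard ast module and reports any top-level module that is not on an approved list you maintain yourself. The approved list, the function names, and the package name made_up_json_helper are all hypothetical.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" -> "a"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"), which are local code.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def unvetted_imports(source: str, approved: set[str]) -> set[str]:
    """Return imported modules that are not on the approved list."""
    return imported_top_level_modules(source) - approved


# Hypothetical allowlist of modules your team has already reviewed.
APPROVED = {"json", "requests"}

# Hypothetical AI suggestion that imports an invented package.
suggestion = """
import json
import made_up_json_helper
from requests import get
"""

print(unvetted_imports(suggestion, APPROVED))  # {'made_up_json_helper'}
```

Anything the check reports would then need a manual look before it gets installed: does the package exist, who publishes it, and how long has it been around. The check only parses the code and never imports or executes it.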

Using AI/ML in application security products

Many companies have been jumping on the AI bandwagon in recent months, but at Invicti, we’ve been working on more traditional and predictable machine learning (ML) techniques to improve our products and processes internally. As Frank Catucci said, we routinely analyze anonymized data from the millions of scans on our cloud platform to learn how customers use our products and where we can improve performance and accuracy. One way that we plan to use AI/ML to improve user outcomes is to help prioritize vulnerability reports, especially in large environments.

In enterprise settings, some of our customers routinely scan thousands of endpoints, meaning websites, applications, services, and APIs, all adding up to massive numbers. We are developing a machine learning algorithm to suggest to users which of these assets should be prioritized based on the risk profile, considering multiple aspects like identified technologies and components but also the page structure and content. This type of assistant can be a massive time-saver when looking at many thousands of issues that you need to triage and address across all your web environments. When improving this model internally, we’ve had cases where we started with around 6,000 issues and managed to pick out the most important 200 or so at a level of confidence in the region of 85%. Once this is put into production, it will make the prioritization process much more manageable for users.

Accurate AI starts with input from human experts

When trying to accurately assess real-life risk, you really need to start with training data from human experts because AI is only as good as its training set. Some Invicti security researchers, like Bogdan Calin, are active bounty hunters, so in improving this risk assessment functionality, they correlate the weights of specific vulnerabilities with what they are seeing in bounty programs. This also helps to narrow down the real-life impact of a vulnerability in context. As Frank Catucci stated, a lot of that work is actually about filtering out valid warnings about outdated or known-vulnerable components that are not a high risk in context. For example, if a specific page doesn’t accept much user input, having an outdated version of, say, jQuery will not be a priority issue there, so that result can move further down the list.
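The idea of moving a finding down the list based on context can be sketched in a few lines. The example below is not Invicti's model and does not use machine learning: it is a hand-written heuristic with made-up severity values and a made-up weighting factor, shown only to illustrate how the same finding can rank differently depending on whether the page accepts user input.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    base_severity: float       # hypothetical 0-10 score for the issue type
    accepts_user_input: bool   # hypothetical context signal for the page


def contextual_score(finding: Finding) -> float:
    """Scale the base severity down when the page takes no user input."""
    # 0.3 is an arbitrary illustrative weight, not a measured value.
    factor = 1.0 if finding.accepts_user_input else 0.3
    return finding.base_severity * factor


def prioritize(findings: list[Finding], top_n: int) -> list[Finding]:
    """Return the top_n findings ordered by contextual score, highest first."""
    return sorted(findings, key=contextual_score, reverse=True)[:top_n]


findings = [
    Finding("Outdated jQuery on static page", 6.0, accepts_user_input=False),
    Finding("Outdated jQuery on search form", 6.0, accepts_user_input=True),
    Finding("SQL injection in login", 9.5, accepts_user_input=True),
]

for finding in prioritize(findings, top_n=2):
    print(f"{contextual_score(finding):.1f}  {finding.title}")
```

Both jQuery findings start with the same base severity, but only the one on the page that takes user input makes the top two. In a learned model, the weights would come from expert-labeled data rather than being set by hand.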

But will there come a time when AI can take over some or all of the security testing from penetration testers and security engineers? While we’re still far from fully autonomous AI-powered penetration testing (and even bounty submissions), there’s no question that the new search and code generation capabilities are being used by testers, researchers, and attackers. Getting answers to things like “code me a bypass for such and such web application firewall” or “find me an exploit for product and version XYZ” can be a huge time-saver compared to trial and error or even a traditional web search, but it’s still fundamentally a manual process.

Known risks and capabilities – amplified

The current hype cycle might suggest that Skynet is just around the corner, but in reality, what seems an AI explosion merely amplifies existing security risks and puts a different twist on them. The key to getting the best out of the available AI technologies (and avoiding the worst) is to truly understand what they can and cannot do – or be tricked into doing. And ultimately, they are only computer programs written by humans and trained by humans on vast sets of data generated by humans. It’s up to us to decide who is in control.

Zbigniew Banach

About the Author

Zbigniew Banach - Technical Content Lead & Managing Editor

Cybersecurity writer and blog managing editor at Invicti Security. Drawing on years of experience with security, software development, content creation, journalism, and technical translation, he does his best to bring web application security and cybersecurity in general to a wider audience.