
The proper place of AI tools in application security

March 20, 2026

It feels like every week somebody on social media posts an impressive demo of code-level AI tools and says they’re getting so good that separate security tools soon won’t be needed. There’s no doubt that LLM-based tools can be used for some aspects of security testing, but the big question is whether they can replace all the things a real security program depends on: repeatability, predictable cost, coverage at scale, and confidence in the results.


My view as an AI engineer is that AI in various forms, including inside broader security products, will be extremely valuable in AppSec, but it should be used as part of a workflow rather than as a replacement for the core testing functions teams will still need.

While tool labels may change over time, the underlying requirements do not: you will always need broad coverage, repeatable analysis, and some way to validate what is actually exploitable.

A good demo is not the same as a usable security tool

LLMs can already do a decent job on many security tasks, especially when working with code. For common languages, common frameworks, and familiar vulnerability patterns, they can often reason well about code and explain what looks wrong. That part is real, but it does not make them a practical replacement for security tools.

A security tool has to run repeatedly, fit into CI/CD, handle large codebases, and produce stable and predictable outputs that teams can build processes and automation around. That is a very different requirement from producing a good result in an interactive session or on an artificial benchmark.

This is the main gap in a lot of the current discussion. People see that AI can do some security analysis and jump straight to the idea that it can replace security tooling, but those are two different things.

The practical problems: cost, speed, and consistency

The biggest issue with using an LLM-based tool as a standalone scanner in production is that, currently, it won’t work well enough, cheaply enough, or consistently enough to be the foundation of your team’s work.

Let’s start with cost. Full codebase analysis with large models gets expensive fast, especially when you are scanning repeatedly across large repositories or many services. And security teams don’t scan once – they scan repeatedly.
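To make the scaling concrete, here is a back-of-envelope sketch. The tokens-per-line ratio, per-token price, and merge frequency are all illustrative assumptions, not any vendor's actual pricing:

```python
# Back-of-envelope cost model for repeated full-codebase LLM scans.
# tokens_per_line and usd_per_million_tokens are illustrative guesses,
# not real pricing.

def scan_cost_usd(lines_of_code, tokens_per_line=10, usd_per_million_tokens=3.0):
    """Input-token cost of pushing the whole codebase through a model once."""
    tokens = lines_of_code * tokens_per_line
    return tokens / 1_000_000 * usd_per_million_tokens

# A 2M-line monorepo, rescanned on every merge (~50 merges/day, 30 days):
per_scan = scan_cost_usd(2_000_000)
per_month = per_scan * 50 * 30
print(f"per scan: ${per_scan:.0f}, per month: ${per_month:,.0f}")
```

Even with generous assumptions, the cost is dominated by the "scan repeatedly" multiplier rather than the price of any single scan.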

Then there is speed. Security testing has to keep up with development pipelines. If analysis takes too long or scales badly, teams may simply stop using it.

Next is the matter of consistency. LLMs are probabilistic, not deterministic. This is useful in some cases but can be a real weakness in security workflows. For baseline scanning, policy enforcement, and trend tracking, repeatability matters, and that’s where probabilistic output becomes a real operational limitation.
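To see why this is an operational problem and not just a theoretical one, imagine diffing two runs of a probabilistic scanner over an unchanged codebase. The finding identifiers below are invented for illustration:

```python
# Two scans of the same, unchanged codebase, as a probabilistic tool
# might report them. Finding IDs are invented for illustration.
run_monday = {"sqli:/orders", "xss:/profile", "ssrf:/fetch"}
run_tuesday = {"sqli:/orders", "xss:/profile"}

# A diff-based CI gate or trend chart now sees phantom movement:
# nothing was fixed, yet a finding "resolved" itself overnight.
print("appeared:", run_tuesday - run_monday)
print("disappeared:", run_monday - run_tuesday)
```

Any process built on diffing scan results against a baseline inherits this noise: findings appear and disappear with no corresponding code change.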

There is also a failure mode that engineers will quickly recognize: an LLM can be wrong in a very convincing way. With a false positive from a traditional tool, you can usually see that something needs additional checking. An AI-generated false positive often comes with a neat and confident explanation, which makes it easier to trust for all the wrong reasons.

AI models are evolving so quickly that they may get faster and cheaper to use, but that won’t help with their other shortcomings. It also does not solve the separate problem of proving reachability and exploitability in a deployed system.

AI works best as a triage layer, not a scanner

Where I think AI is genuinely useful is in reviewing findings rather than generating the entire finding set from scratch.

Take static application security testing, or SAST, as an example. Its value is that it is deterministic, automatable, and available early in development. Its weakness is that it works on a static approximation, which limits accuracy. 

I recently ran some practical experiments trying to recreate SAST-like functionality through AI-assisted coding. I found that static analysis tends to over-approximate within one service by flagging paths that are not actually reachable, and to under-approximate across service boundaries by missing flows that depend on middleware, runtime state, or interactions between systems.
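The over-approximation half is easy to picture. In this contrived sketch (not output from any particular tool), a code-level scanner sees tainted input reach `cursor.execute()` and flags it, even though the guarding flag is off in every deployment, which only runtime context can establish:

```python
import sqlite3

# Deployment-specific configuration: permanently off in production,
# something a code-only view has no way to confirm.
FEATURE_LEGACY_EXPORT = False

def export(name, cursor):
    if FEATURE_LEGACY_EXPORT:
        # A code-level tool flags this dead branch as SQL injection:
        # tainted input flows into the query string.
        cursor.execute("SELECT * FROM t WHERE name = '%s'" % name)
    else:
        # The branch that actually runs uses a bound parameter.
        cursor.execute("SELECT * FROM t WHERE name = ?", (name,))
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.execute("INSERT INTO t VALUES ('alice')")
print(export("alice", conn.cursor()))
```

The under-approximation half is the mirror image: a flow that only exists because of middleware or another service never shows up in any single repository's code.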

This is why traditional SAST has accuracy problems that can create both noise and blind spots at the same time. In this case, AI can really help. If SAST casts a wide net, AI can assess which findings look real, which ones are likely noise, and where a human should spend time. It can reason across several hops and add context that static rules often miss.
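As a sketch of what that triage layer might look like in code: the `llm_assess` function below is a stand-in for a real model call, and both its scoring heuristic and the example findings are invented so the control flow can run end to end:

```python
# Sketch of AI-as-triage over SAST output. llm_assess is a placeholder
# for a real model call; its heuristic is a trivial stub.
from dataclasses import dataclass

@dataclass
class Finding:
    rule: str
    file: str
    snippet: str

def llm_assess(finding: Finding) -> float:
    """Placeholder returning an estimated probability the finding is real.
    A real implementation would send the snippet plus surrounding context
    to a model and parse a structured verdict."""
    # Stub heuristic: sanitized-looking code scores low.
    return 0.2 if "escape(" in finding.snippet else 0.8

def triage(findings, threshold=0.5):
    """Keep the wide SAST net, but rank and filter it for human review."""
    scored = [(llm_assess(f), f) for f in findings]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for score, f in scored if score >= threshold]

findings = [
    Finding("sql-injection", "orders.py", "cursor.execute(q % user_id)"),
    Finding("xss", "views.py", "return escape(request.args['q'])"),
]
for f in triage(findings):
    print(f.rule, f.file)
```

The division of labor is the point: the deterministic tool guarantees coverage, and the model only decides where attention goes first.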

That said, there is an important nuance here related to what you’re actually testing.

LLMs often do well on popular frameworks and libraries because they have seen a huge amount of that code during training. For mainstream stacks such as Django, Spring, or Hibernate, they can sometimes reason as well as, or better than, tools based on static rules. But that advantage drops off quickly when you move into custom abstractions, internal wrappers, and niche frameworks – and real enterprise codebases can have a lot of those.

Could future AI systems orchestrate more of this workflow and take on more of the operational work around testing? They could and they will, with things like agentic pentesting. But once you ground AI in static analysis, runtime testing, and environment-specific evidence, you are talking about a layered approach that is pretty similar to what existing AppSec platforms already use – but with much better automation around it.

AI or not, code-level checks still miss deployment reality

Most of the arguments around AI in application security are based on code-level tools such as Claude Code Security. One thing that sometimes gets lost is that both LLM-powered code analysis and regular static analysis tools are looking at code, not the actual deployed system.

That means they share some of the same blind spots. Feature flags, environment variables, deployment-specific configuration, middleware behavior, and cross-service interactions all affect whether something is actually reachable and exploitable. Neither SAST nor AI can see that reliably from source code alone.

No matter which tool you use, as long as it is only analyzing code, it cannot tell you with confidence what is exposed in the deployed system.

Runtime validation is increasingly important

The gap between what’s visible in source analysis and what’s exploitable in production matters even more now, as AI-assisted development increases code volume and teams ship faster. More code and more services mean more attack surface, more assumptions, and more chances for security teams to drown in theoretical findings.

AI coding by itself does not automatically make applications less secure, but it does change the risk profile. It becomes easier to produce more code, easier to trust polished-looking output without review, and easier to pull in dependencies that nobody has approved or even checked.

In that kind of environment, runtime validation becomes even more important. Whether that capability is packaged as a separate DAST tool or included in a broader platform, the requirement does not go away: you still need to test the running application and establish what is actually reachable.

AI is augmenting existing layers, not replacing them

The direction that makes technical sense is not replacement but layering. Use SAST early in the SDLC for deterministic baseline coverage. Use AI-assisted code analysis where extra reasoning helps with triage, prioritization, and finding more complex code paths. Use runtime testing on the running application to validate exploitability. Then correlate the results so teams can focus on findings that are not just theoretically possible but actually reachable.
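A minimal sketch of that correlation step, assuming code-level and runtime findings can be joined on a shared category and endpoint (all identifiers here are invented):

```python
# Correlating code-level findings with runtime evidence so that
# confirmed-reachable issues surface first. IDs, categories, and
# endpoints are invented for illustration.

sast = [
    {"id": "S1", "category": "sqli", "endpoint": "/orders"},
    {"id": "S2", "category": "xss",  "endpoint": "/profile"},
    {"id": "S3", "category": "sqli", "endpoint": "/admin/export"},
]
dast = [  # runtime tests that actually triggered a vulnerable response
    {"category": "sqli", "endpoint": "/orders"},
]

def correlate(sast, dast):
    confirmed = {(d["category"], d["endpoint"]) for d in dast}
    for f in sast:
        f["status"] = ("confirmed" if (f["category"], f["endpoint"]) in confirmed
                       else "unvalidated")
    # "confirmed" sorts before "unvalidated" alphabetically.
    return sorted(sast, key=lambda f: f["status"])

for f in correlate(sast, dast):
    print(f["id"], f["status"])
```

Real platforms use richer matching than an exact category-and-endpoint join, but the prioritization logic is the same: runtime proof promotes a theoretical finding to an actionable one.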

That is also where I think AppSec is heading more broadly: not toward one magical replacement tool but toward clearer roles, even if some of those roles will live inside one platform. Deterministic tools for coverage, AI for assessment, runtime testing for proof, and correlation for prioritization, with additional AI capabilities enhancing every step rather than displacing any tool entirely.

Final thoughts

Yes, AI is going to change AppSec workflows – that part of the hype is real. What I do not buy is the idea that teams can replace dedicated security testing with AI alone and get the same level of operational confidence.

LLMs and other forms of AI are already a valuable part of application security programs and toolchains, and I expect AI to transform and improve much of that stack over time. But even as platforms become more AI-driven, the need does not go away for repeatable analysis, runtime proof, and a clear way to connect findings to real risk.
