Why Predictive Risk Scoring is the smart way to do AI in application security

Everyone is adding LLMs to their products, but Predictive Risk Scoring from Invicti takes a more thoughtful and effective approach to using AI in application security. We sat down with Invicti’s Principal Security Researcher, Bogdan Calin, for an in-depth interview about the internals of this unique feature and the importance of choosing the right AI models for security tools.

Invicti recently launched its Predictive Risk Scoring feature, a genuine industry first that can generate accurate security risk predictions before vulnerability scanning even begins. To recap briefly, Predictive Risk Scoring uses a custom-built machine learning model that is trained on real-world vulnerability data (but not customer data), is operated internally by Invicti, and can closely estimate the likely risk level of a site to aid prioritization.

Following up on our initial post introducing this new capability and its potential to bring a truly risk-driven approach to application security, here’s a deeper dive into the technical side of it. We sat down with Bogdan Calin, Invicti’s Principal Security Researcher and the main creator of Predictive Risk Scoring, for a full interview not only about the feature itself but also about AI, ML, and the future of application security.

Companies in every industry, including security, are rushing to add AI features based on large language models (LLMs). What makes Invicti’s approach to AI with Predictive Risk Scoring different from everyone else?

Bogdan Calin: The most important thing about implementing any AI feature is to start with a real customer problem and then find a model and approach that solves this problem. You shouldn’t just force AI into a product because you want to say you have AI. For Predictive Risk Scoring, we started with the problem of prioritizing testing when customers have a large number of sites and applications and they need to know where to start scanning. It was clear from the beginning that using an LLM would not work for what we needed to solve this problem, so we picked a different machine learning model and trained it to do exactly what we needed.

Why exactly did you choose a dedicated machine learning model for Predictive Risk Scoring versus using an LLM? What are the advantages compared to simply integrating with ChatGPT or some other popular model?

Bogdan Calin: In security, you want reliable and predictable results. Especially when you’re doing automated discovery and testing like in our tools, an LLM would be too unpredictable and too slow to solve the actual customer problem. For estimating the risk levels, we needed a model that could process some website attribute data and then make a numeric prediction of the risk. LLMs are designed to process and generate text, not to perform calculations, so that’s another technical reason why they would not be the best solution to this problem. Instead, we decided to build and train a decision tree-based model for our specific needs.

Having a dedicated machine learning model is perfect for this use case because it gives us everything we need to get fast, accurate, and secure results. Compared to an LLM, our model is relatively lightweight, so processing each request is extremely fast and requires minimal computing resources. This lets us check thousands of sites quickly and run the model ourselves without relying on some big LLM provider and also without sending any site-related data outside the company.

The biggest drawback of using LLMs as security tools is that they are not explainable or interpretable, meaning the internal layers and parameters are too numerous and too complex for anyone to say, “I know exactly how this result was generated.” With decision tree models like the one we use for Predictive Risk Scoring, you can explain the internal decision-making process. The same input data will always give you exactly the same result, which you can’t guarantee with LLMs. Our model is also more secure because there is no risk of text-based attacks like prompt injection.
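To make the determinism and explainability points concrete, here is a minimal, purely illustrative sketch of a decision-tree risk classifier using scikit-learn. The feature names, training data, and risk labels are invented for this example; Invicti’s actual model, features, and training data are not public:

```python
# Illustrative only: a tiny decision-tree risk predictor trained on
# made-up website attributes. Invicti's real model and features differ.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical per-site attributes: [server_age_years, num_subdomains, uses_https]
X = [
    [8, 40, 0],
    [1, 2, 1],
    [6, 25, 0],
    [0, 1, 1],
    [9, 60, 0],
    [2, 3, 1],
]
y = ["high", "low", "high", "low", "high", "low"]  # known risk labels

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Deterministic: the same input always yields the same prediction.
site = [[7, 30, 0]]
print(model.predict(site)[0])

# Explainable: the learned decision rules can be printed and audited.
print(export_text(model, feature_names=["server_age", "subdomains", "https"]))
```

Unlike an LLM, every prediction here can be traced through a short, fixed set of if/else rules, and repeated calls with the same input always return the same answer.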

And maybe the biggest advantage compared to an LLM is that we could build, train, and fine-tune the model to do exactly what we wanted and to return very accurate results. In purely mathematical terms, those risk predictions are fully accurate in at least 83% of cases, but the useful practical accuracy is much higher, closer to 90%.

Could you go a bit deeper into those accuracy levels? We’ve been citing that figure of “at least 83%,” but what does accuracy really mean in this case? How is it different from something like scan accuracy?

Bogdan Calin: The idea of Predictive Risk Scoring is to estimate the risk level of a site before scanning it, based on a very small amount of input data compared to what we would get from doing a full scan. So this prediction accuracy really means confidence that our model can look at a site and predict its exact risk level in at least 83% of cases. And this is already a very good result because it is making that prediction based on very incomplete data.

For practical use in prioritization, the prediction accuracy is much higher. The most important thing for a user is not the exact risk score but knowing which sites are at risk and which are not. From this yes/no point of view for prioritization, our model has over 90% accuracy in showing customers which of their sites they should test first. Technically speaking, this is probably the best estimate you can get without actually scanning each site to get the full input data, no matter if you’re using AI or doing it manually.
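The difference between the two accuracy figures can be shown with a toy calculation (the numbers below are invented for illustration and are not Invicti’s evaluation data). Exact-level accuracy asks whether the predicted risk level matches precisely, while prioritization accuracy only asks whether a site is correctly flagged as at-risk or not:

```python
# Illustrative only: exact-level accuracy vs. yes/no prioritization accuracy
# on made-up predictions. Risk levels: 0 = none, 1 = medium, 2 = high.
actual    = [2, 0, 1, 2, 0, 1, 2, 0, 1, 0]
predicted = [2, 0, 2, 2, 0, 1, 1, 0, 1, 0]

# Exact-level accuracy: the prediction must match the precise risk level.
exact = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Prioritization accuracy: only "at risk" (level > 0) vs. "not at risk" matters.
binary = sum((a > 0) == (p > 0) for a, p in zip(actual, predicted)) / len(actual)

print(exact)   # 0.8 -> two predictions miss the exact level
print(binary)  # 1.0 -> but every at-risk site is still correctly flagged
```

In this made-up sample, two sites get the wrong exact level (medium vs. high), yet every site that needed attention is still flagged, which is why the practical prioritization accuracy can exceed the exact-level figure.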

One important thing is that predictive risk scores are completely different from vulnerability scan results. With risk scoring, we are looking at a site before scanning and estimating how vulnerable it seems. A high risk score indicates that a site has many features similar to vulnerable sites in our training data, so the model predicts that it carries a high risk. In contrast, when our DAST scanner scans a site and reports vulnerabilities, these are not predictions or estimates but facts—the results of running actual security checks on the site.

Many organizations and industries are subject to various restrictions on the use of AI. How does Predictive Risk Scoring fit into such regulated scenarios?

Bogdan Calin: Most of the regulations and concerns about AI are specifically related to LLMs and generative AI. For example, there are concerns about sending confidential information to an external provider and never knowing for sure if your data will be used to train the model or exposed to users in some other way. Some industries also require all their software (including AI) to be explainable, and, as already mentioned, LLMs are not explainable because they are black boxes with billions of internal parameters that all affect each other.

With Predictive Risk Scoring, we don’t use an LLM and also don’t send any requests to an external AI service provider, so these restrictions don’t apply to us. Our machine learning model is explainable and deterministic. It is also not trained on any customer data. And, again, because it doesn’t process any natural language instructions like an LLM, there is no risk of prompt injections and similar attacks.

AI is undergoing explosive growth in terms of R&D, available implementations, and use cases. How do you think this will affect application security in the near future? And what’s next for Predictive Risk Scoring?

Bogdan Calin: We are lucky because, at the moment, it’s not easy to use publicly available AI language models to directly create harmful content like phishing and exploits. However, as AI models that are freely available for anyone to use (like llama3) become more advanced and it becomes easier to use uncensored models, it’s likely that future cyberattacks will increasingly rely on code and text generated by artificial intelligence.

I expect Android and iOS to have small, local LLMs running on our phones eventually to follow our voice instructions and help with many tasks. When this happens, prompt injections will become very dangerous because AI voice cloning is already possible with open-source tools, so voice-based authentication alone cannot be trusted. Prompt attacks could also come via our emails, documents, chats, voice calls, and other avenues, so this danger will only increase.

AI-assisted application development is already very common and will become the normal way to build applications in the future. As developers get used to having AI write the code, they may increasingly rely on the AI without thoroughly verifying code security and correctness. Because LLMs don’t always generate secure code, I would expect code security to decrease overall.

For Predictive Risk Scoring, I can say that we are already working on refining and improving the feature to get even better results and also to expand it by incorporating additional risk factors.

Ready to go proactive with your application security? Get a free proof-of-concept demo!
Zbigniew Banach

About the Author

Zbigniew Banach - Technical Content Lead & Managing Editor

Cybersecurity writer and blog managing editor at Invicti Security. Drawing on years of experience with security, software development, content creation, journalism, and technical translation, he does his best to bring web application security and cybersecurity in general to a wider audience.