Web asset discovery

What is web asset discovery?

The term web asset discovery describes a mechanism used in web application security testing. It is usually offered as a module within other cybersecurity tools and only rarely as a standalone tool. Note that web asset discovery is a relatively new concept and the term itself is not formally defined, so some tools may use their own names for the same functionality.

The goal of web asset discovery software is to find web assets such as websites, web applications, and APIs. These assets can then be used as targets for other tools, such as dynamic application security testing (DAST) tools, to identify vulnerabilities and other potential security risks and attack vectors so that they can be remediated. Web asset discovery identifies web assets based on a seed keyword provided by the user, which is most commonly the name of the company. The result is a knowledge base containing a list of domains or subdomains that host websites, web apps, or web APIs. The list can later be used for security testing, vulnerability management, and more.

Organizations need discovery tools because they often struggle to automate their web inventory management – they fail to identify and keep a complete list of all web assets that they own, which becomes an even bigger problem as the organization grows. For example, a corporate department may register a domain and put up a campaign website without ever notifying the corporate web security team or even the IT administration team. As a result, the company’s true web attack surface is unknown, which may lead to attacks against untested assets, potentially resulting in costly data breaches and reputation loss.

Types of asset discovery

Web asset discovery should not be confused with IT asset discovery, IT asset management (ITAM), or IT service management (ITSM). While there may be some overlap, they are very different mechanisms.

  • Web asset discovery tools, such as those built into Invicti and Acunetix by Invicti DAST solutions, find web assets on the Internet using publicly available information. These tools attempt to find any web asset in the world that is in some way associated with the seed keyword (company name). A web asset discovery tool does not care where the asset is physically located or what network it is connected to.
  • IT asset discovery tools are completely different despite having a similar name. They find network assets on internal networks only (on-premises). These tools are usually part of other IT security tools, most often network security scanners (as in the case of SolarWinds, for example). They use the network scanner to first map all endpoint IP addresses and MAC addresses on a local network and then attempt to identify network devices available at these endpoints, be it laptops, routers, servers, smartphones or other mobile devices, or IoT devices, as well as software assets on these devices (via port scanning and fingerprinting). This is often done not just for security and malware scanning but also for software license identification purposes (detecting unauthorized software, software dependencies, etc.). Such tools help create a complete asset inventory used by IT infrastructure management/configuration management solutions, metrics/visualization solutions, and troubleshooting/ticketing by IT support/service desk/help desk teams.

How does global web asset discovery work?

Web asset discovery tools keep improving over time as innovative companies come up with new ways to build web asset libraries. Here are some of the techniques used:

  • Public certificate registry search. Since almost every web asset nowadays uses SSL/TLS for encryption and server authentication, the asset owner must obtain a certificate for the asset from a certificate authority. The issued certificate lists the domain/subdomain and, for organization-validated and extended-validation certificates, also the name of the company (the keyword). Domain-validated certificates carry only the domain names, so matching them to a company takes additional information. Issued certificates are recorded in public repositories (certificate transparency logs). Providers of web asset discovery tools partner with companies that hold such repositories, enabling the tools to search for matching certificate information and therefore easily identify domains and subdomains.
  • Public domain registry search. Similar to SSL/TLS certificates, every domain must be registered through a domain registrar. Domain registration information is available publicly, but nowadays registrars often provide a service to anonymize publicly accessible data, which renders this method less effective than SSL/TLS certificate search.
  • Search engine integration. Search engines already use crawlers to identify web pages, and therefore also websites and web applications. A web asset discovery tool may use a search engine to identify domains/subdomains associated with the keyword.
  • Manual link following. The most resource-intensive web asset discovery process is to follow all links from a known asset (for example, the company website) to potentially discover any related domains/subdomains. However, this technique is inefficient in the long term and is rarely used.
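
As a rough illustration of the certificate search technique listed above, the following Python sketch filters certificate records by a seed keyword and collects the hostnames they cover. It is only a sketch under stated assumptions: the CertRecord shape, the discover_hosts function, and the sample data are hypothetical and stand in for whatever a real certificate transparency repository returns. No real tool's API is shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical, simplified view of one certificate log entry."""
    organization: str            # organization name; may be empty on some certificates
    dns_names: Tuple[str, ...]   # hostnames the certificate covers


def discover_hosts(records: Iterable[CertRecord], seed_keyword: str) -> List[str]:
    """Return sorted, unique hostnames from records matching the seed keyword."""
    keyword = seed_keyword.casefold()
    hosts = set()
    for record in records:
        # Assumption: a simple case-insensitive substring match on the organization.
        if keyword not in record.organization.casefold():
            continue
        for name in record.dns_names:
            host = name.strip().lower().rstrip(".")
            # A wildcard entry such as *.example.com only reveals the parent domain.
            if host.startswith("*."):
                host = host[2:]
            if host:
                hosts.add(host)
    return sorted(hosts)


# Made-up sample data for illustration only.
SAMPLE_RECORDS = [
    CertRecord("Example Corp", ("www.example.com", "*.example.com")),
    CertRecord("Example Corp Ltd", ("Campaign.example-promo.com",)),
    CertRecord("Unrelated Inc", ("unrelated.test",)),
    CertRecord("", ("shop.example.com",)),  # no organization name: not matched
]

if __name__ == "__main__":
    for host in discover_hosts(SAMPLE_RECORDS, "example corp"):
        print(host)
```

In this sample, the record without an organization name is skipped even though its hostname looks related. This is one reason discovery tools tend to combine several techniques rather than rely on a single source.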

Note that most web asset discovery mechanisms are SaaS crawlers. This means that once you provide them with a keyword to search for, usually via the primary tool dashboard, they keep searching for it continuously and report any newly found assets, for example, via notifications every 24 hours. This allows web asset discovery to become part of a regular web development and security workflow/lifecycle.
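
The recurring notification step can be pictured as a comparison between the assets already in the knowledge base and the results of the latest discovery run. The Python sketch below shows that comparison only. The function names and sample hostnames are hypothetical, and a real service would add scheduling, storage, and delivery of the notifications.

```python
from typing import Iterable, List


def normalize(host: str) -> str:
    """Lowercase a hostname and drop surrounding whitespace and any trailing dot."""
    return host.strip().lower().rstrip(".")


def new_assets(known: Iterable[str], latest: Iterable[str]) -> List[str]:
    """Return hostnames present in the latest run but missing from the known list."""
    known_set = {normalize(h) for h in known}
    latest_set = {normalize(h) for h in latest}
    return sorted(latest_set - known_set)


if __name__ == "__main__":
    # Made-up sample data: the inventory before and after a discovery run.
    known = ["www.example.com", "api.example.com"]
    latest = ["WWW.example.com", "api.example.com.", "promo.example.com"]
    for host in new_assets(known, latest):
        print("New asset found:", host)
```

Normalizing hostnames before comparing them avoids reporting the same asset twice just because two sources spell it with different letter case or a trailing dot.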