Exclude paths from scanning

This document is for: Invicti Platform

You may need to configure Invicti Platform to exclude part of a web application from crawling and scanning, for example when the application is too large to scan in full, or when scanning part of the site might trigger unwanted actions such as submitting data. For more information on crawling options, refer to the crawling options overview document.

This document explains how you can specify paths for exclusion based on regular expressions. Excluded paths are added to individual targets on the Scan configure target page. 

If your target URL is redirected to a different protocol (typically from HTTP to HTTPS), excluded path directives will not apply. If your target uses protocol redirection, specify the target with the final protocol so that the paths you exclude are actually excluded.
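One way to find out whether a target is redirected to another protocol is to follow the redirects yourself and compare schemes. The following is a minimal sketch using only the Python standard library. It is not part of Invicti Platform, and the helper names final_url and protocol_changed are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def final_url(target: str, timeout: float = 10.0) -> str:
    """Follow HTTP redirects and return the URL that served the response."""
    # urlopen follows redirects by default; geturl() reports where it ended up.
    with urlopen(target, timeout=timeout) as response:
        return response.geturl()


def protocol_changed(target: str, final: str) -> bool:
    """Return True when the two URLs use different schemes (e.g. http vs https)."""
    return urlsplit(target).scheme.lower() != urlsplit(final).scheme.lower()
```

If protocol_changed(target, final_url(target)) returns True, the target is probably better configured with the scheme of the final URL.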

Add an excluded path

The Excluded paths option allows you to specify a list of directories and files to be excluded from crawling and scanning. Multiple paths can be excluded for each target.

  1. Select Inventory > Targets from the left-side menu.
  2. Click > Edit target next to the selected target to open its settings page.
  3. Click Crawling Options in the settings menu and scroll to the Excluded paths section.
  4. In the Excluded paths field, enter a RegEx for the path you want to exclude from scanning. Refer to the information below these instructions to learn about formatting requirements for excluded paths.
  5. Click Save target configuration when you are finished.

Excluded paths formatting requirements

Excluded paths need to be configured using regular expressions (RegEx). This is useful in situations where you want to exclude a URL pattern rather than a single URL. Invicti Platform accepts the widely used Perl Compatible Regular Expressions (PCRE) syntax for defining RegEx.

Write each exclusion as a forward slash (/) followed by the path as it appears after the target URL. Excluding a path also excludes all of its subdirectories: because the excluded directory is never crawled, the scanner cannot discover anything beneath it.

Example

  • Target URL = www.example.com
  • Directory to exclude = /dir2 which is in directory /dir1 (www.example.com/dir1/dir2)
  • Excluded path = /dir1/dir2 where /dir2 will be ignored by the scanner. Note that /dir1 and everything in it (except /dir2) will still be scanned.
  • RegEx = /dir1/dir2(/.*)?$
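The example above can be tried out locally with a short script. This is a sketch under two assumptions: the pattern is searched for (not fully anchored) in the part of the URL that follows a target at the site root, and Python's re module stands in for PCRE, which behaves the same for this pattern. The helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: the exclusion is searched for in the path (plus query string)
# that follows the target URL. Python's re stands in for PCRE here.
EXCLUDED = re.compile(r"/dir1/dir2(/.*)?$")


def path_after_target(url: str) -> str:
    """Return the path and query string that follow the host name."""
    parts = urlsplit(url if "://" in url else "//" + url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def is_excluded(url: str) -> bool:
    """Return True when the exclusion pattern is found in the URL's path."""
    return EXCLUDED.search(path_after_target(url)) is not None


for url in [
    "www.example.com/dir1",
    "www.example.com/dir1/page.html",
    "www.example.com/dir1/dir2",
    "www.example.com/dir1/dir2/sub/page.html",
    "www.example.com/dir1/dir20",
]:
    print(url, "->", "excluded" if is_excluded(url) else "scanned")
```

The last URL shows why the pattern ends in (/.*)?$: without that suffix, /dir1/dir20 would also match /dir1/dir2 and be excluded unintentionally.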

Before adding an excluded path, you may wish to test your RegEx in a tool such as Regex101.

The table below provides examples of regular expressions you can configure in Invicti to restrict URL patterns.

| Description | Regular expression | Matches (excludes path) | Does not match (does not exclude path) |
| --- | --- | --- | --- |
| * Wildcard | /dir.*/otherdir | /dir/otherdir, /dir1/otherdir, /dira/otherdir, /dir123/dir4/otherdir | /dir, /dir/dir1, /dir/dira, /dir/dir123 |
| ? Wildcard | /dir.?/otherdir | /dir/otherdir, /dir1/otherdir, /dira/otherdir | /dir, /dir/dir1, /dir/dira, /dir/dir123, /dir123/otherdir |
| Digit Wildcard | /dir[\d]+/otherdir | /dir1/otherdir, /dir01/otherdir, /dir9999/otherdir | /dir/otherdir, /dira/otherdir, /dir1a/otherdir |
| Exclude URLs more than 1 level deep | (/.+){2,} | /dir/dir1, /dir/dir1/dira, /dir/file.html, /dir/file.html?q=value | /dir, /file.html, /file.html?q=value |
| Exclude URLs more than 2 levels deep | (/.+){3,} | /dir/dir1/dira, /dir/dir1/file.html?q=value | /dir, /dir/dir1, /dir/file.html, /dir/file.html?q=value |
| Exclude specific directories | /dir(/.*)?$ | /dir, /dir1/dir | /dir1, /dira/dirb |
| Exclude all URLs (useful when supplying Invicti with a list of URLs to scan) | ^/.*$ | /dir, /dir/file.html, /dir/file.html?q=value | (none) |
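The patterns and sample paths from the table can be checked in bulk before they are entered in the Excluded paths field. The sketch below assumes that a pattern excludes a path whenever it is found anywhere in that path (an unanchored search), which is consistent with the examples in the table. It uses Python's re module as a stand-in for PCRE, and check_table is a hypothetical helper.

```python
import re

# (pattern, paths expected to be excluded, paths expected to stay in scope)
TABLE = [
    (r"/dir.*/otherdir",
     ["/dir/otherdir", "/dir1/otherdir", "/dira/otherdir", "/dir123/dir4/otherdir"],
     ["/dir", "/dir/dir1", "/dir/dira", "/dir/dir123"]),
    (r"/dir.?/otherdir",
     ["/dir/otherdir", "/dir1/otherdir", "/dira/otherdir"],
     ["/dir", "/dir/dir1", "/dir/dira", "/dir/dir123", "/dir123/otherdir"]),
    (r"/dir[\d]+/otherdir",
     ["/dir1/otherdir", "/dir01/otherdir", "/dir9999/otherdir"],
     ["/dir/otherdir", "/dira/otherdir", "/dir1a/otherdir"]),
    (r"(/.+){2,}",
     ["/dir/dir1", "/dir/dir1/dira", "/dir/file.html", "/dir/file.html?q=value"],
     ["/dir", "/file.html", "/file.html?q=value"]),
    (r"(/.+){3,}",
     ["/dir/dir1/dira", "/dir/dir1/file.html?q=value"],
     ["/dir", "/dir/dir1", "/dir/file.html", "/dir/file.html?q=value"]),
    (r"/dir(/.*)?$", ["/dir", "/dir1/dir"], ["/dir1", "/dira/dirb"]),
    (r"^/.*$", ["/dir", "/dir/file.html", "/dir/file.html?q=value"], []),
]


def check_table(rows):
    """Return (pattern, path, expected_match) for every example that disagrees."""
    failures = []
    for pattern, should_match, should_not_match in rows:
        regex = re.compile(pattern)
        for path in should_match:
            if not regex.search(path):
                failures.append((pattern, path, True))
        for path in should_not_match:
            if regex.search(path):
                failures.append((pattern, path, False))
    return failures


failures = check_table(TABLE)
print("all rows agree" if not failures else failures)
```

Adding your own pattern and sample paths as a new row gives a quick regression check each time the pattern is edited.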
