Invicti detected a Robots.txt file with potentially sensitive content.
Do not list sensitive paths in Robots.txt, and ensure they are correctly protected by means of authentication.
The following block can be used to tell the crawler to index files under /web/ and ignore the rest:
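As a minimal sketch, assuming the widely supported User-agent, Allow, and Disallow directives, such a block might look like the ROBOTS_TXT string below. The Python wrapper only checks the rules with the standard urllib.robotparser module; the helper name is_allowed and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed Robots.txt content: allow /web/, disallow everything else.
ROBOTS_TXT = """\
User-agent: *
Allow: /web/
Disallow: /
"""


def is_allowed(path: str, user_agent: str = "*") -> bool:
    """Return True if the rules in ROBOTS_TXT let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths used only to exercise the rules.
    for path in ("/web/index.html", "/admin/login"):
        print(path, is_allowed(path))
```

Allow is listed before Disallow because some parsers, including this one, apply the first rule that matches the path.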
Robots.txt is only used to instruct search robots which resources should be indexed and which ones are not.
Please note that when you use the instructions above, search engines will not index your website except for the specified directories.
If you want to hide certain sections of the website from search engines, X-Robots-Tag can be set in the response header to tell crawlers whether the file should be indexed or not:
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
By using X-Robots-Tag you don't have to list these files in your Robots.txt.
It is also not possible to prevent media files from being indexed by using Robots Meta Tags.
X-Robots-Tag resolves this issue as well.
For Apache, the following snippet can be put into httpd.conf or an .htaccess file to prevent crawlers from indexing multimedia files without exposing them in Robots.txt:
# Requires mod_headers to be enabled.
<Files ~ "\.pdf$">
  # Don't index PDF files.
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
<Files ~ "\.(png|jpe?g|gif)$">
  # Don't index image files.
  Header set X-Robots-Tag "noindex"
</Files>
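On servers other than Apache, the same mapping from file extension to header value could be done in application code. The following is a minimal Python sketch, not a definitive implementation: the helper name x_robots_tag_for is hypothetical, and it assumes the caller adds the returned value to the response as the X-Robots-Tag header.

```python
import re
from typing import Optional

# Same patterns and values as the Apache snippet: (regex, header value).
RULES = [
    (re.compile(r"\.pdf$"), "noindex, nofollow"),
    (re.compile(r"\.(png|jpe?g|gif)$"), "noindex"),
]


def x_robots_tag_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value for path, or None if no rule matches."""
    for pattern, value in RULES:
        if pattern.search(path):
            return value
    return None


if __name__ == "__main__":
    # Hypothetical request paths used only for illustration.
    for path in ("/docs/report.pdf", "/img/logo.png", "/index.html"):
        print(path, x_robots_tag_for(path))
```

Matching is case-sensitive here, so a path ending in .PDF would not receive the header unless the patterns are compiled with re.IGNORECASE.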