Directory traversal

What is directory traversal?

Directory traversal, also known as path traversal, is a cybersecurity term for a specific class of software security vulnerabilities. If a malicious hacker is able to access and view files located in the web server file system but outside of the web application’s document root folder, it means that the software has a directory traversal vulnerability.

Severity: ⬛⬛⬜⬜⬜ medium severity
Prevalence: ⬛⬜⬜⬜⬜ discovered rarely
Scope: ⬛⬛⬜⬜⬜ appears only in web-related software
Technical impact: access to sensitive information
Worst-case consequences: follow-up with other attacks
Quick fix: do not use filenames from user input

How does directory traversal work?

Source code files that make up a website or web application are located on a web server file system in a location that is called the web document root (web root folder). The primary document root usually contains subdirectories for each website and web application. For example, on a Linux/UNIX server with the Apache web server software, the default root folder is /var/www/ and on a Microsoft Windows server with IIS, the default document root is C:\inetpub\wwwroot.

Developers sometimes need to write application code that directly accesses files stored somewhere in the document root directory or a subdirectory. For example, a developer may want to store images uploaded by users and then allow other users to display them. A user input parameter would then contain the image filename from /var/www/my_app/images/, and the application would open a particular image and display it on-screen.

Directory traversal vulnerabilities happen when a malicious user can include an arbitrary file path in user input and use special characters to access files from a different directory on the server. Special characters used for this are dot-dot-slash combinations: ../ for Linux/UNIX or ..\ for Windows. These combinations allow access to parent directories from a relative path.
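To make the effect of dot-dot-slash sequences concrete, here is a minimal sketch of how a relative path is resolved against a base directory. The function name resolvePath and the sample filenames are assumptions made for this illustration. The function only manipulates strings and does not touch the file system, but it follows the same rules the operating system applies when it opens a file on Linux/UNIX.

```php
<?php
/**
 * Lexically resolve a relative path against a base directory.
 * Illustration only: nothing is read from disk.
 */
function resolvePath(string $baseDir, string $relative): string
{
    // A path that starts with "/" is absolute and ignores the base directory.
    $isAbsolute = $relative !== '' && $relative[0] === '/';
    $full = $isAbsolute ? $relative : $baseDir . '/' . $relative;

    $resolved = [];
    foreach (explode('/', $full) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;              // empty and "." segments change nothing
        }
        if ($segment === '..') {
            array_pop($resolved);  // ".." moves one level up (no effect at "/")
            continue;
        }
        $resolved[] = $segment;
    }
    return '/' . implode('/', $resolved);
}

echo resolvePath('/var/www/my_app/images', 'cat.png'), "\n";
// /var/www/my_app/images/cat.png

echo resolvePath('/var/www/my_app/images', '../../../../etc/passwd'), "\n";
// /etc/passwd
```

The second call shows the core of the problem: four ../ sequences cancel out the four directory levels of the base path, so the final path no longer has anything to do with the images directory.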

While directory traversal is a typical web application vulnerability, it is most often found in embedded web software, for example, device management software or remote administration interfaces. Some path traversal vulnerabilities are even attributed to web servers themselves.

Directory traversal vs. local file inclusion (LFI)

Path traversal vulnerabilities are often confused with local file inclusion (LFI), which is a similar but distinct vulnerability:

  • LFI means that the attacker can include source code files or view files that are located within the document root directory and its subdirectories. It does not mean that the attacker can reach outside the document root.
  • Directory traversal means that the attacker can access files located outside the document root directory, but the attack does not involve running any malicious code.

To add to the confusion, the two very often appear together and also have exactly the same cause: the developer allowing paths to local files to be passed as part of user input.

Example of a directory traversal attack

Below is a simple example of PHP source code with a directory traversal vulnerability and a path traversal attack vector on an application that includes this code.

Vulnerable code

The developer of a PHP application wants the user to be able to read poems stored in text files on the web server. These poems are uploaded by other users and stored in a relative poems directory – the absolute path to this directory is /var/www/my_app/poems/. The following is a code snippet from the poems/display.php file, which displays the poem as part of the HTML.

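<?php
  // VULNERABLE: the filename is taken from user input without any checks
  $file = $_GET["file"];
  $handle = fopen($file, 'r');
  // Read the entire poem
  $poem = stream_get_contents($handle);
  fclose($handle);
  echo $poem;
?>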

As you can see, the filename is taken directly from the HTTP GET request received from the user. Therefore, you can access and display a poem called poem.txt using the following URL:

http://example.com/my_app/poems/display.php?file=poem.txt

The attack vector

The attacker abuses this script by manipulating the GET request using the following payload:

http://example.com/my_app/poems/display.php?file=../../../../etc/passwd

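Because the path is relative, it is resolved starting from the directory of the script, /var/www/my_app/poems/. The display.php script therefore goes four levels up in the directory structure to the Linux root directory, then to the /etc/ directory, and then exposes the passwd file, which contains the names of all operating system users on this server: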

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
(...)

Potential consequences of a directory traversal attack

The only direct consequence of a directory traversal attack is access to sensitive information. This sensitive information may be used directly or to follow up with other attacks. If there is sensitive information stored in files on the server, for example, confidential photos of documents or sensitive data in text files, the attacker can find and access such files.

In other cases, directory traversal attacks are used to access typical files that exist on many web servers. Attackers can then use the information from these files to plan further attacks, which may ultimately lead to full server compromise.

Because the web server and its applications access files using the limited permissions of the system account used for the web server process, certain sensitive files such as /etc/shadow (Linux/UNIX password file with hashes), as well as restricted directories, are usually not accessible as a result of a directory traversal attack.

Here are some files that are often the target of directory traversal attacks on Linux-based web servers. These files are typically readable by all operating system users:

  • /proc/version – contains the version of the Linux kernel running on the system. This information allows the attacker to find exploits for that particular Linux kernel.
  • /proc/mounts – contains a list of currently mounted file systems. This allows the attacker to try to access these file systems, for example, through follow-up directory traversal attacks.
  • /proc/net/arp – contains the address resolution protocol (ARP) table, which could be used to discover other connected systems (potential attack targets).
  • /proc/net/tcp and /proc/net/udp – contain lists of ongoing TCP/UDP connections, which could be used to discover other connected systems (again, potential attack targets).

Note that directory traversal is very easy to automate via a technique called fuzzing, which involves automatically sending typical attack payloads to the target. Attackers can use dedicated fuzzing apps such as DotDotPwn, so very little technical knowledge is required for such an attack.

Examples of known directory traversal vulnerabilities

  • CVE-2021-41773 in Apache HTTP Server 2.4.49 could allow a malicious hacker to escalate the attack to remote code execution. This example from 2021 also shows that although directory traversal is a medium-severity vulnerability, it can lead to the most severe attacks.
  • CVE-2018-13379 is a directory traversal vulnerability discovered in 2018 in Fortinet FortiOS – the operating system of FortiGate firewalls. This vulnerability was even listed by CISA in 2021 as being one of the top routinely exploited vulnerabilities, which shows that even a 3-year-old vulnerability can still be used for many successful attacks.
  • In 2015, security researcher Kyle Lovett discovered more than 700,000 routers with directory traversal vulnerabilities in their administrative web interfaces. These routers, popular among consumers and SMBs, came from different manufacturers but had one thing in common – they all used firmware from Shenzhen Gongjin Electronics. This directory traversal flaw allowed the attacker to use the webproc.cgi script to access the config.xml files, which contained easy-to-crack administrator password hashes, ISP connection usernames/passwords, Wi-Fi passwords, as well as the client and server credentials for the TR-069 remote management protocol used by some ISPs.
  • One of the oldest known cases related to directory traversal dates back to 2004. One year later, Daniel Cuthbert was convicted for using directory traversal against the donate.bt.com website.

How to detect directory traversal vulnerabilities?

The best way to detect directory traversal vulnerabilities depends on whether they are already known or unknown.

  • If you only use commercial or open-source software and do not develop software of your own, you may find it enough to identify the exact version of the system or application that you are using. If the identified version is vulnerable to directory traversal, you can assume that you are susceptible to that directory traversal vulnerability. You can identify the version manually or use a suitable security tool, such as software composition analysis (SCA) software in the case of web applications or a network scanner in the case of networked systems and applications.
  • If you develop your own software or want to potentially find unknown directory traversal vulnerabilities (zero-days) in known applications, you must be able to successfully exploit the directory traversal vulnerability to be certain that it exists. In such cases, you need to either perform manual penetration testing with the help of security researchers or penetration testers or use an application security testing tool (web vulnerability scanner) that can automatically exploit vulnerabilities. Examples of such tools are Invicti and Acunetix by Invicti. We recommend using this method even for known vulnerabilities.
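When you test your own application manually, it helps to try the same kinds of payloads that scanners and fuzzing tools send. The following is a minimal sketch of a payload generator, not a replacement for a scanner. The function name traversalPayloads, the target file, and the three encodings are assumptions chosen for this illustration, and the list is far from a complete wordlist. Use it only against applications that you are authorized to test.

```php
<?php
/**
 * Build candidate directory traversal payloads for a given target file.
 * Each traversal step is repeated from 1 to $maxDepth times.
 */
function traversalPayloads(string $target, int $maxDepth): array
{
    $steps = [
        '../',        // plain Linux/UNIX form
        '..\\',       // Windows form (a single backslash)
        '%2e%2e%2f',  // URL-encoded "../"
    ];

    $payloads = [];
    foreach ($steps as $step) {
        for ($depth = 1; $depth <= $maxDepth; $depth++) {
            $payloads[] = str_repeat($step, $depth) . $target;
        }
    }
    return $payloads;
}

foreach (traversalPayloads('etc/passwd', 2) as $payload) {
    echo $payload, "\n";
}
// ../etc/passwd
// ../../etc/passwd
// ..\etc/passwd
// ..\..\etc/passwd
// %2e%2e%2fetc/passwd
// %2e%2e%2f%2e%2e%2fetc/passwd
```

Each payload would then be sent as the value of the parameter under test. A response that contains the content of the target file confirms the vulnerability.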

How to prevent directory traversal vulnerabilities in web applications?

There are several methods that allow you to prevent directory traversal vulnerabilities in your code:

  1. Avoid passing any filenames in user input. This includes not just direct user input but also other data sources that can be manipulated by the attacker, for example, cookies.
  2. If your application requires you to use filenames from user input and there is no way around it, create a whitelist of safe files.
  3. If you cannot create a whitelist because filenames are arbitrary (for example, because users upload the files), store the filenames in a database and accept only table row identifiers in user input. You can also use URL mappings to identify files with no risk of path traversal.
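Methods 2 and 3 can be sketched as follows for the poems example. This is one possible approach under stated assumptions, not a definitive implementation: the names POEM_DIR, ALLOWED_POEMS, poemPath, and uploadedPoemPath are hypothetical, and the $rowsById array stands in for a database table whose stored filenames are generated by the server. In both functions, user input is used only as a lookup key and never becomes part of a file path.

```php
<?php
// Hypothetical directory and whitelist for this example.
const POEM_DIR = '/var/www/my_app/poems/';
const ALLOWED_POEMS = [
    'raven'     => 'raven.txt',
    'daffodils' => 'daffodils.txt',
];

/**
 * Method 2: whitelist lookup.
 * Returns the file path for a known key, or null for anything else.
 */
function poemPath(string $key): ?string
{
    if (!array_key_exists($key, ALLOWED_POEMS)) {
        return null;
    }
    return POEM_DIR . ALLOWED_POEMS[$key];
}

/**
 * Method 3: row identifier lookup for uploaded files.
 * $rowsById maps a numeric ID to a filename stored by the server.
 */
function uploadedPoemPath(string $id, array $rowsById): ?string
{
    // Accept only non-empty strings made of decimal digits.
    if (!ctype_digit($id)) {
        return null;
    }
    $storedName = $rowsById[(int) $id] ?? null;
    return $storedName === null ? null : POEM_DIR . $storedName;
}

var_dump(poemPath('raven'));                   // string: /var/www/my_app/poems/raven.txt
var_dump(poemPath('../../../../etc/passwd'));  // NULL
```

With either function, a request such as ?file=../../../../etc/passwd simply fails the lookup, so there is no path for the attacker to manipulate.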

The above methods are available in every programming language and therefore every developer can easily prevent directory traversal vulnerabilities by using secure coding techniques. There is no excuse for leaving your application vulnerable to directory traversal.

Note: Do not use blacklisting, encoding, or methods of input validation such as filtering to prevent directory traversal. For example, don’t try to limit or enforce file extensions or block special character sequences. Attackers can use a variety of tricks, such as URL encoding, to bypass such filters.
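The following sketch shows why such filters fail. The function naiveFilter is a hypothetical example of a blacklist that removes every ../ sequence from the input, and the two inputs are typical bypass tricks. The second case assumes that the application decodes the value once more after filtering.

```php
<?php
// Hypothetical naive filter: remove every "../" sequence in a single pass.
function naiveFilter(string $name): string
{
    return str_replace('../', '', $name);
}

// Bypass 1: nested sequences. Removing the inner "../" joins the
// surrounding characters into a new "../".
echo naiveFilter('....//....//etc/passwd'), "\n";
// ../../etc/passwd

// Bypass 2: encoded input. The filter sees no "../" at all, and a later
// decoding step restores the traversal sequence.
$encoded = '%2e%2e%2f%2e%2e%2fetc%2fpasswd';
echo urldecode(naiveFilter($encoded)), "\n";
// ../../etc/passwd
```

In both cases the filtered value still contains a working traversal sequence, which is why lookups by key or identifier are preferable to trying to clean up a user-supplied path.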

How to mitigate directory traversal attacks?

Methods to mitigate directory traversal attacks will differ depending on the type of software:

  • In the case of custom web applications, you can mitigate directory traversal attacks by running your web application in a limited environment, which is very common for web APIs. For example, running your application in a separate Docker container will limit the number of files that the attacker can access and will limit the potential effect of accessing system information. 
  • If you can’t run custom web applications in a separate container, you can set up your web server access control to completely deny access to any parent directories. You can make it appear to web applications as if the document root is the root of the filesystem, which will make it impossible for the attacker to move up in the directory tree.
  • In the case of known directory traversals in third-party software, such as administration software for hardware routers and firewalls, you must check the latest security advisories for a fix and update to a non-vulnerable version.

In the case of zero-day directory traversals in third-party software, you can apply temporary WAF (web application firewall) rules for mitigation. However, this only makes the directory traversal harder to exploit and does not eliminate the problem.

Classification: ID
CAPEC: 126
CWE: 23
WASC: 33
OWASP 2021: A1
