Apache Tika XXE via PDF XFA Content (CVE-2025-66516) - Vulnerability Database

Apache Tika XXE via PDF XFA Content (CVE-2025-66516)

Description

Apache Tika is a widely used open-source content analysis toolkit for extracting metadata and text from various file types.

A critical XML External Entity (XXE) injection vulnerability exists in Apache Tika's core XML parsing functionality. The vulnerability is triggered when Tika processes PDF files containing maliciously crafted XFA (XML Forms Architecture) content. Attackers can exploit this flaw by embedding XXE payloads within XFA structures inside PDF documents, causing the vulnerable XML parser to resolve external entities during document processing. No authentication or user interaction is required to exploit this vulnerability.
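For illustration, the class of flaw described above can be reproduced and blocked with the standard JAXP API. The following is a minimal sketch, not Tika's actual patch: the class name `XxeHardeningSketch`, the sample payload, and the `xdp` root element are assumptions made for this example. It shows a parser configured to refuse any DOCTYPE declaration, which is the usual way to stop external entities from being resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Apache Tika.
final class XxeHardeningSketch {

    // Illustrative XXE payload of the kind that could be embedded in XFA XML.
    static final String PAYLOAD =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE xdp [<!ENTITY xxe SYSTEM \"file:///etc/hostname\">]>\n"
        + "<xdp>&xxe;</xdp>";

    // Builds a DOM parser that rejects DOCTYPE declarations outright,
    // so no internal or external entity can ever be defined.
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true when the hardened parser refuses the document.
    static boolean rejects(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            hardenedBuilder().parse(new ByteArrayInputStream(bytes));
            return false;
        } catch (SAXException e) {
            return true; // parser raised a fatal error, e.g. on the DOCTYPE
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("payload rejected: " + rejects(PAYLOAD));
        System.out.println("plain XML rejected: " + rejects("<xdp>ok</xdp>"));
    }
}
```

Because the DOCTYPE is refused before any entity is declared, the `file://` reference in the payload is never resolved. This pattern is useful for auditing other XML parsing code in an application, but it does not replace upgrading Tika itself.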

Remediation

Immediately upgrade Apache Tika components to patched versions to remediate this critical vulnerability:

1. Upgrade Affected Modules:

- tika-core: Upgrade to version 3.2.2 or later
- tika-parser-pdf-module: Upgrade to version 3.2.2 or later
- tika-parsers (1.x releases): Upgrade to version 2.0.0 or later

Important: The vulnerability resides in tika-core. Even if you previously patched based on CVE-2025-54988 by updating only the PDF module, you remain vulnerable unless tika-core is also upgraded to version 3.2.2 or later.

2. Maven Dependency Update Example:

    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>3.2.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parser-pdf-module</artifactId>
      <version>3.2.2</version>
    </dependency>

3. Temporary Mitigations (if immediate patching is not possible):

- Disable PDF parsing entirely by configuring a custom tika-config.xml that excludes the PDF parser
- Pre-process incoming PDFs using tools like qpdf or pdfid.py to detect and reject files containing XFA structures or /AcroForm markers
- Implement strict network egress controls to limit outbound connections from systems running Tika
- Deploy Web Application Firewall (WAF) rules to detect XXE patterns in uploaded files

4. Post-Remediation Steps:

- Audit logs for suspicious file upload activity or unexpected outbound network connections
- Review all applications in your environment that depend on Apache Tika, including Elasticsearch, Apache Solr, Atlassian products, and Alfresco
- Verify that transitive dependencies have also been updated to patched versions
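The pre-processing mitigation (rejecting PDFs that contain XFA structures or /AcroForm markers) can be approximated with a raw byte scan before a file ever reaches Tika. The sketch below is a heuristic under stated assumptions, not a replacement for qpdf or pdfid.py: the class name `XfaMarkerScan` and its methods are hypothetical, and the scan only sees markers that appear literally in the file. Names hidden inside compressed object streams or written with hex escapes will be missed, so a hit is a reason to reject while a miss is not proof of safety.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical pre-filter; not part of Apache Tika, qpdf, or pdfid.py.
final class XfaMarkerScan {

    // PDF name tokens named in the mitigation above.
    private static final byte[][] MARKERS = {
        "/XFA".getBytes(StandardCharsets.US_ASCII),
        "/AcroForm".getBytes(StandardCharsets.US_ASCII),
    };

    // Returns true if any marker occurs literally in the raw PDF bytes.
    static boolean containsMarker(byte[] pdf) {
        for (byte[] marker : MARKERS) {
            if (indexOf(pdf, marker) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Naive byte-sequence search; returns the first match offset or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Usage: java XfaMarkerScan upload1.pdf upload2.pdf ...
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            boolean flagged = containsMarker(Files.readAllBytes(Paths.get(arg)));
            System.out.println((flagged ? "REJECT " : "pass   ") + arg);
        }
    }
}
```

Files flagged by such a filter would be quarantined rather than forwarded to the Tika parsing service. Because /AcroForm also appears in ordinary fillable forms, expect false positives; that trade-off is acceptable only as a stopgap until the upgrade to 3.2.2 is complete.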
