> Protecting the digital nomad
Part 2: In this article we look into the implementation of the embedded webserver that runs on the HooToo Travel Mate 6 router (the device). The webserver is at the core of the TM-06 user interface. It is also the best attack surface to start with, both because processing web requests is complex and because web software has a long history of memory corruption vulnerabilities.
Locating the Web Server
Looking at the HTTP traffic we see that there are two webservers at play. The first is lighttpd/1.4.28 and the other is vshttpd. Different Server values are returned even though the same IP/port combination is used. Clearly there is some sort of proxying set up. So, let’s go looking for these servers.
lighttpd is a commonly used webserver for embedded and other small scale systems. It even comes with its own set of vulnerabilities. However, vshttpd seems to be an unknown entity – at least, to a westerner. After much googling, I have not found any source code. The only things that were found are an empty sourceforge project and an attempt to have an http interface for vsftp. So, I have to assume that this was an in-house type of project which means that binary reverse engineering is a must.
Using shodan I’ve been able to find a few instances around East Asia. However, there are only a few and they come and go.
Going to that IP, we can see the login page:
Based on the requests generated by the login page, we can see that it is, indeed, using the same webserver as our TM-06 router:
So, potentially, vshttpd is some private web server implementation that has been resold by the same contractor to various device manufacturers. This is exciting because the findings we make will be applicable to a number of devices. I’ve masked the target IPs for their safety, but if you so wish to find them again just have a look at Shodan.
Finding the Binary
Scanning through the /etc/init.d directory, we find that fileserv.sh references /etc/fileserv/lighttpd.conf. That’s cool, it means that /usr/sbin/fileserv, the binary referenced by the same shell script, is the lighttpd server we are looking for. Scanning further, we find an /etc/init.d/web file which references the /usr/sbin/ioos file. Based on the name and the process of elimination, let’s decide that this is our other webserver, the one which calls itself vshttpd. This will be confirmed by reverse engineering.
We get some good confirmation using strings. The file command gives us some good information on what to expect.
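The fields that file reports can also be read straight from the ELF header. The sketch below is my own illustration, not something used in the original analysis: the function names are made up, and it decodes only the word size, the byte order and the machine type (8 is the ELF machine number for MIPS).

```python
import struct

EM_MIPS = 8  # e_machine value assigned to MIPS in the ELF specification


def describe_elf(header: bytes) -> dict:
    """Decode the ELF header fields that matter before loading a disassembler."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])              # EI_CLASS
    endian = {1: "little", 2: "big"}.get(header[5])   # EI_DATA
    if bits is None or endian is None:
        raise ValueError("unrecognised EI_CLASS or EI_DATA")
    fmt = "<H" if endian == "little" else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)  # e_machine lives at offset 18
    return {"bits": bits, "endian": endian, "is_mips": machine == EM_MIPS}


def describe_elf_file(path: str) -> dict:
    """Convenience wrapper: read the first 20 bytes of a file and decode them."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))
```

Running describe_elf_file against the extracted ioos binary is a quick sanity check that the disassembler is about to be pointed at the right architecture and byte order.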
The device runs a version of a MIPS processor. The instruction set is documented here: MIPS32 24K. It is a relatively simple architecture with a small instruction set. I would encourage everyone to learn it because it is prevalent in the embedded devices world, and MIPS is a great starting point to learn about assembly.
By convention there’s a stack at a high address, similar to that of the x86 systems. On the TM-06 the stack actually moves around due to the memory randomization anti-exploitation measure. The stack is managed by the compiler, which emits code that subtracts from the $sp register on function entry and adds to it again on return. The $sp register stores the current stack pointer. A function prologue would look something like this:
There’s a global pointer $gp for every function; however, in code compiled by the device developers, the global pointer does not appear to be used for anything of significance. $t9 is a register used for function calls. Commonly, an address is loaded into $t9 and the program branches using that register. The $ra register stores the return address. This is why in the prologue we see it being saved on the stack, so the function knows where to go once it is done.
The above function epilogue shows how the return address is used to return control to the caller. The first four arguments are passed in registers $a0 to $a3, and usually they are stored on the stack before the function does any actual logic. Certain registers, such as $s0, are preserved by the callee.
Finally, MIPS is a pipelined architecture, and the developer needs to be aware of what that means for instruction ordering. This is especially important for branching instructions because the instruction after the branch (the one in the branch delay slot) executes before the branch actually takes effect. Now, the code we will be looking at is not built by an optimizing compiler and so branching instructions, such as the jr $ra above, will always have a nop following.
I’ve never, until now, reversed MIPS and from personal experience can say that it is a very easy architecture to get the hang of. So, if this is your first time then just take one step at a time and try to follow through with the examples in this article. You will get used to the architecture and things will become much easier!
Extracting the Source
Compilers are really smart tools, but in the end they are just a transformation function of the source code. So, if the source code has certain patterns then it may be possible to detect them and use them to learn about the binary code. In the case of the webserver from the device, the compiler does not perform a lot of optimizations and so we are able to extract quite a lot of information. The next sections focus on some of the more obvious patterns that I’ve been able to use to my advantage. Going through these examples is useful for learning about the reverse engineering process as well as how to use various IDA interfaces.
While perusing the assembly code, I’ve started noticing a simple pattern for error handling recurring over and over. There would be some sort of a check and if the check failed then an error message would be generated. The message is then printed out to the STDERR stream. The assembly looks something like this:
There would be a string showing which function failed, something to describe the error (often exposing the names of local variables that are otherwise compiled out) and the file location of the function. Great information! We could use this to recover the names of functions even though the symbols were stripped from the binary.
There are so many of these error handling blocks that we can’t reasonably go through all of them by hand. So, I want to automate the process of mapping the function names from the error messages to the IDAPro disassembly database. To that end, I wrote a couple of IDAPython functions to help me out. These functions will be specific to the MIPS disassembly that I saw in this webserver. However, the logic is probably applicable to other applications as well.
Let’s dive in. The first thing I notice is that the function responsible for error reporting uses fprintf. This means that the strings I’m seeing are placed into the argument registers. The function below follows that pattern and abuses specifics of the IDA output syntax to extract the information.
Given the address of a call to fprintf, the function will trace back several instructions and find all strings that are stored in the arguments. Since the function name is located in the first variable argument to fprintf, I don’t need to worry about the arguments placed on the stack. The first loop looks at the addi instructions that refer to the $a0 to $a3 registers. These are the instructions that set the arguments. The second loop looks for references to strings that start with /home since that is where the source code was, apparently, compiled. Together we get a nice picture of what the error message looks like. We can see the name of the current function and its file location. For the assembly shown above, this is the output we get:
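To make the backward walk concrete outside of IDA, here is a minimal sketch that works on a plain-text listing instead of the IDA database. It is not the script described above. It assumes a hypothetical export format in which every instruction is one line and any referenced string appears in a trailing `# "..."` comment; the function name, the window size and that format are all my own choices.

```python
import re

ARG_REGS = ("$a0", "$a1", "$a2", "$a3")
# Hypothetical listing format: 'mnemonic operands  # "string literal"'
STRING_COMMENT = re.compile(r'#\s*"(.*)"\s*$')


def find_printf_strings(listing, call_index, window=12):
    """Walk back from a call and collect the strings loaded before it.

    Returns (args, source_paths): args maps an argument register to the
    string placed in it, source_paths lists strings starting with /home.
    """
    args, source_paths = {}, []
    start = max(0, call_index - window)
    for line in reversed(listing[start:call_index]):
        match = STRING_COMMENT.search(line)
        if not match:
            continue
        text = match.group(1)
        if text.startswith("/home"):
            source_paths.append(text)
        fields = line.split("#", 1)[0].split()
        # fields[0] is the mnemonic, fields[1] the destination register
        dest = fields[1].rstrip(",") if len(fields) > 1 else ""
        if dest in ARG_REGS and dest not in args:
            args[dest] = text  # the assignment nearest to the call wins
    return args, source_paths
```

Walking backwards and keeping only the first hit per register matters: the same register is often reused earlier in the function, and only the last write before the call is what fprintf actually receives.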
OK, given that we can find information about a function with one error message, how do we find all of them and do the mapping? I like to take an iterative approach, refining the information at hand with each step. This way there’s more room for tweaking and adjusting for accuracy depending on our needs. So, the function below will first look for all uses of fprintf.
Once an fprintf is located, the script will look for la (load address) instructions. That is where the address of the function is loaded before being used. I chose the la over the jalr instruction as a means of reducing the number of instructions I have to consider. Then we call the findPrintfStrings function from earlier to extract the strings. Once the strings are extracted we can apply a filter that is common to all error messages. We notice that the error first specifies the context before moving on to other information. So, we look for “(%s, to remove irrelevant fprintf occurrences.
Finally, using the IDA api we look up the function address that contains the error message and add that to the list. The final output looks something like this:
There are 910 such error code blocks. Some function mappings are duplicates because a function can contain more than one error block. The duplication is useful for confirming that a mapping is correct.
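One way to put that duplication to work is to treat every error block as a vote for a function name. The sketch below is an illustration, not the script used in this article. It assumes the error blocks have already been reduced to hypothetical (function start, format string, candidate name) records, and it filters on the substring "(%s", which is my reading of the filter described earlier.

```python
from collections import Counter, defaultdict


def vote_function_names(records, marker="(%s"):
    """Group candidate names by function and separate clean from conflicting.

    records: iterable of (function_start, format_string, candidate_name).
    Returns (mapping, conflicts): mapping holds functions whose error blocks
    all agree, as {start: (name, votes)}; conflicts holds the rest.
    """
    votes = defaultdict(Counter)
    for func_start, fmt, name in records:
        if marker in fmt:  # drop fprintf calls that are not error reports
            votes[func_start][name] += 1

    mapping, conflicts = {}, {}
    for func_start, counter in votes.items():
        if len(counter) == 1:
            ((name, count),) = counter.items()
            mapping[func_start] = (name, count)
        else:
            conflicts[func_start] = dict(counter)
    return mapping, conflicts
```

Functions that end up in conflicts deserve a manual look before any rename is applied to the database; a high vote count in mapping is a decent sign that the name is right.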
Discovering internal structures
During the long process of reverse engineering the webserver MIPS assembly, several programming patterns have revealed themselves. First, the webserver is a single threaded state machine processing one HTTP request at a time. This greatly simplifies how we reason about the system. Second, the server is implemented in C; however, the developers are clearly fans of Objective-C or C++, because most structures combine data with function pointers. To call these functions, the code always passes the structure pointer as the first parameter. I do not believe that the source code is in C++ because there are no obvious artifacts, such as mangled names or vtables, to be found anywhere. The following pseudo code is a very common pattern that we see in the assembly:
Let’s find an example of this pattern. In web_cgi_main_handler there’s a call to create a structure for the httpd server. It is allocated and initialized at address .text:00412B34 with a call to httpd_new.
Within the httpd_new function a buffer is allocated using calloc.
This buffer is used to store a whole bunch of function pointers (please excuse skipped instructions for brevity – note the addresses).
httpd_new will return a pointer to this new heap allocated structure back to web_cgi_main_handler. The handler function will then be able to call these function pointers to do further operations.
In the example above we can see that the $v0 register contains a pointer to the httpd_t structure. Then a pointer is retrieved from offset 0x78 and executed. Before it is executed, the first argument in the $a0 register is set to the address of the same httpd_t structure. In essence the function is called with a this pointer as the first argument. We see this pattern over and over with various core data structures.
This pattern is really great news for any sort of buffer overflow vulnerabilities on the heap. That is because there are unprotected function pointers all over the place and there would be no need for any sort of heap pointer manipulations to gain execution – your experience may vary ;-).
One way to locate these patterns is to use the function finder script we built in the previous section. All we have to do is filter the function names on *_new.
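As a small illustration of that filter, assuming the recovered names are held in a hypothetical dictionary keyed by function address:

```python
from fnmatch import fnmatchcase


def constructor_candidates(names_by_address, pattern="*_new"):
    """Return (address, name) pairs whose name matches the glob pattern."""
    return sorted(
        (address, name)
        for address, name in names_by_address.items()
        if fnmatchcase(name, pattern)
    )
```

fnmatchcase is used rather than fnmatch so that matching stays case-sensitive on every platform.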
Using this method we find a whole bunch of initialization functions including the one we analyzed as an example, httpd_new. One unfortunate side effect of this C++ style pattern is that functions called via these structures do not get picked up as cross references by IDA.
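Since the cross references are missing, the call sites have to be found some other way. One option is to search for the instruction pair itself: a load of $t9 from a structure offset, followed shortly by a jalr $t9. The sketch below does this over a plain-text listing. It is an assumption-laden illustration: the one-instruction-per-line format, the function name and the max_gap heuristic are mine, not part of the original tooling.

```python
import re

# lw $t9, <offset>(<base register>), e.g. a load from offset 0x78 of a struct
LOAD_PTR = re.compile(r'^lw\s+\$t9,\s*(0x[0-9A-Fa-f]+|\d+)\((\$\w+)\)')
CALL = re.compile(r'^jalr\s+\$t9\b')
CLOBBER = re.compile(r'^\w+\s+\$t9,')  # any other instruction that writes $t9


def _parse_offset(text):
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def find_struct_calls(listing, max_gap=4):
    """Return (index_of_jalr, base_register, offset) for each indirect call."""
    hits = []
    pending = None  # (index of the lw, base register, offset)
    for i, raw in enumerate(listing):
        line = raw.strip()
        load = LOAD_PTR.match(line)
        if load:
            pending = (i, load.group(2), _parse_offset(load.group(1)))
        elif CLOBBER.match(line):
            pending = None  # $t9 now holds something else, e.g. a direct address
        elif CALL.match(line) and pending and i - pending[0] <= max_gap:
            hits.append((i, pending[1], pending[2]))
            pending = None
    return hits
```

Grouping the hits by offset gives a rough picture of which structure slots are called most often, which is a reasonable place to start naming fields.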
I’ve already mentioned that the compiler used by the developers of the device is not an optimizing compiler. Or, at least, if it was optimizing, it wasn’t doing a great job. So let’s have a look at some of the weird patterns that have emerged.
First, every jump or branch is followed by a NOP instruction. This makes for a nice break in the code and makes it less dense. I find that it becomes easier to read the assembly code because there are fewer things that I have to consider. For contrast, here’s a function epilogue from libiconv from the same device:
We can see here that the function returns and automatically fixes up the stack. This allows for a more efficient implementation but it puts a little more load on our brains when doing reverse engineering. OK, let’s switch back to the webserver function.
I find this phenomenon quite often. There will be an instruction that stores a variable and then immediately loads it back into the same register, as if the store had somehow cleared the register. Clearly, it is an artifact of a missing optimization step. This is fine for us: it allows the reverse engineer to see how the source code was written. Such structures create less dense and more mentally patterned assembly which is easier to follow.
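If you want to see how widespread this artifact is, the redundant pairs are easy to count. The following sketch assumes a hypothetical plain-text listing with one instruction per line and no trailing comments; it flags a sw that is immediately followed by an lw of the same register from the same stack slot.

```python
import re

# Matches 'sw $reg, operand' or 'lw $reg, operand' with nothing after it
MEM_OP = re.compile(r'^(sw|lw)\s+(\$\w+),\s*(\S+)$')


def find_store_reload_pairs(listing):
    """Return the indices of sw instructions that are immediately reloaded."""
    hits = []
    for i in range(len(listing) - 1):
        first = MEM_OP.match(listing[i].strip())
        second = MEM_OP.match(listing[i + 1].strip())
        if not (first and second):
            continue
        same_slot = first.groups()[1:] == second.groups()[1:]
        if first.group(1) == "sw" and second.group(1) == "lw" and same_slot:
            hits.append(i)
    return hits
```

A high count relative to the total number of stores is a quick hint that a binary was built with little or no optimization.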
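Finally, there are a bunch of NOPs around branching. The compiler is playing it safe with the pipeline: instead of scheduling a useful instruction into the slot after a branch, it pads with NOPs so that registers are guaranteed to hold their final values by the time they are used.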
For commands triggered by the user via web UI the server will perform a session validation. This happens at the beginning of each function. For example, there’s a feature for cloning a MAC address. Before the feature is executed a call to cgi_chk_sys_login will be made.
The session check essentially ensures that the token in the cookie matches one that is recorded locally. This is a fairly common practice. It is a little bit concerning that the check is so decentralized; however, the server has a small feature set and so it should be relatively easy to ensure that every sensitive function performs this authorization check.
There are a total of 315 references to the check. This is a good way to find out which functions are protected and therefore exposed only to authenticated users. It is also a good way to identify which functions do something sensitive.
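Because the check is repeated in every handler, the interesting question is which handlers lack it. A minimal sketch of that split is shown below. It assumes a hypothetical dictionary mapping each function name to the names it calls, however that data was exported from the disassembler.

```python
def split_by_login_check(callees_by_function, check="cgi_chk_sys_login"):
    """Split functions into those that call the session check and those that don't.

    callees_by_function: {function_name: iterable of called function names}
    Returns (protected, unprotected) as sorted lists of names.
    """
    protected = {
        name for name, callees in callees_by_function.items() if check in callees
    }
    unprotected = set(callees_by_function) - protected
    return sorted(protected), sorted(unprotected)
```

This only looks at direct calls, so a handler that reaches the check through a wrapper would land in the unprotected list and need a manual look.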
We went through some interesting patterns for the implementation of the server. Hopefully, it has given you enough of an intuition for your own reverse engineering efforts in the future. We got really lucky in this case because the compiler was not aggressively optimizing the code and so we got to see a lot of patterns that make reversing of this server much easier.
The binaries referenced in this article can be downloaded here:
- SHA (7620-WiFiDGRJ-HooToo-633-HT-TM06-2.000.030_fileserv) = 86b96a77f8f09ac4937e079e9db1e1e3c9a2d24f
- SHA (7620-WiFiDGRJ-HooToo-633-HT-TM06-2.000.030_ioos) = 7420757f92f140708e4628efe589924cfcc1fade