11 November 2014

10 instructions, 1 bug!

Patrick Wardle

They say that the devil is in the details. Now, while the OS X kernel bug described here can hardly be classified as ‘devilish’ (it’s rather prosaic), it does illustrate how seemingly simple hand-coded assembly routines can contain flaws. Since Apple has decided not to fix this bug, let’s take a peek at it – because who doesn’t like spelunking around the kernel looking at buggy code?

While the majority of the kernel proper is written in C, some parts of OS X (XNU) are written in assembly. There are a variety of situations where it makes sense to use hand-written assembly (for example, direct hardware access or optimizations). Looking at the source code for XNU reveals several memory copy routines that have been implemented in assembly for optimization reasons. _bcopy, implemented within locore.s, is one such hand-coded assembly routine. Though quite simple, it contains a subtle bug.

The _bcopy function is used to “copyin/out from user/kernel address space”. It’s basically analogous to memcpy.

_bcopy is invoked from within the copyio function in order to handle various copy operations. Looking at copyio.c, the source code file where copyio is implemented, _bcopy is declared in the following manner:







Figure 1. _bcopy’s declaration in copyio.c

Then _bcopy is invoked in order to copy data from one buffer to another:


Figure 2. copyio function invoking _bcopy

Examining the _bcopy code reveals its simplicity; it’s only 10 assembly instructions!


Figure 3. _bcopy (Yosemite)

In terms of _bcopy’s parameters, comments in the source code state that RDI is the source address, RSI the destination address, and RDX the byte count. From these comments, one can surmise that RDX bytes will be copied from RDI into RSI.

Let’s take a closer look at the disassembly. As is shown in the previous figure, first RDI and RSI are swapped, then the direction flag is cleared. Following this, EDX is moved into ECX, which is then shifted right by three (i.e. divided by eight, the number of bytes in a quadword). This sets up REP MOVSQ, which will copy some number (ECX) of ‘quadwords’ from RSI to RDI. Any remaining bytes are then copied via REP MOVSB. Following this, the code zeros out EAX and returns. This series of instructions should look quite familiar to reverse engineers who have seen disassemblies of inlined memcpys.

So did you spot the bug? Though it was initially uncovered via single-stepping through the kernel, looking at the source code reveals a ‘TODO’ that points towards the bug as well:


Figure 4. ‘TODO’ in _bcopy’s source

On all modern Macs (x86_64 systems), the ‘number of bytes to copy’ variable (passed in through RDX) is truncated to 32 bits. This occurs on the third line of the source code, where the instruction movl %EDX, %ECX uses 32-bit registers to load the size variable into ECX.
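To make the sequence concrete, here is a rough C model of the routine as the text describes it. This is only a sketch, not Apple's code: the name bcopy_model is made up for illustration, and the way the leftover bytes are computed (the low three bits of the size) is an assumption, since the text only says that the remaining bytes are copied via REP MOVSB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the routine as described in the text.
 * bcopy_model is an illustrative name, not a kernel symbol. */
int bcopy_model(const void *src, void *dst, uint64_t nbytes)
{
    assert(src != NULL && dst != NULL);

    const unsigned char *s = (const unsigned char *)src; /* in RSI after the swap */
    unsigned char *d = (unsigned char *)dst;             /* in RDI after the swap */

    uint32_t ecx = (uint32_t)nbytes;   /* movl %edx, %ecx: only the low 32 bits survive */
    ecx >>= 3;                         /* shift right by three: bytes -> quadwords */
    size_t quad_bytes = (size_t)ecx * 8;
    memcpy(d, s, quad_bytes);          /* rep movsq: copy ecx quadwords */

    ecx = (uint32_t)nbytes & 7;        /* leftover bytes (assumed: low three bits) */
    memcpy(d + quad_bytes, s + quad_bytes, ecx);   /* rep movsb */

    return 0;                          /* EAX is zeroed before returning */
}

/* Small self-check of the model. */
void check_model(void)
{
    const char src[16] = "0123456789abcde";
    char dst[16] = {0};

    assert(bcopy_model(src, dst, 11) == 0);
    assert(memcmp(dst, src, 11) == 0);   /* one quadword plus three bytes */
    assert(dst[11] == 0);                /* nothing past the requested size */

    char high[16] = {0};
    bcopy_model(src, high, 0x100000005ULL);   /* bit 32 set, low 32 bits = 5 */
    assert(memcmp(high, src, 5) == 0);
    assert(high[5] == 0);                     /* only five bytes were moved */
}
```

In the second call the caller asks for a little over 4GB, yet the model moves just five bytes, because everything above bit 31 of the size is discarded by the cast that stands in for the 32-bit move.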

Recall that _bcopy’s declaration states the size variable is of type vm_size_t and that the copyio function invokes _bcopy with the ‘nbytes’ argument (which is also declared as a vm_size_t). On x86_64 systems, vm_size_t is 64 bits in size…not 32 bits. In other words, code that invokes _bcopy expects the ‘number of bytes to copy’ variable to be treated as a 64-bit value – not truncated down to 32 bits. Truncation is generally a bad thing, and in C source code it will typically generate compiler warnings (at least when conversion warnings are enabled). However, hand-coded assembly may not generate such warnings, and thus such issues may be overlooked.

In theory, this bug could lead to issues where data is only partially copied or not fully overwritten/zeroed out (info leak anyone?).


Figure 5. copy fail!

On x86_64 systems (i.e. all modern Macs), the previous code will copy exactly zero bytes (i.e. nothing). Why? Recall that calling copyio() invokes the buggy implementation of _bcopy. As previously mentioned, due to our bug, the nBytes (size) variable will be truncated to 32 bits when movl %EDX, %ECX is executed. Since 2^32 (or 0x100000000) cannot ‘fit’ into 32 bits, the top bits will be chopped off, leaving zero (0x00000000) in the ECX register (and in RCX as well, since writing a 32-bit register clears the upper half of its 64-bit counterpart). Thus when REP MOVSQ is executed (which uses RCX to determine how many quadwords to copy), nothing will be copied. The REP MOVSB that follows copies nothing either, since the low bits of the size are also zero.
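The arithmetic is easy to check in a few lines of C. The helper below is hypothetical (bytes_actually_copied is not a kernel function); it simply mirrors the truncation, assuming the leftover byte count is taken from the low three bits of the size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes the truncating routine would move
 * for a requested 64-bit size. */
uint64_t bytes_actually_copied(uint64_t nbytes)
{
    uint32_t low = (uint32_t)nbytes;   /* movl %edx, %ecx drops bits 32..63 */
    uint64_t quads = low >> 3;         /* count consumed by rep movsq */
    uint64_t tail = low & 7;           /* leftover bytes (assumed: low three bits) */
    return quads * 8 + tail;
}

/* Worked examples: requested size versus bytes moved. */
void check_examples(void)
{
    assert(bytes_actually_copied(100) == 100);                      /* small sizes are fine */
    assert(bytes_actually_copied(0xFFFFFFFFULL) == 0xFFFFFFFFULL);  /* largest safe size */
    assert(bytes_actually_copied(0x100000000ULL) == 0);             /* 2^32: nothing copied */
    assert(bytes_actually_copied(0x100000010ULL) == 16);            /* 2^32 + 16: 16 bytes */
}
```

In other words, the routine behaves correctly for any size below 2^32, and for anything larger it silently copies only the size modulo 2^32.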

This bug was reported to Apple in December of 2013. Although Linus thinks all bugs are equal, apparently Apple does not. The buggy code is still found within the Yosemite kernel. To fix the bug, the 64-bit versions of the registers should be used. Specifically, line 3 of _bcopy should be replaced with movq %RDX, %RCX, and the shift that follows should likewise operate on RCX rather than ECX (a 32-bit shift would clear the upper half of RCX all over again). By using the 64-bit versions of the registers, no bytes are truncated. With this simple fix in place, the code will always execute as expected: the correct number of bytes will be copied.
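The effect of widening the registers can be sketched in C as well. The three functions below are hypothetical models (none of these names exist in XNU) of the quadword count that ends up in RCX: as the routine is described, with only the move widened, and with both the move and the shift widened. The middle case is included because any write to a 32-bit register zero-extends into the full 64-bit register, so a leftover 32-bit shift would discard the high bits again.

```c
#include <assert.h>
#include <stdint.h>

/* As described in the text: 32-bit move, 32-bit shift. */
uint64_t quads_original(uint64_t rdx)
{
    uint32_t ecx = (uint32_t)rdx;   /* movl %edx, %ecx */
    ecx >>= 3;                      /* 32-bit shift right by three */
    return ecx;                     /* 32-bit write zero-extends into RCX */
}

/* Only the move widened: movq %rdx, %rcx, then a 32-bit shift on ECX. */
uint64_t quads_move_only(uint64_t rdx)
{
    uint64_t rcx = rdx;             /* movq %rdx, %rcx */
    rcx = (uint32_t)rcx >> 3;       /* 32-bit shift clears bits 32..63 of RCX */
    return rcx;
}

/* Move and shift both widened to 64 bits. */
uint64_t quads_fixed(uint64_t rdx)
{
    uint64_t rcx = rdx;             /* movq %rdx, %rcx */
    rcx >>= 3;                      /* 64-bit shift right by three */
    return rcx;
}

/* Compare the three variants for a 4GB request. */
void check_fix(void)
{
    const uint64_t four_gb = 0x100000000ULL;

    assert(quads_original(four_gb) == 0);            /* the bug: nothing to copy */
    assert(quads_move_only(four_gb) == 0);           /* still nothing */
    assert(quads_fixed(four_gb) == 0x20000000ULL);   /* 2^29 quadwords = 4GB */
    assert(quads_fixed(24) == 3);                    /* small sizes unchanged */
}
```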

Interestingly, Apple did get things right in bcopy (note the missing leading underscore in this name) implemented in the file bcopy.s:


Figure 6. correct; bcopy

Since bcopy is correctly implemented, Apple could avoid the bug altogether by simply removing _bcopy and updating code to invoke bcopy instead (for example in copyio()).

Writing routines in assembly code can provide optimizations. However, it is error prone and difficult to maintain. In the OS X kernel, one such routine, the humble _bcopy, has a bug. While this bug likely isn’t a security issue (and is unlikely to come to fruition under normal circumstances due to buffer size limitations), any time code within the kernel is buggy, bad things could possibly happen.

Even though the fix is trivial, will this bug ever be eradicated? Who knows…