Wednesday, January 23, 2008

Static Analysis of MSP430 Firmware in Perl

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

That which follows is an adaption of notes which I made during the course of writing msp430static, a sort of poor man's IDA Pro for static analysis of MSP430 firmware without source code.

What Functions Look Like

A call to strcpy, such as the one which follows, is accomplished by populating r15 with the destination address and r14 with the source address, then calling the function at its hex address. In the following example, foo is the target (r15) and babe is the source (r14). See my article on IAR's MSP430 calling conventions for references to calling convention documentation for various compilers, as each compiler seems to do something different on this platform.
strcpy(foo,babe);                                                                                                                                                     
1154: 3e 40 7a 02 mov #634, r14 ;#0x027a
1158: 0f 44 mov r4, r15 ;
115a: b0 12 a4 11 call #4516 ;#0x11a4

In the unstripped binary, we'll find the code for strcpy at the address (0x11a4) called above:
000011a4 <strcpy>:
11a4: 0d 4f mov r15, r13 ;
11a6: 0c 4f mov r15, r12 ;
11a8: 6f 4e mov.b @r14, r15 ;
11aa: cd 4f 00 00 mov.b r15, 0(r13) ;
11ae: 4f 93 cmp.b #0, r15 ;r3 As==00
11b0: 07 24 jz $+16 ;abs 0x11c0
11b2: 1e 53 inc r14 ;
11b4: 1d 53 inc r13 ;
11b6: 6f 4e mov.b @r14, r15 ;
11b8: cd 4f 00 00 mov.b r15, 0(r13) ;
11bc: 4f 93 cmp.b #0, r15 ;r3 As==00
11be: f9 23 jnz $-12 ;abs 0x11b2
11c0: 0f 4c mov r12, r15 ;
11c2: 30 41 ret

The stripped binary has the function at the same address, but has no function label. In fact, there isn't even a note (in msp430-objdump) that the address is the beginning of a function.
   11a4:       0d 4f           mov     r15,    r13     ;                                                                                                               
11a6: 0c 4f mov r15, r12 ;
11a8: 6f 4e mov.b @r14, r15 ;
11aa: cd 4f 00 00 mov.b r15, 0(r13) ;
11ae: 4f 93 cmp.b #0, r15 ;r3 As==00
11b0: 07 24 jz $+16 ;abs 0x11c0
11b2: 1e 53 inc r14 ;
11b4: 1d 53 inc r13 ;
11b6: 6f 4e mov.b @r14, r15 ;
11b8: cd 4f 00 00 mov.b r15, 0(r13) ;
11bc: 4f 93 cmp.b #0, r15 ;r3 As==00
11be: f9 23 jnz $-12 ;abs 0x11b2
11c0: 0f 4c mov r12, r15 ;
11c2: 30 41 ret
It's easy enough to detect the presence of this code in a stripped executable by looking for "mov r15, r12" or "0x0c 0xf4" and comparing the bytes that follow. I can't stress enough the importance of endian-awareness: the second column is composed of bytes, not words. As a word, "mov r15,r12" is 0xf40c. When in doubt, double-check yourself with the Single Line MSP430 Assembler.

Note that calling conventions vary considerably across the many MSP430 compilers and even among versions of the same compiler, depending upon optimization options and inlining. Don't expect all calls to look like this: check for yourself.

Before looking at a decompilation of the above, notice that a reasonably large string of bytes {6f 4e, cd 4f 00 00, 4f 92} appears twice. This duplicity might be removed by another optimizer, but it shows that something in the code is sufficiently intrinsic to the function to appear twice in one function. Perhaps it will remain consistent across compilers? In point of fact, this expanse of code copies a byte from the address contained within r14 to the address contained within r13. The final word compares the byte that was copied to zero. In the first usage, the function jumps to the end in the event that the comparison is zero. In the second usage, which follows the incrementing of both r14 and r15, the jump is backward if the comparison is not zero. A rough approximation in psuedo-C follows
char* strcpy(char* dest, char* src){
a=dest; //mov r15, r13
b=dest; //mov r15, r12

c=*src; //mov.b @r14, r15
*a=c; //mov.b r15, 0(r13)
if(c==0) //cmp.b #0, r15
goto ret; //jz $+16

do{
src++; //inc r14
a++; //inc r13
c=*src; //mov.b @r14, r15
*a=c; //mov.b r15, 0(r13)
}while(c!=0) //cmp.b #0, r15
// jnz #-12

ret:
return b; //mov r12, r15
} //ret
In the decompilation, I refer to r15 as both dest and c, as its purpose changes completely. Variables are passed as dest=r15 and src=r14, as GCC allocates parameters in the order r15, r14, r13, r12. The result, for strcpy() the destination address, is returned in r15.

It is apparent that this could be written a bit more compactly by merging the first and second stanzas. Also, the use of the indirect auto-increment addressing mode (As=11, of the form @Rn+) could eliminate the "inc r14" line. The instructions might also be reordered, and any number of register combinations might be used to hold intermediate values. It's not possible to detect all the ways in which strcpy() might be implemented, but it shouldn't be too difficult to detect the different ways in which it will be implemented. After all, it's far easier to fix an overflow vulnerability than to hide it; is it not?

Testing my theory, I disassembled the same program, this time compiled with IAR's compiler (V4.09A/W32). Grepping for cmp.b yielded a single line, at address 0xF86E, in the codeblock which follows.
   f864:       0f 4c           mov     r12,    r15     ;                                                                                                               
f866: 0e 4c mov r12, r14 ;
f868: 1c 53 inc r12 ;
f86a: fe 4d 00 00 mov.b @r13+, 0(r14) ;
f86e: ce 93 00 00 cmp.b #0, 0(r14) ;r3 As==00
f872: f9 23 jnz $-12 ;abs 0xf866
f874: 0c 4f mov r15, r12 ;
f876: 30 41 ret
Those that have read my article on the register usage of IAR will note that the ABI is different in the code sample above. IAR fixed the register allocation order in October of 2007, and it now allocates registers in the order r12, r13, r14, r15.

This code is heavily--but imperfectly--optimized, so it's a bit difficult to decompile by hand. It all becomes clear when you realize that r12 is post-incremented and the original value is loaded into r14, the destination address for each character. Unlike GCC, the indirection post-increment addressing mode is used, but on the very next line we see that this necessitates another RAM access! Perhaps the cache will take care of it, but this means that IAR makes three memory accesses--one write and two reads--for every two that GCC makes. I'd recommend hand optimization for this function, if my stronger recommendation wasn't to scrap it as a troublemaker.

The decompiled code follows,
char* strcpy(char* dest, char* src){
char *a,
*b=dest; //mov r12, r15
do{
a=dest; //mov r12, r14
dest++; //inc r12
*(a++)=*src; //mov.b @r13+, 0(r14)
}while(0!=*src); //cmp #0, 0(r14)
//jnz $-12
return b; //mov r15, r12
//ret
}
I'd be willing to bet that the original is quite a bit denser in C, but this ought to be easy enough to understand.

So how do we do a generalized search for this, one which will recognize most implementations by most compilers? I propose a pattern that looks for the following:
  1. The use of two registers, source pointer SRC and destination pointer DEST.
  2. A mov.b instruction with SRC as the source. (Call the destination FOO)
  3. A mov.b instruction with DEST as the destination. (Call the source BAR.)
  4. A cmp.b instruction involving the immediate zero and the register SRC, DEST, FOO, or BAR.
I don't demand an inc for SRC as it might be auto-incremented (@Rn+), and I don't demand one for DEST as it might be copied from another variable in an IAR post-increment (cvar++).

It's possible to add more rules which describe the preceding examples. For example, both of these examples move their first parameter to a temporary register and, later, move it back. Both follow the cmp.b with a jnz. I advise against making any matching pattern too strict, as it'll result in false negatives. Keeping things loose might result in false positives, but those false positives will be fertile ground for exploits of their own, even if they aren't strcpy().

It's also worth noting that a ruleset that's complex is easy to sneak by, either intentionally or accidentally. Suppose this pattern were modified to exclude strncpy(). The following strcpy() implementation would skate by, undetected.
char *strcpy(char *dest, char *src){
return strncpy(dest,src,0x1000);
}
By keeping rules loose--but perhaps prioritized--it's easy to catch such actions. After all, what byte-wise copying until reaching zero is not suspicious?

Recognizing Functions from Perl

Now that the hand analysis is complete, it's time to bring perl into the mix. Instructions are recognized as one of two types: code and IVT entries. I ignore the .data section for now, but a little tweaking of the regular expressions would make it match. I make the assumption that every function begins after a 'ret' and ends with a 'ret'. This isn't strictly true, but it suffices for this article and ought only to miss the first function in memory, assuming everything is built with C.

The first step is to recognize individual lines. I used the following regular expressions in an early revision:
Match an instruction:
# 11b6: 6f 4e mov.b @r14, r15 ;
# 11b8: cd 4f 00 00 mov.b r15, 0(r13) ;
# 1111: 22222222222 33333 44444444444444 555555
/\s(....):\s(.. .. .. ..)\s\t([^ \t]{2,5})\s(.*);?(.*)/
Match an IVT entry:
# fffe: 00 11 interrupt service routine at 0x1100
/[\s\t]*(....):[\s\t]*(.. ..)[\s\t]*interrupt service routine at 0x(....)/
Although I don't strictly need to parse so much detail to recognize strcpy(), it will be helpful when I add features.

Once lines are recognized, they are loaded into a list of strings, indexed by the integer (not hex-string) value of the first field. I make a list of strings, rather than objects, because most comparisons can be performed by regular expressions. This is fine for a 16-bit microcontroller, but might be prohibitively expensive for something larger.

Routines are recognized--as I've previously stated--by assuming that they reside between ret statements. This assumption makes things quite easy to implement, but results in the loss of the first function as well as the concatenation of functions--such as main()--which do not return. In the following example main [118E to 11A0] and strcpy [11A4 to 11C2] are combined into a single listing:
   118e:       31 40 00 0a     mov     #2560,  r1      ;#0x0a00
1192: 04 41 mov r1, r4 ;
1194: 92 43 00 02 mov #1, &0x0200 ;r3 As==01
1198: b0 12 40 11 call #4416 ;#0x1140
119c: b0 12 68 11 call #4456 ;#0x1168
11a0: 30 40 c4 11 br #0x11c4 ;
11a4: 0d 4f mov r15, r13 ;
11a6: 0c 4f mov r15, r12 ;
11a8: 6f 4e mov.b @r14, r15 ;
11aa: cd 4f 00 00 mov.b r15, 0(r13) ;
11ae: 4f 93 cmp.b #0, r15 ;r3 As==00
11b0: 07 24 jz $+16 ;abs 0x11c0
11b2: 1e 53 inc r14 ;
11b4: 1d 53 inc r13 ;
11b6: 6f 4e mov.b @r14, r15 ;
11b8: cd 4f 00 00 mov.b r15, 0(r13) ;
11bc: 4f 93 cmp.b #0, r15 ;r3 As==00
11be: f9 23 jnz $-12 ;abs 0x11b2
11c0: 0f 4c mov r12, r15 ;
11c2: 30 41 ret
This happens because main() returns not by "ret" but by branching to 0x11C4, which is __stop_progExec__ in the firmware being analyzed. An alternate method would be to look for call targets, assuming that 0x11A4 is the beginning of a function because some other instruction calls it.

By searching by call targets, my script correctly recognizes the second function of the preceding example, but it no longer recognizes main(), which in GCC is called by "BR #addr" and not "CALL #addr". A quick check on a small GCC program shows that absolute jumps are only used for main() and non-user functions. Thus, by looking for "CALL #addr" and "BR #addr", it is possible to find the entry points of most if not all functions.

Once functions can be been identified, it isn't very difficult to add an output mode for Graphviz. The following image is a call tree in which main() calls two functions which call strcpy(). Dangerous functions and calls are labeled in red. The two islands on the right--which prevent this from being a Tree in the graph theory sense--exist in assembly as infinite loops.

Further, it's also useful to produce memory maps which detail memory usage. These can be produces from the database by dumping to a graphics programming language. My first revision published to LaTeX/PSTricks. This looks beautiful, but rendering everything as vector art quickly makes a complex memory map unmanageable. My solution was a rewrite that prints raw postscript. Both are shown below.

Conclusion

I've named the tool msp430static, and I intend to publish a revision as soon as I clean up the code. It's a decent hack at this point, but a hack isn't maintainable and I shudder to think at how I'll comprehend these few hundred lines of perl in three months' time without mush revision.

My redesign will feature an SQL backend, such that the input file needn't be reparsed for each minor revision. This will also allow for scripting in languages other than perl. A single command will stock a database of a defined schema, a second will analyze the database, and others will produce output or analysis. I intend to do most analysis in a self-contained perl script, but clients may be written in a variety of languages as appropriate. I'm undecided as to whether I'll make the tool architecture-agnostic in this revision. It's possible, but perhaps that's more appropriate for a later revision. Potential clients include a modified msp430simu and a GTK# GUI.

I don't intend to make a public release of the present version, but I'll send individual copies by email upon request.

No comments: