Wednesday, February 27, 2008

A Brief Tutorial for MSP430static

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

Recently, I released MSP430static, a tool for reverse engineering MSP430 firmware. This article is a tutorial on the installation and usage of the tool.

Installation


Because msp430static is under active development, it is very rarely distributed in packaged form. Just grab the latest code from Subversion, like so:
mil% svn co https://msp430static.svn.sourceforge.net/svnroot/msp430static msp430static
Once the code has been checked out, install it like so:
mil% cd msp430static/trunk 
mil% sudo make install
ln -s `pwd`/msp430static.pl /usr/local/bin/msp430static
ln -s `pwd`/msp430static.pl /usr/local/bin/m4s
mil%
When installing from the trunk version, links--rather than copies--will be made. This ensures that upgrading is as simple as running 'svn update'.

Once the application is installed, it'll likely be necessary to chase prerequisites. You will need mspgcc. Run 'm4s' to see which Perl module is reported missing, then install it by the procedure for your operating system. The following example is from a Gentoo machine.
mil% m4s
install_driver(SQLite) failed: Can't locate DBD/SQLite.pm in @INC (@INC contains: /etc/perl
/usr/lib/perl5/site_perl/5.8.6/i686-linux /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.6/i686-linux /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl
/usr/lib/perl5/5.8.6/i686-linux /usr/lib/perl5/5.8.6 /usr/local/lib/site_perl .) at (eval 2) line 3.
Perhaps the DBD::SQLite perl module hasn't been fully installed,
or perhaps the capitalisation of 'SQLite' isn't right.
Available drivers: DBM, ExampleP, File, Proxy, Sponge, mysql.
at /usr/local/bin/m4s line 297
mil% sudo emerge dev-perl/DBD-SQLite
...
mil% m4s
mil%
Once you've chased down all of the required prerequisites, the command with no arguments should simply return. At this point, you can begin to play with the tool.

Organization


Before we use the tool, it might be helpful to explain how it is organized. MSP430static is built as a Perl script which wraps an SQLite3 database. It is called either from the unix shell or from its own shell, and a lot of work can be performed by macros and subs. Macros are short, parameter-less blocks of code written in Perl, shell script, or SQL. Subs are SQL functions written in Perl; support for other languages will be added soon. (I'm writing this before feature-creep sets in. Check the formal documentation for whatever is new.)

The database file resides in the current working directory and is always named 430static.db. The database contains tables (code, funcs, symbols) for managing a working codebase. Another table (lib) stores library functions for symbol identification. Two more (macros, subs) store the macros and subroutines which extend the command language of the interpreter.
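The idea of a sub, an SQL function whose body lives in a scripting language, can be sketched with Python's sqlite3 module rather than the Perl that msp430static itself uses. The names dehex and enhex are subs that show up later in this tutorial; the bodies below are my guess at their behavior, and the in-memory database merely stands in for 430static.db.

```python
import sqlite3

def dehex(text):
    """Convert a hex string such as '4118' to an integer."""
    return int(text, 16)

def enhex(number):
    """Convert an integer to a lowercase hex string without a prefix."""
    return format(number, "x")

# An in-memory database stands in for 430static.db (an assumption).
db = sqlite3.connect(":memory:")

# Register the Python functions so that SQL statements can call them.
db.create_function("dehex", 1, dehex)
db.create_function("enhex", 1, enhex)

value = db.execute("select dehex('4118')").fetchone()[0]
text = db.execute("select enhex(16664)").fetchone()[0]
print(value, text)
```

Once registered, such a function can be used anywhere an SQL expression is allowed, which is what makes queries like the ones in the Usage section possible.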

Usage


Let's begin analyzing some code. In my case, I'll begin my database with the TinyOS Blink example.
mil% msp430-objdump -D /opt/tinyos-2.x/apps/Blink/build/telosb/main.exe | m4s init
mil% m4s .summary
/home/travis/svn/msp430static/trunk/msp430static.pl
1000 instructions
33 functions from 1100 to 4a64
0 of 0 library functions found
73 distinct memory locations are poked.

0 lib functions.
0 unique lib function names.
0 unique lib function checksums.
mil%
This initializes the database with the dumped executable code, then calls a database summary. As this example includes symbol names, I need no library functions. Just to show off the library features, let's see which functions from libc exist here.
mil% m4s .lib.import.gnu >>/dev/null
mil% m4s .summary
/home/travis/svn/msp430static/trunk/msp430static.pl
1000 instructions
33 functions from 1100 to 4a64
1 of 2099 library functions found
73 distinct memory locations are poked.

2099 lib functions.
587 unique lib function names.
543 unique lib function checksums.
mil%
Now that we know a function has been found, let's take a look at it.
mil% m4s shell
m4s sql> .funcs.inlibs
4118 __clear_cache
m4s sql> select asm from funcs where address=dehex('4118');
4118: 30 41 ret

m4s sql> select name from funcs where address=dehex('4118');
Msp430TimerP$1$Event$default$fired
m4s sql>
Using the msp430static shell in the example above, the first query lists all recognized functions and their hexadecimal addresses. The second grabs the code of that function. The third grabs the symbol name that shipped with the executable. It's clear that this recognition is merely a coincidence: Msp430TimerP$1$Event$default$fired is mistaken for __clear_cache because both do exactly the same thing, a simple return. Let's drop the GNU stuff and load only the TinyOS files.
mil% m4s shell
m4s sql> delete from lib;
m4s sql> .lib.import.tinyos
...
#887 lib functions.
#213 unique lib function names.
#213 unique lib function checksums.
m4s sql> .funcs.inlibs
403a __ctors_end
403e _unexpected_
404c __nesc_atomic_start
4060 __nesc_atomic_end
4084 Msp430TimerCapComP$0$Event$fired
4094 Msp430TimerCapComP$1$Event$fired
40a4 Msp430TimerCapComP$2$Event$fired
4118 Msp430TimerP$1$Event$default$fired
43a4 Msp430TimerP$1$Timer$get
454c Msp430ClockP$set_dco_calib
4568 MotePlatformC$TOSH_FLASH_M25P_DP_bit
4118 SchedulerBasicP$TaskBasic$default$runTask
4a62 __stop_progExec__
41da SchedulerBasicP$TaskBasic$postTask
432e TransformCounterC$0$Counter$get
473e TransformAlarmC$0$Alarm$startAt
477c AlarmToTimerC$0$fired$runTask
47c0 VirtualizeTimerC$0$Timer$startPeriodic
4986 SchedulerBasicP$TaskBasic$runTask
499e McuSleepC$getPowerState
m4s sql>
This time, many more functions are identified. If the Blink executable still remains, they ought to all be recognized. Supposing that you sell a proprietary library for the MSP430, msp430static makes it trivial to catch copyright violators. By loading suspect firmware and calling the .funcs.inlibs macro, you can determine in seconds whether your library is being used.
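The checksum matching behind these results, including the earlier false match on __clear_cache, can be sketched in a few lines. This is only an illustration: MD5 over the raw bytes is my assumption, not the checksum that msp430static really computes, and the one-entry library is hypothetical.

```python
import hashlib

def checksum(code):
    """Hash a function body; identical bytes give identical checksums."""
    return hashlib.md5(code).hexdigest()

# The single 'ret' instruction found at 4118 in the example above.
RET = bytes.fromhex("3041")

# Hypothetical one-entry library, plus the firmware function from the example.
library = {checksum(RET): "__clear_cache"}
firmware = {"Msp430TimerP$1$Event$default$fired": RET}

# Any function whose bytes hash to a library checksum is reported as a match.
matches = {name: library.get(checksum(body)) for name, body in firmware.items()}
print(matches)
```

Any two functions that consist of nothing but a return will collide in this way, so matches on very short functions deserve some suspicion.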

Let's try things from the other side of the fence, though. Suppose you have a firmware that you'd like to reverse engineer. How much can we determine without symbol names? A callgraph, such as this one, is easy enough to generate with the .callgraph.* macros. .callgraph.xview or .callgraph.kgv will display the graph, and my PDF was generated by the following:
karen% m4s .callgraph.ps >foo.ps
karen% ps2pdf foo.ps
karen%
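For readers unfamiliar with Graphviz, here is a rough sketch of what dumping a digraph call tree amounts to. The edge list is made up for illustration, and the real macro surely gathers its edges from the database rather than from a Python list.

```python
def to_dot(edges):
    """Render (caller, callee) pairs as a Graphviz digraph."""
    lines = ["digraph callgraph {"]
    # Drop duplicate edges and sort them so the output is stable.
    for caller, callee in sorted(set(edges)):
        lines.append('  "%s" -> "%s";' % (caller, callee))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical call edges, named by function start address.
edges = [("4000", "411a"), ("411a", "4118"), ("4000", "411a")]
dot = to_dot(edges)
print(dot)
```

Text of this shape can be piped through Graphviz's dot command with -Tps to obtain PostScript.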

Supposing I wanted to see what an attacker could determine of my application, knowing only the standard libraries but nothing of my source code, I could run the following:
karen% m4s shell
m4s sql> update funcs set name='unknown';
m4s sql> .symbols.recover
m4s sql>
karen%
The callgraph, available here, shows that some, but not all, of the functions are identified. (In practice, the example projects ought to be completely identified. Any private functions, not imported by the script, will not be shown.)
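The recovery step can be pictured as a single SQL UPDATE that copies a library name onto each function with a matching checksum. The sketch below uses Python's sqlite3 module with a cut-down, assumed schema and placeholder checksums; I am not claiming that .symbols.recover is implemented this way.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Cut-down, assumed schema; the checksums are placeholders.
    create table funcs(address integer, name text, checksum text);
    create table lib(name text, checksum text);
    insert into funcs values (16444, 'unknown', 'aaa');
    insert into funcs values (16666, 'unknown', 'bbb');
    insert into lib values ('__nesc_atomic_start', 'aaa');
""")

# Copy a library name onto every function whose checksum matches.
db.execute("""
    update funcs
    set name = (select l.name from lib l where l.checksum = funcs.checksum)
    where checksum in (select checksum from lib)
""")

names = [row[0] for row in db.execute("select name from funcs order by address")]
print(names)
```

The function with no library counterpart keeps the name 'unknown', which is why private functions stay unlabeled in the callgraph.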

It's also important to note that function inlining can make fingerprinting difficult. As TinyOS inlines functions by default, the same function might be inlined in one example and not in another. (At present, inlined functions cannot be automatically recognized.)

Macros and Subs


I haven't room here to enumerate all the features of msp430static, but luckily it will enumerate them for you. The macro .macros will list all macro names and comments. .subs will list all subroutines and comments.
karen% m4s .macros
.callgraph Dump a digraph call tree for graphviz.
.callgraph.gv View a callgraph in ghostview.
.callgraph.kgv View a callgraph in kghostview.
.callgraph.lp Print callgraph for US Letter.
.callgraph.ps Postscript callgraph, sized for US Letter.
.callgraph.xview View a callgraph in xview.
.code.switches List branches belonging to jump-table switch statements.
.export.aout Dumps the project an a.out executable.
.export.ihex Dumps the project as an Intel Hex file.
.export.srec Dumps the project as a Motorolla SRec file.
.funcs.inlibs List functions which appear in libraries.
.funcs.outside List instructions where are not part of any function.
.funcs.overlap List overlapping function addresses.
.lib.import.gnu Import mspgcc libraries from /usr/local/msp430.
.lib.import.tinyos Import mspgcc libraries from /usr/local/msp430.
.macros Lists all available macros.
.memmap.gd.eog View a callgraph in Eye of Gnome.
.memmap.gd.gif Output a GIF drawing of memory.
.memmap.gd.jpeg Output a JPEG drawing of memory.
.memmap.gd.png Output a PNG drawing of memory.
.memmap.gd.xview View a callgraph in xview.
.memmap.pstricks Output a LaTeX drawing of memory.
.missing Default macro, run whenever a missing macro is called.
.subs Lists all additional SQL functions.
.summary Output a summary of the database contents.
.symbols.recover Recover symbol names from libraries.
karen% m4s .subs
addr2func Returns the starting address of the function containing the given address.
addr2funcname Returns the name of the function containing the given address.
callgraph Returns a graphviz callgraph.
dehex Converts a hex string to a numeral.
enhex Converts a numeral to a hex string.
fprint Position-invariant fingerprint of an assembly code string.
to_ihex Returns a line of code as an Intel Hex entry. [broken]
karen%
These are just rows in the database, so new subs and macros may be written from SQL. The source code of a sub or macro may be read with a SELECT statement. (A few of these call functions in msp430static.pl.)

In the following example, I add a new macro which lists the functions that have not been identified in the library.
m4s sql> select * from macros where name like '.funcs.inlibs';
.funcs.inlibs
sql
List functions which appear in libraries.
select distinct enhex(f.address), l.name from lib l,funcs f where f.checksum=l.checksum;
m4s sql> select distinct enhex(f.address),f.name from funcs f where
f.checksum not in (select checksum from lib);
43dc unknown
411a unknown
4230 unknown
48fe unknown
4810 unknown
4580 unknown
45ce unknown
4686 unknown
4886 unknown
40b4 unknown
4068 unknown
43b8 unknown
40fa unknown
4000 unknown
m4s sql> insert into macros values('.funcs.notinlibs',
'sql',
'List functions which do not appear in libraries.',
'select distinct enhex(f.address),f.name from funcs f where
f.checksum not in (select checksum from lib);');
m4s sql> .funcs.notinlibs
43dc unknown
411a unknown
4230 unknown
48fe unknown
4810 unknown
4580 unknown
45ce unknown
4686 unknown
4886 unknown
40b4 unknown
4068 unknown
43b8 unknown
40fa unknown
4000 unknown
m4s sql>
Macros may be written in Perl, SQL, or unix shell script. Subs work similarly, but there's only Perl support at the moment.
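Since a macro is only a row holding a name, a language, a comment, and some code, a dispatcher for it is short. Here is a sketch in Python with sqlite3. The column names are my assumption (their order is inferred from the SELECT output above), .funcs.count is a hypothetical macro, and only SQL macros are handled.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Assumed column names; order follows the 'select *' output above.
    create table macros(name text, lang text, comment text, code text);
    create table funcs(address integer, name text);
    insert into funcs values (16664, 'unknown');
""")
# '.funcs.count' is a hypothetical macro used only for this illustration.
db.execute(
    "insert into macros values (?, ?, ?, ?)",
    (".funcs.count", "sql", "Count functions.", "select count(*) from funcs;"),
)

def run_macro(db, name):
    """Look up a macro by name and run it if it is written in SQL."""
    row = db.execute(
        "select lang, code from macros where name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError("no such macro: " + name)
    lang, code = row
    if lang != "sql":
        raise NotImplementedError("only SQL macros are handled in this sketch")
    return db.execute(code).fetchall()

result = run_macro(db, ".funcs.count")
print(result)
```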

Further Usage


This ends the tutorial, but you should play around with the macros and subroutines further. Try writing a few of your own, and email them to me if they're interesting.

Tuesday, February 26, 2008

MSP430static on SourceForge, TIDC

I'll be releasing msp430static, my Perl tool for reverse engineering of MSP430 firmware, on SourceForge after my session at the Texas Instruments Developer's Conference in Dallas. The session is in the Manchester Room at two o'clock on Wednesday afternoon. My talk primarily concerns the injection of machine code into MSP430-based wireless sensor nodes, but I'll be doing a short demonstration of the tool following my talk for those that are interested.

UPDATE: The code has been posted at SourceForge. See msp430static.sourceforge.net.

Wednesday, February 20, 2008

Self-propagating Packets in Harvard Sensor Networks

Qijun Gu and Rizwan Noorani at the CS Department of Texas State University at San Marcos have developed a ``mal-packet,'' which rebroadcasts itself upon reception by a Mica2. This is interesting because the mal-packets target a Harvard architecture machine, which has separate memories for code and data. As they are unable to execute the packet in data memory as code, they instead set up the stack to call a library function for rebroadcasting, much as a return-to-libc attack would operate in Unix. Consult their WiSec '08 submission for more details.

Such an attack is important because it shows that the MSP430, which I crafted an overflow for in this article, is not the only sensor platform that's vulnerable to attack. It's not necessary for a node to be able to execute arbitrary code to cause a packet retransmission; being able to call existing code with arbitrary parameters is sufficient.

Monday, February 18, 2008

Switch/Case Headaches in MSP430 Assembly

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

While polishing off my rewrite of msp430static, my function identifier ran into a bug which was the result of an improperly-handled switch/case statement. This short article is intended to show a practical example of the mixing of code and data in von Neumann machines, as well as what a headache variable-length instructions can be.

This article will concern the meaning of the following slice of object code and that which follows it, found within the Msp430TimerP$1$Event$fired method of the TinyOS 2.x Blink example. You can find the associated executable and disassembly at http://frob.us/~travis/public/blog/misc/switchcase/.

Consider the following fragment of code:
4124: 10 4f 28 41 br 16680(r15) ;
4128: 38 41 pop r8 ;
412a: 68 41 mov.b @r1, r8 ;
412c: 78 41 pop.b r8 ;
412e: 88 41 98 41 mov r1, 16792(r8);
4132: a8 41 b8 41 mov @r1, 16824(r8);
4136: ...
What does this code accomplish? What is the meaning of the POP statement at 0x4128? Try it yourself before reading ahead.

The answer is simple. There is no POP instruction, neither at 0x4128 nor anywhere else in the code above! 0x4128 is the first entry of a jump table, which continues past the end of the excerpt. The instruction at 0x4124 uses the indexed addressing mode. `BR 16680(r15)' is a branch to the address contained within the word at 16680+r15. 16680--as you can find by a calculator or by reading the second word of the object code--is 0x4128, the address of our supposed POP instruction.
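To check that reading of the branch, the four bytes at 0x4124 can be pulled apart by hand. BR is an emulated instruction, assembled as a MOV into the program counter (r0), so the first word should decode as a MOV with an indexed source. The sketch below is in Python; decode_format1 is a helper name of my own, not part of any MSP430 tool.

```python
import struct

def decode_format1(word):
    """Split a two-operand MSP430 instruction word into its bit fields."""
    return {
        "opcode": (word >> 12) & 0xF,  # 0x4 is MOV
        "src": (word >> 8) & 0xF,      # source register number
        "ad": (word >> 7) & 0x1,       # destination addressing mode
        "bw": (word >> 6) & 0x1,       # 1 for byte operations
        "as": (word >> 4) & 0x3,       # source addressing mode, 1 is indexed
        "dst": word & 0xF,             # destination register, 0 is the PC
    }

raw = bytes.fromhex("104f2841")           # object code at 0x4124
insn, offset = struct.unpack("<HH", raw)  # two little-endian 16-bit words
fields = decode_format1(insn)
print(hex(insn), fields)
print(offset, hex(offset))
```

The second word is the index offset, and it is the same number whether written as 16680 or as 0x4128.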

It's easy to reconstruct the table by reading the object code, correcting for endianness. The fragment shown above is {4138, 4168, 4178, 4188, 4198, 41a8, 41b8, ...}. Note not only that the disassembler is unable to recognize that the table is not code, but also that the disassembler is unable to determine where words begin and end. Continuing the code, we find that the list terminates in the following manner:
4136: c8 41 1f 42 mov.b r1, 16927(r8);
413a: 82 01 .word 0x0182; ????
413c: 8f 10 swpb r15 ;
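Rebuilding the jump table by hand gets tedious, so here is a small Python sketch that does the endianness correction for the fourteen bytes between 0x4128 and 0x4135 of the first excerpt. The branch_target helper is my own illustration of what BR 16680(r15) fetches; it assumes r15 is even and stays within these seven entries.

```python
import struct

TABLE_BASE = 0x4128

# Bytes from 0x4128 through 0x4135, copied from the first listing.
raw = bytes.fromhex("3841" "6841" "7841" "8841" "9841" "a841" "b841")

# Each table entry is one little-endian 16-bit word.
table = [entry for (entry,) in struct.iter_unpack("<H", raw)]

def branch_target(r15):
    """Emulate BR 16680(r15): fetch the word stored at TABLE_BASE + r15."""
    return table[r15 // 2]

print([format(entry, "x") for entry in table])
print(format(branch_target(4), "x"))
```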
The word at 413a is not properly disassembled because it is neither an element in the list nor an instruction. Rather, it is the second word of a 4-byte instruction. This instruction is "1f 42 82 01" or "0x421f 0x0182", depending upon your choice of notation. The MSPGCC project's handy python disassembler reveals that the instruction is "mov &386, R15" where 386=0x0182.
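The loss of synchronization can be reproduced with a toy linear sweep. The Python sketch below computes instruction lengths for the original 16-bit MSP430 instruction set (no MSP430X extensions) and walks the eight bytes at 0x4136 twice: once from 0x4136, as the disassembler did, and once from 0x4138, where the real code begins. The function names are mine, and treating an invalid opcode as a one-word datum is a simplification.

```python
import struct

def src_extra(as_mode, reg):
    """Extension words needed by a source operand."""
    if as_mode == 1 and reg != 3:   # indexed, symbolic, or absolute
        return 1
    if as_mode == 3 and reg == 0:   # immediate, encoded as @PC+
        return 1
    return 0

def insn_words(word):
    """Length in 16-bit words of the instruction starting with 'word'."""
    top = word >> 12
    if top == 0:                    # not a valid opcode; count it as data
        return 1
    if top in (2, 3):               # jumps are always one word
        return 1
    as_mode = (word >> 4) & 3
    if top == 1:                    # single-operand format, operand in bits 3-0
        return 1 + src_extra(as_mode, word & 0xF)
    # Two-operand format: source register in bits 11-8, Ad in bit 7.
    src = (word >> 8) & 0xF
    return 1 + src_extra(as_mode, src) + ((word >> 7) & 1)

def walk(words, base, start):
    """Return the addresses at which a linear sweep from 'start' decodes."""
    addrs, index = [], (start - base) // 2
    while index < len(words):
        addrs.append(base + 2 * index)
        index += insn_words(words[index])
    return addrs

raw = bytes.fromhex("c8411f4282018f10")   # bytes at 0x4136 through 0x413d
words = list(struct.unpack("<4H", raw))
from_table = walk(words, 0x4136, 0x4136)
from_code = walk(words, 0x4136, 0x4138)
print([hex(a) for a in from_table])
print([hex(a) for a in from_code])
```

Starting inside the table, the sweep swallows 0x421f as an operand and strands 0x0182 at 0x413a; starting at 0x4138, the two words are consumed together as one instruction.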