Sunday, January 13, 2008

MSP430simu and LaTeX, part 2

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

After writing my previous article, Tracing with MSP430simu, LaTeX, and PowerPoint, I found that I had left a few things unsaid. I'll cover them here, refraining from reiterating the basics.

addr2line


An essential part of any debugger that will be used by mere mortals--which includes a reverse engineer when working on anything other than his pet project--is the ability to print function names instead of hexadecimal addresses. In unix, this is accomplished by addr2line, which may be called thusly:
karen% msp430-addr2line -e overflow.elf 01182 -f -s
myfn
mysource.c:15
karen%
This tells me that 0x1181 is a machine-language instruction from line 15 of mysource.c, which is somewhere within the function myfn(). In my franken-python, I grab function name and file/line number with
    def addr2line(self,addr):
import os
app = os.popen("msp430-addr2line -e overflow.elf 0x%04X -f -s " % addr, "r");
text = app.read();
app.close();
return text;
Note that I hardwired the executable filename, application, and platform. This is very bad practice, but as I warned in my first article, this is a quick hack to generate conference slides. Use something better in your own implementation. (Even if you just need to generate conference slides of your own.)

stack traces


Printing a stack trace is just as easy, if we make the--perhaps incorrect--assumption that a stack trace is just a list of pointers that begins at the top of RAM and grows downward until the address contained within the SP (Stack Pointer) register. By use of addr2line(), it isn't difficult to get a human-readable stack trace.

def stacktrace(self,sp):
s=int(sp);
trace='';
if int(sp)>0x200: #make sure it's in ram
for l in range(0xA00, s-2, -2): #scaled from sp to top of ram
fn=self.addr2line(self.getint(l));
trace+=("0x%04X %s" % (l, fn));
trace+="SP %s" % self.addr2line(int(self.PC));
return trace;


This code is supposed to count each even line in the range [0xA00,s], printing each address whose contents is a function name. I'm unfamiliar with Python.

_reset_vector__


The reset-vector loads RAM with values from ROM. This is essential in a real system, as you'll want an application to run again after RAM has cleared, but it's terribly inconvenient when trying to view an execution trace. The first several frames will be nothing but globals being initialized. For this reason, I added a simple if() statement that refuses to print a frame in batch mode if -1==str.find(fn,"reset_vector").

By searching for _vector(), it's possible to drop all vector handlers, though I've only encountered _reset_vector__ in this project.

Stack Variables


The above is all well and good if the stack contains only points of execution; however, my presentation required that stack variables be shown. Good heavens, how can that be done? Trying addr2line on 0x200 gives:
karen% msp430-addr2line -e simu/overflow.elf 0x202 -f -s
__data_start
??:0
karen%
Thus, any entry that's a pointer to a global variable will give __data_start as its function name, even though it can't supply a line number. I get more luck with
karen% nice msp430-objdump -g simu/overflow.elf | grep 0x202
char foo[12]:uint16 /* 0x202 */;
karen%


As for the question of what data-type is on the stack, I have yet to come up with an adequate solution. I could run objdump into a database, but I'm still left the problem of determining whether the 0x0202 on the stack is a pointer to foo[], an integer, two characters, etc. Expect a third installment of my msp430simu series detailing a solution, but with my slides due in twenty-two hours and my coffee-tin empty, I'll have to overlook it for this draft.

(I'll likely solve this by watching for PUSH and POP instructions. This would let me see the difference between a local variable and a function call.)

As mentioned in my prior article, the presentation has to create a new frame whenever a watched variable is changed. I'm trying to demonstrate a stack overflow, so it's essential that a presentation frame be generated when the stack changes. As such, my actual demonstration prints more than that which is presented here. I print the function name as "RAM" for anything less than the SP, "STACK" for anything greater than that but less than 0x0A00.

A sample stack dump slide block--prior to formatting--follows:
stackdump screenshot

Hacking the Debugger!


At some point, I made a modification that conflicts with the debugging framework that's included with msp430simu. Rather than try to reconcile the changes, I just discontinued use of it. Note that TEST() and END_TEST must still be called in main() to keep the function from dying.

What could be cooler, in a hacking demo, than to have the hack mess with the debugger from inside of a simulation? Looking at test_puts(), you'll find a while-loop that copies a string to TEST_TEXTOUT. To call it from assembly, just load the first character's address in R15 and jump to the address of test_puts, which is 0x1140 in my present revision but likely won't remain that for long. In machine language, this can be accomplished in no more than eight bytes: four to load the string's address in R15, and four to jump to the function. (Assuming, of course, that the fixed addresses are known.) Other hacks are certainly possible. Try disassembling some functions with msp430-objdump foo.elf -d | less to see what you can come up with.

Commentary


Example slide
As depicted above, my slides now properly render, showing most of what's needed from the machine. Only two items remain: commentary and section titles. These two features might have been implemented by a better stack analyzer, but my solution was to specify the slide section through a watched variable. A 16-bit integer is set at the entrance to a function which I'd like to watch, which is the index of a string in my perl script. Commentary is then loaded by calling \input{} in LaTeX on the appropriate include file, which contains anything I would like to be in the left box.

Conclusion


A snazzy draft which simulates a stack overflow attack is available as msp430simu_tidc08.pdf. As with all of my presentations, it makes little sense without spoken commentary, but it's a good example of what can be done with a little bit of work and a CPU simulator.

No comments: