Monday, February 18, 2008

Switch/Case Headaches in MSP430 Assembly

by Travis Goodspeed <travis at utk.edu>
at the Extreme Measurement Communications Center
of the Oak Ridge National Laboratory

While polishing off my rewrite of msp430static, my function identifier ran into a bug which was the result of an improperly-handled switch/case statement. This short article is intended to show a practical example of the mixing of code and data in von Neumann machines, as well as what a headache variable-length instructions can be.

This article concerns the meaning of the following slice of object code and that which follows it, found within the Msp430TimerP$1$Event$fired method of the TinyOS 2.x Blink example. You can find the associated executable and disassembly at http://frob.us/~travis/public/blog/misc/switchcase/.

Consider the following fragment of code:
    4124:  10 4f 28 41    br      16680(r15)      ;
    4128:  38 41          pop     r8              ;
    412a:  68 41          mov.b   @r1, r8         ;
    412c:  78 41          pop.b   r8              ;
    412e:  88 41 98 41    mov     r1, 16792(r8)   ;
    4132:  a8 41 b8 41    mov     @r1, 16824(r8)  ;
    4136:  ...

What does this code accomplish? What is the meaning of the POP instruction at 0x4128? Try it yourself before reading ahead.

The answer is simple. There is no POP instruction, neither at 0x4128 nor anywhere else in the code above! 0x4128 is the first entry of a jump table, which continues past the end of the excerpt. The instruction at 0x4124 uses the indexed addressing mode: `BR 16680(r15)' is a branch to the address contained within the word at 16680+r15. 16680, as you can find with a calculator or by reading the second word of the object code, is 0x4128, the address of our supposed POP instruction.
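The decoding of the branch can be checked by hand or with a few lines of Python. The sketch below is only an illustration: the helper name decode_format1 is made up for this example, and the field layout is the standard MSP430 double-operand (Format I) encoding. It splits the first word of the instruction into its fields and reads the second word as the index.

```python
import struct

def decode_format1(word):
    """Split an MSP430 double-operand (Format I) instruction word into fields.

    Hypothetical helper for illustration; not part of msp430static.
    """
    return {
        "opcode": (word >> 12) & 0xF,  # 0x4 is MOV
        "src":    (word >> 8) & 0xF,   # source register number
        "ad":     (word >> 7) & 0x1,   # destination addressing mode
        "bw":     (word >> 6) & 0x1,   # 0 = word, 1 = byte
        "as":     (word >> 4) & 0x3,   # source addressing mode, 1 = indexed
        "dst":    word & 0xF,          # destination register, 0 = PC
    }

# The four bytes at 0x4124, exactly as they appear in the listing.
raw = bytes.fromhex("104f2841")

# MSP430 words are little-endian: "10 4f" is 0x4f10, "28 41" is 0x4128.
insn, index = struct.unpack("<HH", raw)
fields = decode_format1(insn)

print("instruction word: 0x%04x" % insn)
print("index word:       0x%04x = %d" % (index, index))
print(fields)
```

The fields come out as MOV with an indexed r15 source and the PC (r0) as destination, which is what the disassembler prints as BR, and the index word is 0x4128.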

It's easy to reconstruct the table by reading the object code, correcting for endianness. The fragment shown above is {4138, 4168, 4178, 4188, 4198, 41a8, 41b8, ...}. Note not only that the disassembler is unable to recognize that the table is not code, but also that the disassembler is unable to determine where words begin and end. Continuing the code, we find that the list terminates in the following manner:
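As a rough sketch of that reconstruction, the bytes from 0x4128 through 0x4135 can be unpacked as little-endian 16-bit words. The byte string is copied from the listing above; the variable names are just for this example.

```python
import struct

TABLE_BASE = 0x4128  # address of the first table entry

# Object code bytes from 0x4128 up to (not including) 0x4136.
raw = bytes.fromhex("3841" "6841" "7841" "8841" "9841" "a841" "b841")

# Each entry is one little-endian 16-bit word.
entries = struct.unpack("<%dH" % (len(raw) // 2), raw)

for i, target in enumerate(entries):
    # BR 16680(r15) selects an entry by byte offset, so r15 = 2 * i.
    print("%04x: entry %d (r15=%d) -> %04x" % (TABLE_BASE + 2 * i, i, 2 * i, target))
```

Note that the first target, 0x4138, lies just past the portion of the table shown here.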
    4136:  c8 41 1f 42    mov.b   r1, 16927(r8)   ;
    413a:  82 01          .word   0x0182          ; ????
    413c:  8f 10          swpb    r15             ;

The word at 413a is not properly disassembled because it is neither an element in the list nor an instruction. Rather, it is the second word of a 4-byte instruction that begins at 0x4138, the target of the first table entry; the disassembler swallowed that instruction's first word as the operand of the bogus mov.b at 0x4136. This instruction is "1f 42 82 01" or "0x421f 0x0182", depending upon your choice of notation. The MSPGCC project's handy Python disassembler reveals that the instruction is "mov &386, R15" where 386=0x0182.
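To see why the disassembler loses its place, it helps to compute instruction lengths from two different starting points. The following is a minimal sketch, assuming the standard MSP430 Format I encoding; format1_length and word_at are hypothetical helpers written for this example, and only double-operand instructions are handled.

```python
import struct

def format1_length(word):
    """Length in bytes of an MSP430 double-operand (Format I) instruction.

    Hypothetical helper; raises ValueError for any other instruction format.
    """
    if (word >> 12) < 4:
        raise ValueError("0x%04x is not a Format I instruction" % word)
    src = (word >> 8) & 0xF
    ad = (word >> 7) & 0x1
    as_mode = (word >> 4) & 0x3
    # The source needs an extension word for indexed/symbolic/absolute mode
    # (As=01, except the r3 constant generator) and for immediates (@PC+).
    src_ext = (as_mode == 1 and src != 3) or (as_mode == 3 and src == 0)
    # The destination needs an extension word whenever Ad=1.
    dst_ext = ad == 1
    return 2 + 2 * src_ext + 2 * dst_ext

BASE = 0x4136
raw = bytes.fromhex("c8411f428201")  # bytes at 0x4136..0x413b from the listing
words = struct.unpack("<3H", raw)

def word_at(addr):
    """Fetch the 16-bit word at an address inside the excerpt."""
    return words[(addr - BASE) // 2]

# A linear sweep treats the last table entry at 0x4136 as an instruction.
sweep_len = format1_length(word_at(0x4136))
# Starting at the real branch target 0x4138 gives the true instruction.
real_len = format1_length(word_at(0x4138))

print("from 0x4136: %d bytes, next decode at 0x%04x" % (sweep_len, 0x4136 + sweep_len))
print("from 0x4138: %d bytes, next decode at 0x%04x" % (real_len, 0x4138 + real_len))
```

Starting from 0x4136, the sweep consumes four bytes and lands on 0x413a, which is an operand rather than an opcode. Starting from 0x4138, the instruction ends at 0x413c, where the swpb is decoded correctly either way.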

1 comment:

Travis Goodspeed said...

Note that while GCC uses 16-bit absolute addresses, other compilers are free to implement their tables differently. The MSP430 serial bootstrap loader, for example, uses 8-bit offsets. An integer will be loaded into r5, for example, and then mov.b 8974(r5),r5 will load a byte from the table beginning at 8974 into r5. Finally, an add r5,r0 statement will be used to add the offset to the program counter.