Thursday, August 2, 2007

On the IAR MSP430 C Compiler's Inefficient Register Utilization

Recently, I've been digging into the documentation of Texas Instruments' MSP430 micro-controller family. After covering the CPU itself, I continued into the documentation[1] for the mspgcc project, a port of GCC to the MSP430. After realizing that the ABI used for mspgcc had never been defined in the chip's documentation[3], I dug up the manual[2] for IAR's compiler and compared the two.

I quickly discovered that IAR's compiler wastes registers when passing 16-bit parameters to a C function. By its ABI, the first 16-bit parameter is placed into R12 and the second into R14. R13 and R15 remain unused, as they are reserved for the high words of 32-bit parameters. GCC follows the much more logical route of only assigning a single register to a 16-bit value, such that R15 is used for the first parameter, R14 for the second, R13 for the third, and R12 for the fourth. This allows it to accept four parameters by register, while IAR's compiler will push the third and fourth onto the stack while leaving two clobber registers unused!


To demonstrate this, I have compiled a simple C program containing only a function foo() which returned the sum of its four inputs and a main() method to call foo(). This was compiled to assembly language using mspgcc 3.2.3 and IAR MSP430 C/C++ Compiler V3.42A/W32.

In both compilers, four assembly instructions were used to add the values and return the result in the single register of the first parameter, R12 for IAR and R15 for GCC. The table below lists the assembly generated by each compiler, with instructions converted from the GCC format (lowercase, .W omitted) to the IAR format for clear comparison. GCC, by virtue of its more efficient register usage, avoids both having to PUSH.W two parameters onto the stack and avoids having to use the indexed addressing mode, as X(SP), within the function.





IAR CompilerGCC Compiler


foo:
ADD.W R14, R12 //1 cycle, 1 word
ADD.W 0x2(SP), R12 //3c,2w
ADD.W 0x4(SP), R12 //3c,2w
RET



foo:
ADD.W R14, R15 //1c,1w
ADD.W R13, R15 //1c,1w
ADD.W R12, R15 //1c,1w
RET



Pages 3-72 and 3-73 of the MSP430 Family Guide[3] detail the full cost of these additions, which increase not only the runtime but also the storage requirements of the function. According to those pages, "ADD.W r14,r15" takes 1 cycle and 1 word of memory while "ADD.W 0x2(SP), R12" takes 3 cycles and 2 words of memory. Additionally, each of the two PUSH.W statements required to call foo() in the IAR compiler takes 3 cycles, which are unnecessary in GCC.

Texas Instruments' Code Composer Essentials does not suffer from IAR's inefficiency; rather, it uses an ABI similar to but incompatible with GCC. TICCE allocates register R12 for the first parameter, then R13, R14, and R15. The result is returned in R12. GCC uses registers in the opposite order and returns in R15. See the Users Guide[4] for more details.

What's the reasoning behind IAR's design? It makes functions of two 32-bit values easily compatible with those of two 16-bit values, but this compatibility breaks as soon as the third parameter comes into play, which is pushed onto the stack as a single word. If such compatibility were essential, the trick could be maintained by using R13 for the third parameter and R15 for the fourth.

Sources:
[1] mspgcc manual
[2] IAR manuals
[3] Texas Instruments's MSP430 Family Guide
[4] MSP430 Optimizing C/C++ Compiler User's Guide (SLAU132)

2 comments:

Travis Goodspeed said...

IAR's register usage has been fixed as of October 19th, beginning with IAR - EW430 4.10A. See here.

Jeff said...

Excellent observation. Good job.

Oh geez, I can't post this anonymously.