The Cortex-M3 processor implements the ARMv7-M architecture and executes only the Thumb instruction set, in its Thumb-2 form. It cannot execute the classic 32-bit ARM instruction set. Thumb-2 mixes 16-bit and 32-bit encodings, whereas ARM instructions are always 32 bits long. Thumb code is denser, while ARM code, on cores that support it, can do more work per instruction. This article compares the Thumb and ARM instruction sets from the point of view of programming the Cortex-M3 processor.
Overview of Thumb and ARM Instruction Sets
The Thumb instruction set was introduced in the ARMv4T architecture as a space-efficient alternative to the 32-bit ARM instruction set. The original Thumb instructions are 16 bits long instead of 32, which makes Thumb code denser than ARM code and reduces overall code size and memory footprint. The Cortex-M3 implements the Thumb-2 instruction set, an enhancement of the original Thumb ISA. Thumb-2 adds 32-bit encodings for operations that were previously available only in the ARM ISA, while keeping the 16-bit Thumb encodings. This gives better code density than ARM without significantly compromising performance.
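Because Thumb-2 code is a stream of halfwords in which each instruction occupies either one or two of them, a tool that walks the code has to classify the leading halfword first. The following is a minimal sketch of that rule: a leading halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit encoding. The function names are hypothetical, and the counter assumes the buffer holds only instructions and starts on an instruction boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first halfword is hw.
 * Top five bits 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding;
 * any other value is a complete 16-bit instruction. */
int thumb_insn_size(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4 : 2;
}

/* Count the instructions in a buffer of n halfwords (hypothetical helper).
 * Assumes the buffer starts on an instruction boundary and contains no
 * literal data mixed in with the code. */
size_t thumb_count_insns(const uint16_t *code, size_t n)
{
    assert(code != NULL);
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        i += (size_t)thumb_insn_size(code[i]) / 2; /* advance 1 or 2 halfwords */
        count++;
    }
    return count;
}
```

For example, the halfword sequence 0xBF00, 0xF000, 0xF800, 0x4770 (NOP, a two-halfword BL, BX LR) is counted as three instructions in eight bytes, where three fixed-width 32-bit instructions would take twelve.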
The ARM instruction set uses fixed 32-bit instructions, which leaves more encoding space than a 16-bit Thumb instruction has. A single ARM instruction can therefore encode more complex operations, such as conditional execution or a shifted operand. On cores that support it, ARM code can execute faster because more work is done per instruction. However, each ARM instruction occupies twice the space of a 16-bit Thumb instruction, which increases overall code size.
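The Cortex-M3 does not support intermixing Thumb and ARM code. It always runs in Thumb state, and an attempt to switch to ARM state raises a UsageFault. Interworking, in which Thumb code branches to ARM code and vice versa, is available only on cores that implement both instruction sets, such as the ARM7TDMI or the Cortex-A series. On the Cortex-M3, the 32-bit Thumb-2 encodings fill the role that ARM instructions play on those cores in performance-critical sections.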
Code Density
Code density is an important metric for embedded systems. Denser code reduces demands on memory size and bus bandwidth. Thumb provides significantly better code density than ARM:
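- Many Thumb-2 instructions are half the size of an ARM instruction (16 bits vs 32 bits).
- Most common operations have a 16-bit encoding in Thumb-2. More complex operations need a 32-bit Thumb-2 encoding or several 16-bit instructions.
- 16-bit Thumb instructions use compact fields: most of them address only registers R0 to R7 and carry short immediates and branch offsets.
- Thumb function calls use the BL instruction, which has a 32-bit encoding in Thumb, the same size as an ARM BL. The density gain comes from the surrounding 16-bit instructions, not from the call itself.
Commonly quoted figures put Thumb code size at approximately 65% to 70% of the equivalent ARM code size, a reduction of 30% to 35%. The exact saving depends on the compiler, the optimization level and the workload. Because the Cortex-M3 cannot run ARM code, such comparisons are made against ARM builds for other cores.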
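The arithmetic behind these figures is worth spelling out, because "smaller by 30% to 35%" and "30% to 35% denser" are not the same statement. The sketch below uses hypothetical function names and hypothetical image sizes. It computes both the size reduction and the gain in code per byte from two build sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage by which dense_bytes is smaller than baseline_bytes,
 * rounded down. */
unsigned percent_saved(uint32_t baseline_bytes, uint32_t dense_bytes)
{
    assert(baseline_bytes > 0 && dense_bytes <= baseline_bytes);
    return (unsigned)(((uint64_t)(baseline_bytes - dense_bytes) * 100u)
                      / baseline_bytes);
}

/* Percentage of extra code that fits in the same memory when the denser
 * build is used, rounded down. */
unsigned percent_more_per_byte(uint32_t baseline_bytes, uint32_t dense_bytes)
{
    assert(dense_bytes > 0 && dense_bytes <= baseline_bytes);
    return (unsigned)(((uint64_t)(baseline_bytes - dense_bytes) * 100u)
                      / dense_bytes);
}
```

With a hypothetical 1000-byte ARM image and a 650-byte Thumb image, the size reduction is 35%, while the amount of code that fits in a given flash grows by about 53%.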
Performance and Execution Time
ARM instructions can potentially execute faster than 16-bit Thumb instructions because ARM encodes more complex operations in a single instruction. Thumb-2 and the Cortex-M3 narrow this gap in several ways:
- Thumb instructions are decoded and executed directly. There is no intermediate translation into ARM instructions.
- The processor never leaves Thumb state, so branches and calls carry no state-switching overhead.
- Thumb-2 provides 32-bit encodings for operations that the original 16-bit Thumb lacked, such as shifted operands and bitfield operations.
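The following rough figures are sometimes quoted for Thumb relative to ARM. They cannot be measured on the Cortex-M3 itself, which runs only Thumb code, and should be treated as unverified estimates for cores that support both instruction sets:
- Integer performance is similar for both Thumb and ARM code.
- Floating-point code, which runs in software on the Cortex-M3 because it has no FPU, is quoted as about 1.1 to 1.3x slower in Thumb than in ARM.
- Memory access is quoted as 1.1 to 1.2x slower in Thumb than in ARM.
So where ARM code is faster, the quoted difference is moderate at 10% to 30%, depending on the type of operation. On cores that offer both instruction sets, the choice between Thumb and ARM involves a tradeoff between code size and performance. On the Cortex-M3, the equivalent choice is between 16-bit and 32-bit Thumb-2 encodings, which the compiler and assembler normally make automatically.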
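Figures like these are best checked by measurement on the actual target. On a Cortex-M3 this is commonly done by reading a free-running 32-bit cycle counter, such as the DWT cycle counter where the device implements it, before and after the code under test. The sketch below shows only the arithmetic and takes the counter readings as plain values so that it runs anywhere. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed between two readings of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Slowdown of a candidate build relative to a baseline build, in whole
 * percent, rounded down. */
unsigned slowdown_percent(uint32_t baseline_cycles, uint32_t candidate_cycles)
{
    assert(baseline_cycles > 0 && candidate_cycles >= baseline_cycles);
    return (unsigned)(((uint64_t)(candidate_cycles - baseline_cycles) * 100u)
                      / baseline_cycles);
}
```

A routine that takes 1000 cycles in one build and 1200 in another is 20% slower, which corresponds to the "1.2x" style of figure used above.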
Intermixing Thumb and ARM Code
The Cortex-M3 cannot intermix Thumb and ARM code, because it has no ARM state. On cores that implement both instruction sets, branches between Thumb and ARM states allow developers to write non-critical code in Thumb for reduced size, while using ARM for performance-sensitive functions and algorithms. The guidance below applies to those cores. On the Cortex-M3, the whole program, including startup code and libraries, must be built for Thumb.
Some ways to leverage intermixing on a core that supports it include:
- Use Thumb for main program control flow, function calls and task management.
- Implement core algorithms and math functions in ARM for better speed.
- Use ARM for functions that process large amounts of data.
- Write latency-critical interrupt handlers in ARM, since classic cores such as the ARM7TDMI enter exceptions in ARM state anyway.
Mixing Thumb and ARM code well requires following a few best practices:
- Minimize switching between states. Group Thumb and ARM code into larger blocks.
- Use ARM wrappers for performance-critical Thumb functions.
- Consider alignment and branches across 1 KB boundaries.
- Choose the switching method based on how often the switch occurs.
Accessing Data and Memory
Thumb and ARM code see the same uniform 32-bit address space for data access, and on the Cortex-M3 both 16-bit and 32-bit Thumb-2 instructions operate on the same registers and memory. A few key points:
- Data pointers and addresses are 32-bit values in both Thumb and ARM.
- The stack pointer (SP) and the link register (LR) are the same registers in Thumb and ARM code.
- Access to registers, globals and heap is uniform irrespective of instruction set.
- Load/store instructions use largely the same addressing modes in Thumb-2 and ARM. The 16-bit Thumb encodings offer a smaller subset.
This makes it straightforward to share data between code built for either instruction set. Registers and addresses are 32 bits wide in both, so no conversion is needed.
Function Calls and Branches
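In Thumb code on the Cortex-M3, direct function calls use BL and indirect calls through a register use BLX. On cores with both instruction sets, BLX with an immediate target also switches between Thumb and ARM states, but the Cortex-M3 does not support that form. The two instructions compare as follows:
- BL has a 32-bit encoding in Thumb, while BLX with a register operand has a 16-bit encoding.
- Both are taken branches, so both refill the pipeline.
- BL always stays in Thumb state. BLX takes the state from bit 0 of the target register, which must be 1 on the Cortex-M3.
- Both take 1 + P cycles on the Cortex-M3, where P is the pipeline refill of 1 to 3 cycles, depending on the alignment and width of the target instruction.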
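To make the size of a Thumb BL concrete, the following sketch decodes the branch offset from the two halfwords of the instruction (encoding T1 in the ARMv7-M architecture manual). The bit layout is 11110 S imm10 in the first halfword and 11 J1 1 J2 imm11 in the second, and the offset is relative to the address of the BL plus 4. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the signed byte offset of a Thumb BL (encoding T1) from its two
 * halfwords. The result is relative to the address of the BL plus 4. */
int32_t thumb_bl_offset(uint16_t hw1, uint16_t hw2)
{
    /* Check that this is BL: 11110 S imm10 / 11 J1 1 J2 imm11. */
    assert((hw1 & 0xF800u) == 0xF000u && (hw2 & 0xD000u) == 0xD000u);

    uint32_t s     = ((uint32_t)hw1 >> 10) & 1u;
    uint32_t imm10 = (uint32_t)hw1 & 0x03FFu;
    uint32_t j1    = ((uint32_t)hw2 >> 13) & 1u;
    uint32_t j2    = ((uint32_t)hw2 >> 11) & 1u;
    uint32_t imm11 = (uint32_t)hw2 & 0x07FFu;

    /* I1 = NOT(J1 EOR S), I2 = NOT(J2 EOR S) */
    uint32_t i1 = (~(j1 ^ s)) & 1u;
    uint32_t i2 = (~(j2 ^ s)) & 1u;

    /* 25-bit value S:I1:I2:imm10:imm11:0 */
    uint32_t imm25 = (s << 24) | (i1 << 23) | (i2 << 22)
                   | (imm10 << 12) | (imm11 << 1);

    /* Sign-extend from bit 24. */
    return (int32_t)((imm25 ^ 0x01000000u) - 0x01000000u);
}
```

The pair 0xF000, 0xF800 decodes to an offset of 0, and 0xF7FF, 0xFFFE decodes to -4. In both cases the call occupies four bytes.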
On cores that support both states, switching between Thumb and ARM through function calls adds a small overhead. Some techniques to reduce call and switching overhead are:
- Use wrapping functions to avoid switching for core Thumb functions.
- Utilize inlining to reduce function call overhead.
- Place switching code at optimal 1 KB boundaries.
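The state used after an indirect branch (BX or BLX with a register) is selected by bit 0 of the target value: 1 means Thumb, and 0 means ARM on cores that have an ARM state. On the Cortex-M3, a target with bit 0 clear raises a UsageFault, so function pointers and vector table entries must always have it set. The helpers below are a minimal sketch with hypothetical names and a hypothetical address.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if a BX/BLX target value selects Thumb state (bit 0 set). */
int selects_thumb_state(uint32_t target)
{
    return (int)(target & 1u);
}

/* Address the processor fetches from: the target with bit 0 cleared. */
uint32_t fetch_address(uint32_t target)
{
    return target & ~UINT32_C(1);
}

/* Value to place in a register or vector table entry for a Thumb function
 * located at code_address, which must be halfword-aligned. */
uint32_t thumb_target(uint32_t code_address)
{
    assert((code_address & 1u) == 0);
    return code_address | 1u;
}
```

For a function at the hypothetical address 0x08000100, the branch target is 0x08000101 and the instruction fetch still happens at 0x08000100. Compilers and linkers normally set this bit automatically for Thumb function symbols.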
Exceptions and Interrupts
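The Cortex-M3 handles all exceptions and interrupts in Thumb state. Key aspects are:
- On exception entry the processor pushes R0 to R3, R12, LR, PC and xPSR onto the current stack and loads LR with an EXC_RETURN value. There are no banked registers such as LR_abt, which belong to the classic ARM exception model.
- The stacked xPSR records the instruction set state in its T bit, which is always set on the Cortex-M3.
- IRQ and fault handlers must be Thumb code and can be ordinary C functions. Vector table entries must have bit 0 set to indicate Thumb state.
- SVCall exception handling follows the same entry and return sequence as other exceptions.
Because there is no ARM state, exception handlers on the Cortex-M3 cannot be written in ARM. Interrupt latency is kept short by the hardware stacking instead, and handlers benefit from the reduced size of Thumb code.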
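The frame that the Cortex-M3 hardware stacks on exception entry can be described as a C structure, which is a common way to inspect it from a fault handler. The sketch below assumes a Cortex-M3 without floating-point extensions. The type and function names are hypothetical, while the register order, the T bit position (bit 24 of xPSR) and the EXC_RETURN values follow the ARMv7-M exception model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers stacked by hardware on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address */
    uint32_t xpsr; /* program status, including the T bit */
} exception_frame_t;

#define XPSR_T_BIT (UINT32_C(1) << 24)

/* EXC_RETURN values loaded into LR on exception entry. */
#define EXC_RETURN_HANDLER_MSP UINT32_C(0xFFFFFFF1) /* return to Handler mode */
#define EXC_RETURN_THREAD_MSP  UINT32_C(0xFFFFFFF9) /* Thread mode, main stack */
#define EXC_RETURN_THREAD_PSP  UINT32_C(0xFFFFFFFD) /* Thread mode, process stack */

/* Nonzero if the interrupted code was in Thumb state (always, on Cortex-M3). */
int frame_is_thumb(const exception_frame_t *frame)
{
    assert(frame != NULL);
    return (frame->xpsr & XPSR_T_BIT) != 0;
}

/* Nonzero if the frame was stacked on the process stack (bit 2 of EXC_RETURN). */
int exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0;
}
```

The structure is eight words (32 bytes) long, with the stacked PC at byte offset 24. A handler would typically obtain a pointer to it from MSP or PSP, chosen according to the EXC_RETURN value in LR.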
Debugging Thumb and ARM Code
Debugging Cortex-M3 code requires a debugger that understands Thumb-2, including its mix of 16-bit and 32-bit instructions. Key debugging considerations are:
- Ensure the debugger displays registers, stack and memory correctly for Thumb state.
- The disassembler must track instruction boundaries, since 16-bit and 32-bit instructions are interleaved.
- Breakpoints must be placed on halfword-aligned instruction boundaries, whether the instruction is 16 or 32 bits long.
- Single stepping must advance by 2 or 4 bytes depending on the instruction width.
ARM Development Studio, Keil MDK and OpenOCD (typically together with GDB) support Thumb-2 debugging on the Cortex-M3. Selecting the correct target device or core in the debug configuration is normally sufficient.
Tools and Compiler Support
Most ARM compilers generate Thumb-2 code for the Cortex-M3 and select it automatically when the core is specified:
- GCC: selects Thumb with -mcpu=cortex-m3 -mthumb. ARM mode is rejected for this core.
- ARM Compiler: generates Thumb-2 code when targeting the Cortex-M3, choosing 16-bit or 32-bit encodings automatically.
- IAR: generates Thumb code for Cortex-M devices. Mixing Thumb with ARM functions applies to cores that have both instruction sets.
- Keil MDK: uses the ARM Compiler toolchain, so Cortex-M3 projects are built as Thumb-2.
Compiler optimizations such as inlining and loop unrolling also help Thumb performance by generating more efficient code. Enabling them tends to improve the speed of Thumb code, usually at some cost in code size.
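One way to confirm which instruction set a translation unit is being built for is to test the compiler's predefined macros. The sketch below relies on `__thumb2__`, `__thumb__` and `__arm__`, which GCC and Clang define for 32-bit ARM targets. Other toolchains may use different macros, so treat the macro names as an assumption to check against the compiler documentation. The function name is hypothetical.

```c
/* Report the instruction set this file is compiled for, based on the
 * predefined macros of GCC and Clang. On a non-ARM host the last branch
 * is taken. */
const char *build_instruction_set(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb (16-bit encodings only)";
#elif defined(__arm__)
    return "ARM";
#else
    return "not a 32-bit ARM target";
#endif
}
```

A Cortex-M3 build with GCC (for example with -mcpu=cortex-m3 -mthumb) is expected to take the first branch. The order of the tests matters, because `__arm__` is also defined when compiling Thumb code for a 32-bit ARM target.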
Summary
For the Cortex-M3, the choice between Thumb and ARM is already made: the processor executes only Thumb-2 code, which combines the density of 16-bit Thumb instructions with 32-bit encodings that recover most of the performance of ARM. On cores that implement both instruction sets, intermixing Thumb and ARM allows code size and speed to be balanced for a given application, with ARM used for code involving extensive processing or data access and Thumb for control functions, I/O routines and other size-sensitive code. On the Cortex-M3, compilers and assemblers select 16-bit or 32-bit encodings automatically, so the developer's main levers are optimization settings and measurement on the target.