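Cortex-M processors from ARM execute only Thumb code; they have no 32-bit ARM (A32) state. Within Thumb, the supported subset depends on the core: ARMv6-M parts (Cortex-M0, M0+, and M1) implement the 16-bit Thumb instructions plus a handful of 32-bit encodings (BL, DMB, DSB, ISB, MRS, and MSR), while ARMv7-M parts (Cortex-M3, M4, and M7) implement the full Thumb-2 set. When compiling for an ARMv7-M core, GCC generates a mix of 32-bit Thumb-2 and 16-bit Thumb instructions. However, it is possible to restrict GCC to the 16-bit Thumb subset (referred to here as Thumb-16) using compiler flags.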
Why limit code to Thumb-16 on Cortex-M?
There are a few reasons why you may want to restrict the generated code to only use the 16-bit Thumb instruction set when targeting Cortex-M processors:
- Code size – Every instruction the compiler selects is 2 bytes wide. More instructions may be needed to do the same work, so the net effect on size has to be measured rather than assumed.
- Avoid unexpected faults – Cortex-M0, M0+, and M1 do not implement most Thumb-2 instructions and raise a HardFault if one is executed.
- Work around errata – On some Cortex-M implementations, certain Thumb-2 instructions need to be avoided due to silicon bugs.
- Ease debugging – With nearly every instruction occupying exactly one halfword, disassembly listings and address arithmetic are easier to follow than in mixed 16-bit and 32-bit code.
For embedded applications where code size and reliability are critical, using only Thumb-16 instructions can be advantageous despite the loss of some optimization opportunities compared to Thumb-2.
GCC options to force Thumb-16 code generation
GCC has no single switch that bans 32-bit encodings. The available instructions are determined by the target architecture, so the practical way to restrict code generation to 16-bit Thumb instructions is to target ARMv6-M. There are two main ways to do that:
-mthumb -mcpu=cortex-m0
Passing -mthumb tells GCC to generate Thumb code instead of ARM code. Adding -mcpu=cortex-m0 selects the ARMv6-M instruction set and tunes for that core. The only 32-bit encodings left are BL (used for function calls) and the system instructions DMB, DSB, ISB, MRS, and MSR.
arm-none-eabi-gcc -mthumb -mcpu=cortex-m0 [other options] files
-mthumb -march=armv6-m
Using -mthumb along with -march=armv6-m is an alternative that selects the same instruction set without tuning for a specific core.
arm-none-eabi-gcc -mthumb -march=armv6-m [other options] files
Note that -mthumb-interwork and -mno-thumb-interwork do not affect instruction width. They deal only with calls between ARM and Thumb state, and Cortex-M has no ARM state, so neither option restricts GCC to Thumb-16.
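A build can also check for itself which Thumb variant the compiler has been told to target. The following is a minimal sketch that relies on the predefined macros __ARM_ARCH_ISA_THUMB (from the ARM C Language Extensions), __thumb2__, and __thumb__. The function name thumb_isa_level and the REQUIRE_THUMB16 macro are hypothetical names chosen for illustration, not part of GCC.

```c
/*
 * Report which Thumb variant the compiler is targeting.
 * Returns 2 for Thumb-2, 1 for 16-bit Thumb only, and 0 when the build
 * is not targeting a Thumb-capable ARM core (for example, a host build).
 */
static int thumb_isa_level(void)
{
#if defined(__ARM_ARCH_ISA_THUMB)
    return __ARM_ARCH_ISA_THUMB;   /* ACLE: 1 = Thumb only, 2 = Thumb-2 */
#elif defined(__thumb2__)
    return 2;
#elif defined(__thumb__)
    return 1;
#else
    return 0;
#endif
}

/*
 * Optional hard stop. REQUIRE_THUMB16 is a hypothetical project macro:
 * define it on the command line to reject any build that would be
 * allowed to use Thumb-2 instructions.
 */
#if defined(REQUIRE_THUMB16) && defined(__ARM_ARCH_ISA_THUMB) && (__ARM_ARCH_ISA_THUMB != 1)
#error "This file must be built for a 16-bit Thumb target such as -mcpu=cortex-m0"
#endif
```

Compiling this with -mthumb -mcpu=cortex-m0 -DREQUIRE_THUMB16 should succeed, while the same command with -mcpu=cortex-m4 should stop at the #error line.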
Verifying Thumb-16 code generation
It’s important to verify that GCC obeyed the request to restrict code generation to Thumb-16 after compilation. Here are some ways to confirm only 16-bit Thumb instructions were emitted:
- Check assembly listing – Look for absence of 32-bit Thumb-2 instructions.
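- Disassemble object file – Use objdump -d from the cross toolchain and examine the encodings. A 16-bit instruction is shown as one halfword (for example 4770) and a 32-bit instruction as two (for example f000 f800).
- Run on the target – On an ARMv6-M core, an unsupported Thumb-2 instruction raises a HardFault, so code that reaches a visible checkpoint such as an LED toggle did not execute one. Paths that were not executed remain unverified.
- Examine map file – Compare section sizes against a Thumb-2 build, without expecting any fixed ratio, and check that the linked library objects were also built for ARMv6-M.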
Failing to verify proper Thumb-16 code generation could result in unexpected crashes or faults at runtime on Cortex-M devices.
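The encoding check can be automated. In the Thumb instruction set, a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 is the first half of a 32-bit instruction; every other halfword is a complete 16-bit instruction. The sketch below applies that rule to a buffer of halfwords. The function names are hypothetical, and the linear walk assumes the buffer holds only instructions; literal pools or other data embedded in a code section can be misread.

```c
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if hw is the first halfword of a 32-bit Thumb encoding,
   i.e. bits [15:11] are 0b11101, 0b11110, or 0b11111. */
static int is_wide_first_halfword(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Count 32-bit encodings in a buffer of halfwords, walking it linearly
   and skipping the second halfword of each wide instruction. */
static size_t count_wide_instructions(const uint16_t *code, size_t n_halfwords)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n_halfwords) {
        if (is_wide_first_halfword(code[i])) {
            count++;
            i += 2;   /* a wide instruction occupies two halfwords */
        } else {
            i += 1;
        }
    }
    return count;
}
```

A nonzero count is not automatically a problem: BL is a 32-bit encoding even on ARMv6-M, so any function call contributes one. Wide instructions other than BL and the system instructions are the ones worth investigating.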
GCC optimizations with Thumb-16
Limiting GCC to Thumb-16 often increases code size and reduces performance versus allowing Thumb-2 instructions. However, GCC still applies some optimizations when compiling for Thumb-16:
- Constant propagation – Replace variables with constant values when known.
- Common subexpression elimination – Cache duplicate calculations in registers.
- Code hoisting – Move loop invariants outside of loops.
- Branch optimizations – Simplify or remove branches, for example by threading jumps or reordering basic blocks.
- Peephole optimizations – Replace short instruction sequences with cheaper equivalents.
Higher levels of optimization like -O2 or -O3 can achieve additional speed gains but increase compile time, and -O3 in particular tends to grow the code. When size matters most, -Os is the usual starting point. Benchmarking is needed to determine which level provides worthwhile gains.
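Loop-invariant hoisting, mentioned in the list above, can be pictured at source level. This is a minimal sketch with hypothetical function names; whether GCC performs the transformation on a given loop depends on the optimization level and on what it can prove about the code.

```c
#include <stddef.h>
#include <stdint.h>

/* As written: scale * offset is recomputed on every iteration. */
static uint32_t sum_naive(const uint32_t *data, size_t n,
                          uint32_t scale, uint32_t offset)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i] + scale * offset;
    }
    return total;
}

/* Hoisted by hand: the invariant product is computed once before the
   loop, which is what the optimizer aims to do automatically. */
static uint32_t sum_hoisted(const uint32_t *data, size_t n,
                            uint32_t scale, uint32_t offset)
{
    uint32_t bias = scale * offset;
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i] + bias;
    }
    return total;
}
```

Both functions return the same value for the same inputs; comparing their disassembly at -O0 and -O2 shows whether the compiler did the hoisting for the first one.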
When to avoid forcing Thumb-16 code
While limiting compilation to Thumb-16 can be beneficial for Cortex-M, there are also cases where it may not be ideal:
- Code size not critical – Thumb-2 provides better performance without major size impact.
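- Speed is very important – Thumb-2 gives the compiler richer instructions to choose from, such as hardware divide, bitfield operations, and wider immediates.
- Cortex-M3/M4/M7 models – These support Thumb-2 so there is no need to restrict instructions.
- Lots of floating point code – Hardware floating point instructions are 32-bit encodings available only on cores with an FPU, such as Cortex-M4F and M7. ARMv6-M targets fall back to software floating point routines.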
- Just prototyping – Extra optimization effort not warranted.
In performance sensitive applications where code size is not a major constraint, allowing Thumb-2 instructions can provide a noticeable speed boost.
Thumb-16 instruction issues to watch out for
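When limiting compilation to Thumb-16 instructions, GCC will not emit Thumb-2 for an ARMv6-M target, but some code patterns become more expensive and are worth keeping in mind:
- Larger switch statements – On ARMv7-M, GCC can dispatch dense switches with the TBB and TBH table branch instructions. Those are Thumb-2 only, so ARMv6-M builds typically fall back to helper routines from libgcc or to compare-and-branch chains.
- Structure returns – Under the AAPCS, structs larger than 4 bytes are returned through a hidden pointer. The copies use ordinary 16-bit loads and stores, which can add up for large structs.
- Larger loops – A 16-bit conditional branch reaches only about ±256 bytes and an unconditional one about ±2 KB, so long loop bodies may need extra branch sequences.
- Global register variables – Most 16-bit instructions can only address r0 to r7. Reserving a low register raises register pressure, and reserving a high register requires extra moves.
- Tail call optimization – GCC generally does not turn tail calls into plain branches for Thumb-1 targets, so expect a BL followed by a normal return.
Identifying these patterns in the disassembly helps explain size or speed regressions and shows where restructuring the source is worthwhile.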
Special considerations for C++ code
C++ code often requires some extra effort to compile into Thumb-16 instructions compared to plain C code.
- Use -fno-rtti and -fno-exceptions to disable C++ RTTI and exceptions.
- Use multiple inheritance and virtual methods sparingly, since vtables and adjustor thunks add code and data.
- Be careful with templates and inline functions, which can duplicate code across many instantiations and call sites.
- Exceptions and RTTI do not require Thumb-2, but they pull in sizeable runtime support and lookup tables.
With care taken to avoid C++ features that are costly in size, it is possible to generate relatively efficient Thumb-16 code from C++ for Cortex-M.
Conclusion
Restricting GCC to 16-bit Thumb instructions means targeting ARMv6-M: use -mthumb along with either -mcpu=cortex-m0 or -march=armv6-m. The interworking options do not affect instruction width. The result uses only instructions that every Cortex-M core implements, at the cost of reduced optimization opportunities compared to Thumb-2. With appropriate benchmarking and testing, Thumb-16 code can deliver a workable balance of size and performance for resource-constrained Cortex-M applications.