Forcing GCC to generate only Thumb-16 instructions for Cortex-M

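Cortex-M processors from ARM execute only Thumb code; they have no ARM (32-bit A32) state. Cortex-M0, M0+ and M1 (ARMv6-M) implement the 16-bit Thumb instruction set plus a handful of 32-bit encodings (BL, MSR, MRS, DSB, DMB and ISB), while Cortex-M3, M4 and M7 (ARMv7-M) implement Thumb-2, which mixes 16-bit and 32-bit instructions. When compiling for an ARMv7-M core, GCC generates such a mix. However, it is possible to restrict GCC to the 16-bit subset using compiler flags, chiefly by selecting an ARMv6-M target.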

Contents

Why limit code to Thumb-16 on Cortex-M?
GCC options to force Thumb-16 code generation
-mthumb -mno-thumb-interwork
-mthumb -mthumb-interwork
Verifying Thumb-16 code generation
GCC optimizations with Thumb-16
When to avoid forcing Thumb-16 code
Thumb-16 instruction issues to watch out for
Special considerations for C++ code
Conclusion

Why limit code to Thumb-16 on Cortex-M?

There are a few reasons why you may want to restrict the generated code to only use the 16-bit Thumb instruction set when targeting Cortex-M processors:

Smaller code size – 16-bit encodings are half the width of 32-bit ones, so forcing Thumb-16 can reduce size in some cases. The total is not always smaller, because some operations need several 16-bit instructions in place of one 32-bit instruction; measure both builds.
Avoid unexpected faults – Cortex-M0, M0+ and M1 do not implement most 32-bit Thumb-2 instructions and raise a HardFault if they encounter one.
Work around errata – On some Cortex-M implementations, certain Thumb-2 instructions need to be avoided due to silicon bugs.
Ease debugging – Single-stepping through uniform 16-bit code can be simpler to follow than mixed 16-bit and 32-bit code.

For embedded applications where code size and reliability are critical, using only Thumb-16 instructions can be advantageous despite the loss of some optimization opportunities compared to Thumb-2.

GCC options to force Thumb-16 code generation

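Which Thumb instructions GCC may use is decided by the target architecture selected with -mcpu or -march, not by the interworking flags. Selecting an ARMv6-M core, for example with -mcpu=cortex-m0 or -march=armv6-m, limits output to 16-bit Thumb plus the few 32-bit instructions that architecture defines. The two flag combinations below are commonly used alongside that selection when compiling for Cortex-M: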

-mthumb -mno-thumb-interwork

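Passing -mthumb tells GCC to generate Thumb code instead of ARM code; Cortex-M cores have no ARM state, so Thumb is the only valid choice for them. Adding -mno-thumb-interwork turns off support for calls between ARM-state and Thumb-state code, which Cortex-M never needs. Neither flag limits instruction width by itself, so pair them with an ARMv6-M target to keep GCC from using 32-bit Thumb-2 instructions.

gcc -mthumb -mno-thumb-interwork -mcpu=cortex-m0 [other options] files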

-mthumb -mthumb-interwork

Using -mthumb along with -mthumb-interwork asks GCC to generate code that can call, and be called from, ARM-state code. Cortex-M has no ARM state, and the GCC manual describes the option as meaningless in AAPCS configurations, so it has no bearing on whether 16-bit or 32-bit instructions are emitted. The restriction to Thumb-16 comes from selecting an ARMv6-M target.

gcc -mthumb -mthumb-interwork -mcpu=cortex-m0 [other options] files
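The command lines above can be assembled in a build script. The following is a minimal sketch, assuming a cross compiler named arm-none-eabi-gcc is on the PATH; the compiler name, the helper function and the file names are hypothetical and not taken from any particular project.

```python
from typing import List


def thumb16_compile_command(source: str, obj: str,
                            compiler: str = "arm-none-eabi-gcc") -> List[str]:
    """Build a compile-only GCC command line for an ARMv6-M target.

    The compiler name is an assumption; substitute your toolchain's driver.
    """
    return [
        compiler,
        "-mcpu=cortex-m0",  # ARMv6-M: this is what excludes Thumb-2 encodings
        "-mthumb",          # Cortex-M has no ARM state
        "-Os",              # optimize for size
        "-c", source,       # compile only, no link step
        "-o", obj,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # on a machine without the cross toolchain installed.
    print(" ".join(thumb16_compile_command("main.c", "main.o")))
```

Passing the returned list to subprocess.run would execute the compiler; keeping the arguments as a list avoids shell quoting problems.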

Verifying Thumb-16 code generation

It’s important to verify that GCC obeyed the request to restrict code generation to Thumb-16 after compilation. Here are some ways to confirm only 16-bit Thumb instructions were emitted:

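Check assembly listing – Look for the absence of 32-bit Thumb-2 instructions.
Disassemble object file – Use objdump -d or similar to examine instruction encodings; a 32-bit Thumb instruction is shown as two halfwords.
Run on target – On a Cortex-M0, M0+ or M1, an unsupported 32-bit instruction raises a HardFault, so a test program that runs cleanly (for example one that toggles an LED) is supporting evidence, not proof.
Examine map file – Compare code section sizes against a Thumb-2 build. Do not expect a halving: a 16-bit-only build is often similar in size or larger.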

Failing to verify proper Thumb-16 code generation could result in unexpected crashes or faults at runtime on Cortex-M devices.
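The disassembly check can be automated. The sketch below scans text in the layout that GNU objdump -d prints for Thumb code (address, tab, one or two halfwords, tab, mnemonic) and reports 32-bit instructions other than those ARMv6-M defines. The sample listing is hand-written for illustration, and the function and variable names are hypothetical.

```python
import re
from typing import List, Tuple

# 32-bit instructions that ARMv6-M (Cortex-M0/M0+/M1) does implement.
ARMV6M_32BIT_OK = {"bl", "msr", "mrs", "dsb", "dmb", "isb"}

# address ':' TAB first-halfword [space second-halfword] padding TAB mnemonic
_INSN = re.compile(
    r"^\s*([0-9a-f]+):\t([0-9a-f]{4})((?: [0-9a-f]{4})?)\s*\t(\S+)"
)


def unexpected_32bit(disassembly: str) -> List[Tuple[int, str]]:
    """Return (address, mnemonic) for 32-bit encodings outside ARMv6-M."""
    hits = []
    for line in disassembly.splitlines():
        match = _INSN.match(line)
        if not match:
            continue  # labels, section headers, .word data, blank lines
        address, _first, second, mnemonic = match.groups()
        if not second:
            continue  # a single halfword: 16-bit instruction
        base = mnemonic.split(".")[0]  # drop width suffixes such as ".w"
        if base not in ARMV6M_32BIT_OK:
            hits.append((int(address, 16), mnemonic))
    return hits


# Hand-written sample in objdump's layout, for illustration only.
SAMPLE_DISASSEMBLY = (
    "   0:\tb580      \tpush\t{r7, lr}\n"
    "   2:\tf7ff fffe \tbl\t0 <helper>\n"
    "   6:\tf04f 0001 \tmov.w\tr0, #1\n"
    "   a:\tbd80      \tpop\t{r7, pc}\n"
)

if __name__ == "__main__":
    for address, mnemonic in unexpected_32bit(SAMPLE_DISASSEMBLY):
        print(f"0x{address:x}: {mnemonic}")
```

In practice the input would come from running the toolchain's objdump (often named arm-none-eabi-objdump, an assumption here) with -d on the object or ELF file and capturing its standard output.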

GCC optimizations with Thumb-16

Limiting GCC to Thumb-16 often increases code size and reduces performance versus allowing Thumb-2 instructions. However, GCC still applies some optimizations when compiling for Thumb-16:

Constant propagation – Replace variables with constant values when known.
Common subexpression elimination – Compute repeated expressions once and reuse the result.
Code hoisting – Move loop-invariant computations outside of loops.
Branch optimizations – Simplify or remove branches where possible.
Peephole optimizations – Replace short instruction sequences with cheaper equivalents.

Higher optimization levels such as -O2 or -O3 can yield additional speed but increase compile time and, particularly with -O3, code size; -Os optimizes for size instead. Benchmarking is needed to determine whether the extra optimization provides worthwhile gains.
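For the size half of that comparison, one option is to read the output of the GNU size tool for each build. The sketch below parses its default Berkeley-format table and reports flash usage as text plus data, since initial values for the data section are stored in flash alongside the code. The file names and byte counts in the sample are placeholders, not measurements.

```python
from typing import Dict


def flash_usage(size_output: str) -> Dict[str, int]:
    """Map each file name in Berkeley-format `size` output to text + data."""
    usage = {}
    for line in size_output.splitlines():
        fields = line.split()
        # Data rows have six columns and begin with a decimal number;
        # this skips the header row and blank lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        text, data = int(fields[0]), int(fields[1])
        usage[fields[5]] = text + data
    return usage


# Placeholder table in the tool's layout; the numbers are made up.
SAMPLE_SIZE_OUTPUT = (
    "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
    "   1234\t     56\t     78\t   1368\t    558\tbuild_Os.elf\n"
    "   1500\t     56\t     78\t   1634\t    662\tbuild_O2.elf\n"
)

if __name__ == "__main__":
    for name, used in flash_usage(SAMPLE_SIZE_OUTPUT).items():
        print(f"{name}: {used} bytes of flash")
```

Speed has to be measured separately on the target, for example with a cycle counter or a timer peripheral.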

When to avoid forcing Thumb-16 code

While limiting compilation to Thumb-16 can be beneficial for Cortex-M, there are also cases where it may not be ideal:

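Code size not critical – Thumb-2 provides better performance without major size impact.
Speed is very important – Thumb-2 offers instructions with no 16-bit equivalent, such as hardware divide and bit-field operations.
Cortex-M3/M4/M7 models – These support Thumb-2, so there is usually no need to restrict instructions.
Lots of floating point code – FPU instructions are 32-bit encodings and exist only on cores fitted with an FPU (an option on Cortex-M4 and M7); a 16-bit-only build falls back to software floating point.
Just prototyping – Extra optimization effort not warranted.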

In performance sensitive applications where code size is not a major constraint, allowing Thumb-2 instructions can provide a noticeable speed boost.

Thumb-16 instruction issues to watch out for

When limiting compilation to Thumb-16 instructions, there are some code patterns and GCC behavior to keep in mind:

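Larger switch statements – Thumb-2 has table-branch instructions (TBB and TBH) for jump tables; without them GCC may fall back to longer sequences or library helper routines.
Structure returns – Structs that do not fit in a register are returned through memory, and copying them can take more 16-bit instructions than the Thumb-2 equivalent.
Larger loops – 16-bit branches have a short range and 16-bit immediates are small, so big loop bodies and large constants may need extra instructions or literal-pool loads.
Global register variables – Most 16-bit instructions can only address r0-r7, so reserving a register for a global variable adds register pressure.
Tail call optimization – GCC may not apply sibling-call optimization when generating 16-bit Thumb code, in which case such calls use the normal call and return sequence.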

Being aware of these patterns helps keep the resulting code compact and restricted to instructions the target Cortex-M core supports.
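Stray 32-bit encodings can also be located directly in a raw code image, without a disassembler. A Thumb instruction is 32 bits wide when bits [15:11] of its first halfword are 0b11101, 0b11110 or 0b11111. The sketch below walks a little-endian byte buffer on that rule; the function names are hypothetical, and the walk assumes the buffer holds only instructions, so literal pools or other data embedded in the code will produce false matches.

```python
import struct
from typing import List


def is_32bit_thumb(first_halfword: int) -> bool:
    """True if this halfword is the first half of a 32-bit Thumb encoding."""
    return (first_halfword >> 11) in (0b11101, 0b11110, 0b11111)


def find_wide_instructions(code: bytes) -> List[int]:
    """Return byte offsets at which a 32-bit Thumb encoding starts.

    Assumes `code` is little-endian and contains instructions only.
    """
    offsets = []
    pos = 0
    while pos + 2 <= len(code):
        (halfword,) = struct.unpack_from("<H", code, pos)
        if is_32bit_thumb(halfword):
            offsets.append(pos)
            pos += 4  # skip the second halfword of the wide instruction
        else:
            pos += 2
    return offsets


if __name__ == "__main__":
    # push {r7, lr}; bl <target>; mov.w r0, #1; pop {r7, pc}
    image = bytes.fromhex("80b5 fff7 feff 4ff0 0100 80bd")
    print(find_wide_instructions(image))
```

Note that this reports BL as well, which ARMv6-M does implement, so the offsets still need to be checked against the instructions the target core supports.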

Special considerations for C++ code

C++ code often requires some extra effort to compile into compact Thumb-16 code compared to plain C code. No C++ feature requires Thumb-2 as such; the concern is code size and runtime support, which matter more when every operation must be built from 16-bit instructions.

Use -fno-rtti and -fno-exceptions to disable C++ RTTI and exceptions if the project does not need them.
Use multiple inheritance and virtual methods sparingly, since vtables and thunks add size.
Be careful with templates and inline functions, which can duplicate code.
Extensive use of C++ exceptions/RTTI pulls in unwinding tables and runtime library code.

With care taken over these features, it is possible to generate relatively efficient Thumb-16 code from C++ for Cortex-M.

Conclusion

Forcing GCC to emit only 16-bit Thumb instructions requires selecting an ARMv6-M target such as -mcpu=cortex-m0 together with -mthumb; -mno-thumb-interwork and -mthumb-interwork do not affect instruction width. This restriction is mandatory on Cortex-M0, M0+ and M1 and optional on cores that support Thumb-2, where it comes at the cost of reduced optimization opportunities. With appropriate benchmarking and testing, Thumb-16 code can deliver a workable balance of size and performance for resource-constrained Cortex-M applications.
