Cortex-M processors use the Thumb instruction set, a compact encoding derived from the ARM instruction set that improves code density for embedded applications. ARM designed Thumb for deeply embedded applications where code size and power consumption are critical constraints.
Overview of ARM Instruction Sets
ARM processors support several instruction sets and architecture extensions, including:
- ARM – The original 32-bit ARM instruction set with fixed 32-bit instruction lengths
- Thumb – A compact instruction set derived from ARM, built on 16-bit instruction encodings
- Thumb-2 – An extension of Thumb that adds 32-bit encodings, which can be freely mixed with the 16-bit ones
- Jazelle – Extension supporting direct execution of Java bytecodes
- DSP – Digital Signal Processing extension to ARM and Thumb
- SIMD – Single Instruction Multiple Data extension to ARM and Thumb
- TrustZone – Security extension to the architecture, available on Cortex-M from Armv8-M onwards
Cortex-M processors execute only Thumb instructions (including the Thumb-2 extensions); they cannot execute the 32-bit ARM instruction set. The Thumb instruction set is well suited to Cortex-M devices because:
- It provides improved code density compared to 32-bit ARM instructions
- The variable length encoding provides a good balance of size and performance
- The mix of 16-bit and 32-bit instructions suits the small flash memories typical of microcontrollers
- C and C++ code written for other ARM processors can be recompiled for Thumb without source changes, although ARM-state binaries and assembly cannot run unmodified
Key Advantages of Using Thumb
Here are some of the key advantages the Thumb instruction set provides for Cortex-M processors:
Code Density
ARM's commonly quoted figure is that Thumb code is typically about 65% of the size of equivalent 32-bit ARM code, that is, roughly 35% smaller. This is critical for reducing flash memory requirements in embedded systems. Thumb achieves this through:
- Using 16-bit encoding for common instructions
- Variable length instruction encoding
- Efficient instruction mix optimized for C compiler output
The compact 16-bit Thumb instructions allow fitting more instructions and data into limited embedded flash memory. Thumb-2 extends this further by allowing both 16-bit and 32-bit instructions to be efficiently intermixed in a single program.
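How 16-bit and 32-bit encodings share one instruction stream can be illustrated with a short sketch. In Thumb-2, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction; any other halfword is a complete 16-bit instruction. The function names below are hypothetical helpers written for this illustration, not part of any Arm library.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the Thumb instruction that starts with halfword hw1.
 * Top five bits 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding;
 * every other pattern is a complete 16-bit instruction. */
static unsigned thumb_insn_size(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Count the instructions in a buffer of halfwords (hypothetical helper).
 * Stops early if the buffer ends in the middle of a 32-bit encoding. */
static size_t thumb_count_insns(const uint16_t *code, size_t n_halfwords)
{
    size_t i = 0;
    size_t count = 0;

    while (i < n_halfwords) {
        size_t step = thumb_insn_size(code[i]) / 2u;
        if (i + step > n_halfwords) {
            break; /* truncated 32-bit instruction */
        }
        i += step;
        count++;
    }
    return count;
}
```

For example, the four halfwords 0xBF00 (NOP), 0xF000 0xF800 (a BL with zero offset) and 0x4770 (BX LR) occupy 8 bytes and decode as three instructions, where the fixed-width ARM encoding would need 12 bytes for three instructions.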
Performance Efficiency
Although Thumb has smaller instruction sizes, ARM has optimized the instruction sets so there is minimal performance overhead for Thumb code. Key techniques include:
- Optimized mixing of 16-bit and 32-bit instructions
- Using 32-bit instructions for complex operations
- Similar execution rates for corresponding ARM and Thumb instructions
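- Hardware interworking support to switch efficiently between ARM and Thumb states, on processors that implement both (Cortex-M implements only Thumb)
With these optimizations, the performance overhead relative to ARM code is small. A figure of around 10% is typical, although the actual difference depends on the workload and the memory system. For most embedded applications the improved code density outweighs this loss.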
Power Efficiency
The higher code density of Thumb code can translate into lower power consumption for Cortex-M processors. Because the same work is encoded in fewer bytes, the processor fetches less data from flash memory, leading to:
- Reduced instruction cache and memory controller power
- Lower instruction fetch bandwidth requirements
- Less power consumed decoding instructions
Optimizing embedded code size using Thumb enables more energy efficient Cortex-M applications.
Interoperability With ARM Code
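Thumb was designed to interoperate with ARM code. On processors that implement both instruction sets, such as the ARM7TDMI, the BX and BLX instructions switch between ARM and Thumb state, so Thumb code can call existing ARM routines and vice versa. This was critical for allowing reuse of vast amounts of legacy ARM code when Thumb was introduced.
Cortex-M processors are different: they implement only Thumb state and cannot execute ARM instructions. An attempt to switch to ARM state, for example by branching with BX to an address whose bit 0 is clear, raises a fault. Legacy ARM libraries and routines therefore have to be recompiled (C and C++) or rewritten (assembly) for Thumb before they can run on Cortex-M.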
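On Cortex-M, bit 0 of an address used as an interworking branch target (and of each exception vector) indicates the instruction set state and must be 1, meaning Thumb. The instruction itself is fetched from the address with bit 0 cleared. A minimal sketch of that convention in portable C follows; both helper names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a BX/BLX to this target would select Thumb state (bit 0 set).
 * On Cortex-M a target with bit 0 clear leads to a fault. */
static bool is_thumb_target(uint32_t target)
{
    return (target & 1u) != 0u;
}

/* Address the instruction is actually fetched from: bit 0 cleared. */
static uint32_t fetch_address(uint32_t target)
{
    return target & ~UINT32_C(1);
}
```

Compilers and linkers set this bit automatically for C function pointers, so it mainly matters when handling raw addresses, for example in hand-written vector tables or bootloader jump code.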
Cortex-M Processor Implementations
All Cortex-M processor families and variants utilize the Thumb and Thumb-2 instruction sets. This includes:
- Cortex-M0 – Entry level MCU for ultra low cost applications
- Cortex-M0+ – Enhanced M0 with lower power consumption and additional features
- Cortex-M1 – Intended for implementation in FPGAs
- Cortex-M3 – Mainstream MCU core with the full Thumb-2 instruction set
- Cortex-M4 – High performance MCU core with DSP extensions and an optional single-precision floating point unit
- Cortex-M7 – Highest performance MCU for advanced applications
- Cortex-M23 – Secure IoT processor with TrustZone
- Cortex-M33 – Higher performance MCU with TrustZone
Within each family, there are often further variants optimized for different applications or manufacturing processes like ultra low power. But all leverage Thumb and Thumb-2 to meet the code density, power and performance needs of deeply embedded real-time applications.
Advanced extensions like DSP, SIMD, floating point, and TrustZone augment the baseline Thumb-2 instruction set to provide additional capabilities targeted for specific use cases. But Thumb remains the foundational instruction set used across all Cortex-M processors and applications.
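The cores listed above implement different architecture versions, which determines how much of Thumb-2 is available: Armv6-M and Armv8-M Baseline cores support the 16-bit Thumb instructions plus a limited set of 32-bit ones, while Armv7-M, Armv7E-M and Armv8-M Mainline cores support the full Thumb-2 instruction set. The lookup table below is a sketch of that mapping; the struct, table and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One row per core: its architecture and whether full Thumb-2 is available. */
struct cortex_m_core {
    const char *name;
    const char *arch;
    int full_thumb2; /* 1 = full Thumb-2, 0 = mostly 16-bit Thumb subset */
};

static const struct cortex_m_core CORES[] = {
    {"Cortex-M0",  "Armv6-M",          0},
    {"Cortex-M0+", "Armv6-M",          0},
    {"Cortex-M1",  "Armv6-M",          0},
    {"Cortex-M3",  "Armv7-M",          1},
    {"Cortex-M4",  "Armv7E-M",         1},
    {"Cortex-M7",  "Armv7E-M",         1},
    {"Cortex-M23", "Armv8-M Baseline", 0},
    {"Cortex-M33", "Armv8-M Mainline", 1},
};

/* Return the table entry for a core name, or NULL if it is not listed. */
static const struct cortex_m_core *find_core(const char *name)
{
    for (size_t i = 0; i < sizeof CORES / sizeof CORES[0]; i++) {
        if (strcmp(CORES[i].name, name) == 0) {
            return &CORES[i];
        }
    }
    return NULL;
}
```

A build script or configuration tool could use a table like this to decide whether a routine that relies on 32-bit Thumb-2 instructions is usable on a given target.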
Compiler Support for Thumb and Thumb-2
All major embedded compilers provide strong support for generating efficient Thumb and Thumb-2 code for Cortex-M processors. This includes:
- GCC – The GNU Compiler Collection with ARM and Thumb support
- ARMCC – Arm's own C and C++ compiler for its architectures
- IAR – IAR Systems' popular optimizing compiler for ARM
- Keil MDK – Arm's development suite, combining the μVision IDE with the Arm compiler
The compilers generate Thumb/Thumb-2 code from C/C++ source automatically based on the target architecture. Common GCC target options include:
- -mthumb – Generate Thumb rather than ARM instructions
- -mcpu= – Select the target processor, for example -mcpu=cortex-m4
- -mfloat-abi= – Select the floating point ABI (soft, softfp or hard)
In GNU assembler source, the .thumb directive selects Thumb encoding for the instructions that follow, and .thumb_func marks the next symbol as a Thumb function so that its address has bit 0 set. Overall, Thumb support is seamless with minimal input needed from the programmer.
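C code can check which instruction set it is being compiled for through predefined macros. GCC and Clang define `__thumb__` when generating Thumb code and `__thumb2__` when Thumb-2 is available; `__arm__` is defined for 32-bit Arm targets in general. The function below is a minimal sketch with a hypothetical name, and the final branch is there only so that the file also builds on a non-Arm development machine.

```c
/* Report the instruction set this translation unit is compiled for. */
static const char *target_isa(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "non-Arm host";
#endif
}
```

Assuming a GNU Arm embedded toolchain is installed, building with `arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb` would be expected to take the first branch, and `-mcpu=cortex-m0 -mthumb` the second.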
Summary
The Thumb instruction set, along with its Thumb-2 enhancements, underpins all Cortex-M series processors. Its mix of 16-bit and 32-bit encodings balances code density, performance and power efficiency, and source code written for other ARM processors can be recompiled for it. These properties are a key reason Cortex-M has become one of the most widely used processor families for deeply embedded applications that need to balance cost, power and real-time performance.