The Cortex-M0 is a 32-bit ARM processor core designed for microcontroller applications. It is one of the smallest and simplest cores in the Cortex-M series, optimized for low-cost and low-power embedded systems.
The Cortex-M0 implements the ARMv6-M architecture, whose instruction set is the 16-bit Thumb instruction set plus a small number of 32-bit Thumb-2 instructions. The 16-bit Thumb instructions provide high code density, while the handful of 32-bit instructions (BL, DMB, DSB, ISB, MRS and MSR) cover operations that do not fit in a single halfword, such as long-range subroutine calls. All Cortex-M cores, including the M0, only execute Thumb instructions, not legacy ARM instructions.
So in summary, the Cortex-M0 supports:
- 16-bit Thumb instructions
- 32-bit Thumb instructions
This gives it a variable instruction length architecture. The size of each instruction depends on whether it is a 16-bit Thumb or 32-bit Thumb encoding.
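The size of a Thumb instruction can be read from its first halfword: if the top five bits are 0b11101, 0b11110 or 0b11111, the halfword starts a 32-bit instruction, otherwise it is a complete 16-bit instruction. The following is a minimal Python sketch of that rule. The function names are hypothetical and the sample halfwords are hand-assembled for illustration only.

```python
def thumb_instruction_size(first_halfword):
    """Return the instruction size in bytes (2 or 4) from its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    # These three prefixes mark the first half of a 32-bit Thumb encoding.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def count_encodings(halfwords):
    """Walk a list of halfwords and count 16-bit and 32-bit instructions."""
    n16 = n32 = 0
    i = 0
    while i < len(halfwords):
        if thumb_instruction_size(halfwords[i]) == 4:
            n32 += 1
            i += 2  # skip the second halfword of the 32-bit instruction
        else:
            n16 += 1
            i += 1
    return n16, n32


# Illustrative stream: push {r4, lr}; movs r0, #1; bl <target>; pop {r4, pc}
sample = [0xB510, 0x2001, 0xF000, 0xF800, 0xBD10]
n16, n32 = count_encodings(sample)
code_bytes = 2 * n16 + 4 * n32
print(n16, n32, code_bytes)
```

In this sample only the BL occupies two halfwords; the other three instructions are single halfwords.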
16-bit Thumb Instructions
The 16-bit Thumb instruction encoding is intended for common instructions that can be efficiently encoded in a single halfword. Here are some examples of 16-bit Thumb instructions:
- ADD Rd, Rs – Add two registers
- CMP Rn, #imm – Compare register to immediate
- LDR Rd, [Rn, #imm] – Load register from memory
- STR Rd, [Rn, #imm] – Store register to memory
- B label – Unconditional branch
In the Cortex-M0, a majority of the instructions used in typical programs will be 16-bit Thumb instructions. These provide high code density and efficient use of the limited code memory found on microcontrollers.
32-bit Thumb Instructions
The 32-bit Thumb instruction encoding is used for instructions that need more operand space or fields than can be provided in 16 bits. On the Cortex-M0 only a handful of instructions use it:
- BL label – Branch with link to subroutine
- DMB – Data memory barrier
- DSB – Data synchronization barrier
- ISB – Instruction synchronization barrier
- MRS Rd, spec_reg – Read a special register such as PRIMASK
- MSR spec_reg, Rn – Write a special register
Other Thumb-2 instructions, such as MOVW and CBNZ, are not available on this core. Instructions such as ADDS Rd, Rn, #imm (with a 3-bit immediate) and PUSH {R4-R7} use 16-bit encodings. The 32-bit encodings do not provide access to more registers, and they are unrelated to the legacy 32-bit ARM instruction set, which the Cortex-M0 cannot execute.
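BL illustrates why some instructions need 32 bits: its branch offset is a 25-bit signed value (a range of about ±16 MB), split across both halfwords. The sketch below decodes that offset following the BL encoding described in the Arm architecture manuals (fields S, imm10, J1, J2, imm11). The function name is hypothetical, and the example halfword pairs are hand-assembled for illustration.

```python
def decode_bl_offset(hw1, hw2):
    """Decode the signed byte offset of a Thumb BL from its two halfwords.

    hw1 = 11110 S imm10, hw2 = 11 J1 1 J2 imm11.
    The offset is relative to the address of the BL plus 4.
    """
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    # I1 and I2 are stored inverted and XORed with the sign bit.
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)
    raw = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    # Sign-extend from 25 bits.
    if s:
        raw -= 1 << 25
    return raw


print(decode_bl_offset(0xF000, 0xF800))  # offset 0
print(decode_bl_offset(0xF7FF, 0xFFFE))  # offset -4 (a BL that targets itself)
```

A 16-bit encoding has no room for an offset of this width, which is why BL takes two halfwords.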
Instruction Mix
Overall, a commonly quoted figure is that Thumb code occupies roughly 65% to 70% of the size of equivalent ARM code. Because the Cortex-M0 instruction set is almost entirely 16-bit, its code density is in line with that of classic Thumb code.
Within a program, the typical instruction mix will depend on the application. On the Cortex-M0, integer math, logic operations, loads, stores and ordinary branches all use 16-bit encodings. The 32-bit share comes almost entirely from function calls made with BL, plus the occasional barrier or special-register access.
As a general guideline, you can expect 70% to 80% or more of the instructions to be 16-bit in a well optimized Cortex-M0 program, with the 32-bit share growing as the code becomes more call-heavy. This can vary quite a bit depending on coding style and compiler optimizations.
Size Implications
The variable instruction length encoding means that Cortex-M0 code size depends on the instruction mix. As an example, consider a function with these characteristics:
- 80 instructions total
- 60 16-bit instructions
- 20 32-bit instructions
The total size of those 80 instructions would be:
- 60 x 2 bytes = 120 bytes (for 16-bit instructions)
- 20 x 4 bytes = 80 bytes (for 32-bit instructions)
- Total = 120 + 80 = 200 bytes
In other words, those 80 instructions would occupy 200 bytes of code memory. This is an average of 2.5 bytes per instruction.
In contrast, a fixed 32-bit encoding such as the classic ARM instruction set would require 4 bytes per instruction. Assuming, for simplicity, the same instruction count, those 80 instructions would take 320 bytes (80 x 4 bytes).
This demonstrates how the variable-length Thumb encoding provides much better code density than a fixed-length 32-bit encoding, while still making 32-bit instructions available where they are needed.
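The arithmetic above can be reproduced with a short Python sketch. The function name is hypothetical, and the comparison assumes the fixed-length encoding needs the same number of instructions, which is a simplification.

```python
def code_size_bytes(n16, n32):
    """Code size of a mix of 16-bit (2-byte) and 32-bit (4-byte) instructions."""
    return 2 * n16 + 4 * n32


n16, n32 = 60, 20
total_instructions = n16 + n32

variable_size = code_size_bytes(n16, n32)          # mixed 16/32-bit encoding
fixed_size = 4 * total_instructions                # every instruction 4 bytes
average_bytes = variable_size / total_instructions
ratio = variable_size / fixed_size

print(variable_size, fixed_size, average_bytes, ratio)
```

For the 60/20 mix this gives 200 bytes against 320 bytes, so the mixed encoding needs 62.5% of the space of the fixed-length one.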
Code Density Optimization
There are several ways to optimize code density for Cortex-M0 programs:
- Use 16-bit Thumb instructions where possible
- Avoid unnecessary function calls and stack usage
- Keep conditional code simple; the Cortex-M0 has no IT instruction, so conditional execution is limited to conditional branches
- Inline small functions
- Use register-plus-offset addressing (for example, fields accessed through a base pointer) rather than many separate stack-allocated variables
- Use compiler optimizations focused on size reduction
With careful coding and compilation, it is possible to achieve 80% or higher 16-bit instruction density for Cortex-M0 programs. This results in very compact code that maximizes utilization of the limited program memory.
Performance Impact
The variable instruction length does not impact performance very much. Some key points:
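- Cycle counts depend on the operation rather than on the encoding width; most 16-bit data-processing instructions execute in a single cycle
- The Cortex-M0 pipeline fetches both 16-bit and 32-bit instructions equally efficiently
- BL, the only 32-bit branch, takes 4 cycles compared with 3 for a taken 16-bit branch, a cost that any subroutine call incurs regardless of code density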
- The mix of instructions has little effect on CPI (cycles per instruction)
So maximizing 16-bit instructions to optimize code density does not significantly affect performance. The Cortex-M0 achieves good performance alongside excellent code density.
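To see why the encoding width is not what drives CPI, the sketch below estimates CPI from the kind of operation executed. The per-class cycle counts (1 for ALU operations, 2 for loads and stores, 3 for taken branches) are assumptions chosen to match figures usually quoted for the Cortex-M0 with zero-wait-state memory; the names and the sample mix are hypothetical, and the Technical Reference Manual should be consulted for exact timings.

```python
# Assumed cycles per instruction class (illustrative, zero wait states).
ASSUMED_CYCLES = {"alu": 1, "load_store": 2, "branch_taken": 3}


def estimate_cpi(mix):
    """Estimate average cycles per instruction for a {class: count} mix."""
    total_instructions = sum(mix.values())
    total_cycles = sum(ASSUMED_CYCLES[kind] * count for kind, count in mix.items())
    return total_cycles / total_instructions


# Hypothetical mix of 80 executed instructions.
mix = {"alu": 50, "load_store": 20, "branch_taken": 10}
cpi = estimate_cpi(mix)
print(cpi)
```

In this model the result changes only when the proportion of loads, stores and branches changes, not when the same operations are encoded in more or fewer bytes.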
Conclusion
In summary, the Cortex-M0 uses a variable-length Thumb instruction set defined by the ARMv6-M architecture. Most instructions are 16-bit for compact code, with a small set of 32-bit instructions, chiefly BL, available when needed. Typically 70% to 80% or more of the instructions in a program are 16-bit. This provides excellent code density without compromising performance.