The ARM Cortex-M3 processor uses the ARMv7-M architecture and Thumb-2 instruction set. The Thumb-2 instruction set is a variable-length instruction set that provides both 16-bit and 32-bit instructions to improve code density while maintaining high performance. The key features of the Thumb-2 instruction set used by Cortex-M3 include:
- 16-bit and 32-bit instruction support – Narrow 16-bit encodings cover frequently used operations such as adds and moves, which improves code density. Wide 32-bit encodings give most operations access to the high registers (R8-R12), larger immediate values, and longer branch ranges.
- Unified syntax – The same mnemonics are used for both Thumb and ARM instructions, simplifying migration from ARM code.
- Conditional execution – Branches carry their condition directly, and up to four following instructions can be made conditional with an IT (If-Then) instruction, which reduces the number of short branches.
- Registers – 13 general-purpose registers (R0-R12), plus the Stack Pointer (SP), Link Register (LR), Program Counter (PC), and Program Status Register (xPSR).
- Load/store architecture – Operations occur between registers, not directly on memory. Memory access is done through load/store instructions.
- Byte, halfword, and word load/store options – Flexible data access using 8-bit, 16-bit, or 32-bit memory transactions.
- PC-relative addressing – Ability to directly access data relative to the PC for position-independent code.
- Register-relative addressing – Access local stack frame variables relative to the stack pointer.
- Immediate constants – Support for small constant operands embedded in instructions.
- Load and store multiple – Ability to push or pop several registers with one instruction.
- Hardware multiply and divide – Native 32-bit multiply and divide instructions, including 32×32 multiplies with a 64-bit result.
16-bit and 32-bit Instructions
A key advantage of Thumb-2 is its support for both 16-bit and 32-bit instructions. The 16-bit instructions are designed for compact code and cover common operations such as adds, subtracts, moves, compares, and branches, mostly on the low registers R0-R7. For example:
    ADDS R1, R2, R3       // 16-bit add (flag-setting form, low registers)
    CMP R4, #8            // 16-bit compare to immediate
    BNE label             // 16-bit conditional branch (short range)
The 32-bit instructions provide additional capabilities such as access to the high registers, larger immediates, more flexible memory addressing modes, and longer branch ranges. For example:
    MOVW R5, #0x1234      // 32-bit move of a 16-bit immediate
    STMDB R6!, {R1-R5}    // 32-bit store multiple (decrement before)
    BL func               // 32-bit subroutine call
By providing both 16-bit and 32-bit encodings, Thumb-2 offers a good tradeoff between code density and performance. The compiler can choose the best instruction width on an instruction-by-instruction basis.
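Because the two widths are mixed freely, a decoder has to tell from the first halfword how long an instruction is. The sketch below models that rule in portable C: a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction. The function name `thumb_insn_length` is hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Return the instruction length in bytes, given its first halfword.
 * Bits [15:11] equal to 0b11101, 0b11110 or 0b11111 mark the first
 * half of a 32-bit instruction; any other value is a 16-bit one. */
unsigned thumb_insn_length(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;   /* bits [15:11] */

    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

A disassembler or debugger could use a check like this to step through a Thumb-2 code stream one instruction at a time.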
Unified Syntax
Older Thumb and ARM instruction sets used different mnemonics and syntax, complicating migration of code between the two. Thumb-2 uses the Unified Assembler Language (UAL), so the same mnemonics are used regardless of the instruction width.
For example, ADD uses the same mnemonic and operand order in both its 16-bit and 32-bit forms, and the assembler picks the encoding from the operands:
    ADDS R0, R1, R2       // 16-bit encoding (low registers, flag-setting)
    ADD R8, R9, R10       // 32-bit encoding (high registers)
This simplifies porting code, since the same operations have the same names in ARM and Thumb code. The Cortex-M3 itself executes only Thumb instructions and has no ARM state. Features such as conditional execution inside an IT block work the same way regardless of instruction width.
Conditional Execution
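On the Cortex-M3, conditional branches carry their condition directly, and other instructions are made conditional by placing them in an IT (If-Then) block. An IT instruction makes up to four following instructions conditional on the status flags in the APSR, and each of those instructions carries a condition suffix such as EQ, NE, GT, or LT. For example:
    ITT EQ                // Next two instructions execute only if Z is set
    ADDEQ R1, R2, R3      // Add only if Z flag is set
    MOVEQ R0, #1          // Move immediate if Z set
    IT GT                 // Next instruction executes only if Z clear and N equal V
    SUBSGT R4, R5, #1     // Subtract and set flags if Z clear and N equal V
Conditional execution lets code branch less by predicating short instruction sequences on status flags. This improves performance by avoiding the pipeline flush and refill that a taken branch incurs.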
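Each condition suffix corresponds to a fixed test on the N, Z, C, and V flags. The following C sketch models those tests. The type and function names (`flags_t`, `condition_passed`) are hypothetical, and the enum values follow the architectural 4-bit condition encoding.

```c
#include <stdbool.h>

/* The four condition flags, as held in the APSR. */
typedef struct { bool n, z, c, v; } flags_t;

/* Condition codes in their architectural encoding order (0..14). */
enum cond {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
};

/* Return true if an instruction with condition cc would execute. */
bool condition_passed(enum cond cc, flags_t f)
{
    switch (cc) {
    case COND_EQ: return f.z;                   /* equal                   */
    case COND_NE: return !f.z;                  /* not equal               */
    case COND_CS: return f.c;                   /* carry set               */
    case COND_CC: return !f.c;                  /* carry clear             */
    case COND_MI: return f.n;                   /* negative                */
    case COND_PL: return !f.n;                  /* positive or zero        */
    case COND_VS: return f.v;                   /* overflow                */
    case COND_VC: return !f.v;                  /* no overflow             */
    case COND_HI: return f.c && !f.z;           /* unsigned higher         */
    case COND_LS: return !f.c || f.z;           /* unsigned lower or same  */
    case COND_GE: return f.n == f.v;            /* signed >=               */
    case COND_LT: return f.n != f.v;            /* signed <                */
    case COND_GT: return !f.z && f.n == f.v;    /* signed >                */
    case COND_LE: return f.z || f.n != f.v;     /* signed <=               */
    case COND_AL: return true;                  /* always                  */
    }
    return false;
}
```

For instance, after a compare of two equal values (Z set), `condition_passed(COND_EQ, f)` is true and `condition_passed(COND_GT, f)` is false.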
Registers
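The Cortex-M3 has 13 general-purpose registers, R0-R12, which serve as operands for arithmetic and logical instructions. Most 16-bit instructions can access only the low registers R0-R7. R13 is the Stack Pointer (SP) and R14 is the Link Register (LR), which holds subroutine return addresses. R15 is the Program Counter (PC).
The Program Status Register (xPSR) combines three registers: the Application PSR (APSR), which holds the negative, zero, carry, and overflow flags used for conditional execution; the Interrupt PSR (IPSR), which holds the current exception number; and the Execution PSR (EPSR), which holds the Thumb state bit and the IT-block state. Unlike classic ARM processors, the Cortex-M3 has no CPSR.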
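As a rough illustration of how the status register is laid out on ARMv7-M, the C sketch below unpacks a 32-bit xPSR value: the N, Z, C, V, and Q flags occupy bits 31 down to 27, the Thumb bit is bit 24, and the exception number occupies bits 8 down to 0. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected xPSR fields; names are illustrative only. */
typedef struct {
    bool n, z, c, v, q;    /* APSR flags, bits 31..27          */
    bool thumb;            /* EPSR T bit, bit 24               */
    unsigned exception;    /* IPSR exception number, bits 8..0 */
} xpsr_fields_t;

xpsr_fields_t decode_xpsr(uint32_t xpsr)
{
    xpsr_fields_t f;

    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.thumb = (xpsr >> 24) & 1u;
    f.exception = xpsr & 0x1FFu;
    return f;
}
```

A fault handler might apply the same masks to the stacked xPSR word to see which flags were set when the exception occurred.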
Load/Store Architecture
Thumb-2 uses a load/store architecture where data processing operations occur between registers, not directly on memory. Any memory access is done through explicit load or store instructions. For example:
    LDR R1, [R2]          // Load from memory into R1
    ADD R3, R1, #1        // Operation on registers
    STR R3, [R2]          // Store register back to memory
This simplifies instruction decoding and pipelining since the registers provide the source and destination for operations. Explicit load and store instructions move data between memory and registers.
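The same load, operate, store pattern is what a compiler produces for a simple read-modify-write in C. The sketch below spells out the three steps one by one; the function name `increment_word` is hypothetical, and the instruction named in each comment is only the typical mapping, not a guarantee about any particular compiler.

```c
#include <stdint.h>

/* Increment a word in memory using explicit load, operate, store steps. */
void increment_word(volatile uint32_t *addr)
{
    uint32_t value = *addr;    /* load: memory to register (LDR)   */

    value = value + 1u;        /* operate on the register (ADD)    */
    *addr = value;             /* store: register to memory (STR)  */
}
```

Because the three steps are separate instructions, the sequence is not atomic: an interrupt can occur between the load and the store.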
Flexible Memory Access
The Thumb-2 instruction set provides flexible options for memory access using 8-bit, 16-bit, or 32-bit transactions. This allows efficient access to data structures that mix different data widths. For example:
    LDRB R1, [R2]         // 8-bit unsigned load
    LDRH R3, [R4]         // 16-bit unsigned load
    LDR R5, [R6]          // 32-bit word load
Stores have matching byte, halfword, and word forms (STRB, STRH, STR). Byte and halfword loads come in zero-extending (LDRB, LDRH) and sign-extending (LDRSB, LDRSH) variants; stores need no signed form because they simply write the low bits of the register. Arrays can be accessed through register offsets or pre/post-indexed addressing.
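The difference between the unsigned and signed narrow loads is how the value is widened to 32 bits. A small C model, with hypothetical function names, makes the distinction concrete. It assumes the usual two's-complement behaviour when converting to `int8_t` and `int16_t`.

```c
#include <stdint.h>

/* Zero-extending loads, in the manner of LDRB and LDRH. */
uint32_t load_u8(const uint8_t *p)   { return *p; }
uint32_t load_u16(const uint16_t *p) { return *p; }

/* Sign-extending loads, in the manner of LDRSB and LDRSH. */
int32_t load_s8(const uint8_t *p)    { return (int8_t)*p; }
int32_t load_s16(const uint16_t *p)  { return (int16_t)*p; }
```

Reading the byte 0x80 through `load_u8` gives 128, while `load_s8` gives -128, which is why the choice of load instruction has to match the signedness of the data.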
PC-Relative Addressing
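Thumb-2 includes PC-relative addressing, which supports position-independent code by accessing data at a fixed offset from the current PC value. In Thumb code the PC reads as the address of the current instruction plus 4, and for literal loads that value is rounded down to a word boundary before the offset is added. For example:
    LDR R1, [PC, #8]      // Load R1 from (word-aligned PC) + 8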
PC-relative addressing simplifies accessing literal pools and static data structures without tracking exact address values.
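The address used by a PC-relative literal load can be worked out by hand. The C sketch below models the calculation: the PC reads as the instruction address plus 4, is aligned down to a multiple of 4, and then has the offset added. The function name `literal_address` is hypothetical.

```c
#include <stdint.h>

/* Effective address of a PC-relative literal load in Thumb code. */
uint32_t literal_address(uint32_t insn_addr, uint32_t offset)
{
    uint32_t pc = insn_addr + 4u;    /* PC reads as current instruction + 4 */
    uint32_t base = pc & ~3u;        /* round down to a word boundary       */

    return base + offset;
}
```

With an offset of 8, an instruction at 0x08000100 and one at 0x08000102 both resolve to 0x0800010C, because the alignment step absorbs the halfword difference.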
Register-Relative Addressing
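The Stack Pointer (SP) provides a base register for fast access to variables on the stack. After a function reserves its frame by lowering SP, local variables are accessed at positive offsets from SP. Data should not be kept below SP, because exception entry on the Cortex-M3 stacks registers there and would overwrite it. For example:
    PUSH {R4-R7}          // Save callee-saved registers
    SUB SP, SP, #8        // Reserve 8 bytes for local variables
    STR R2, [SP, #4]      // Store local variable at SP+4
    LDR R1, [SP, #0]      // Load local variable at SP+0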
Register-relative addressing off the stack pointer avoids extra address calculations to access the stack frame.
Immediate Constants
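Thumb-2 allows small constants to be embedded directly in instructions. The 16-bit encodings hold immediates of up to 8 bits. Most 32-bit data-processing instructions take a "modified immediate": an 8-bit value shifted to any position in the word, or one of a few repeating byte patterns. MOVW takes a full 16-bit immediate. For example:
    MOVS R1, #55          // 16-bit encoding, 8-bit immediate
    ADD R2, R3, #0x100    // 32-bit encoding, modified immediate
    CMP R4, #0xFF         // 16-bit encoding, 8-bit immediate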
Immediate constants improve code density by avoiding extra load instructions for small literal values.
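Whether a given 32-bit constant fits the modified-immediate form used by 32-bit data-processing instructions can be checked mechanically. The sketch below is one way to write that check in C, under the encoding rules as I understand them: a plain byte, the patterns 0x00XY00XY, 0xXY00XY00, and 0xXYXYXYXY, or an 8-bit value with its top bit set shifted left by 1 to 24 bits. The function name `is_modified_immediate` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if value can be encoded as a Thumb-2 modified immediate. */
bool is_modified_immediate(uint32_t value)
{
    uint32_t lo = value & 0xFFu;           /* byte in bits [7:0]  */
    uint32_t hi = (value >> 8) & 0xFFu;    /* byte in bits [15:8] */

    if (value <= 0xFFu) return true;                        /* 0x000000XY */
    if (value == (lo | (lo << 16))) return true;            /* 0x00XY00XY */
    if (value == ((hi << 8) | (hi << 24))) return true;     /* 0xXY00XY00 */
    if (value == lo * 0x01010101u) return true;             /* 0xXYXYXYXY */

    /* An 8-bit value with bit 7 set, shifted left by 1..24 bits. */
    for (unsigned shift = 1u; shift <= 24u; ++shift) {
        uint32_t byte = value >> shift;

        if (byte >= 0x80u && byte <= 0xFFu && (byte << shift) == value)
            return true;
    }
    return false;
}
```

Constants that fail this check, such as 0x12345678, are typically built with a MOVW/MOVT pair or loaded from a literal pool instead.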
Load and Store Multiple
PUSH and POP instructions provide efficient stack operations. A list of registers can be pushed to the stack or popped from it using one instruction. For example:
    PUSH {R4-R7}          // Push R4-R7 onto stack
    POP {R4-R7}           // Pop R4-R7 from stack
This is smaller and faster than issuing a separate store or load for each register. The more general Load Multiple (LDM) and Store Multiple (STM) instructions transfer several registers to or from memory using any base register.
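The effect of PUSH and POP on a full-descending stack can be modelled in a few lines of C. In this sketch the register list is a bit mask, the lowest-numbered register always lands at the lowest address, and SP is a word index into a small array. All names (`cpu_model_t`, `push_regs`, `pop_regs`) are hypothetical, and bounds checking is omitted for brevity.

```c
#include <stdint.h>

#define STACK_WORDS 64u

/* A toy CPU model: 16 registers and a word-addressed stack. */
typedef struct {
    uint32_t r[16];
    uint32_t stack[STACK_WORDS];
    uint32_t sp;                  /* index of the top-of-stack word */
} cpu_model_t;

/* Count the registers named in a register-list mask. */
unsigned count_regs(uint16_t mask)
{
    unsigned n = 0u;

    for (; mask != 0u; mask >>= 1)
        n += mask & 1u;
    return n;
}

/* PUSH: lower SP first, then store registers in ascending order. */
void push_regs(cpu_model_t *cpu, uint16_t mask)
{
    uint32_t slot = cpu->sp - count_regs(mask);

    cpu->sp = slot;
    for (unsigned i = 0u; i < 16u; ++i)
        if (mask & (1u << i))
            cpu->stack[slot++] = cpu->r[i];
}

/* POP: load registers in ascending order, then raise SP. */
void pop_regs(cpu_model_t *cpu, uint16_t mask)
{
    uint32_t slot = cpu->sp;

    for (unsigned i = 0u; i < 16u; ++i)
        if (mask & (1u << i))
            cpu->r[i] = cpu->stack[slot++];
    cpu->sp = slot;
}
```

Calling `push_regs(&cpu, 0x00F0)` corresponds to PUSH {R4-R7}: SP drops by four words and R4 ends up at the new top of stack.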
Hardware Multiply and Divide
The Cortex-M3 includes hardware to accelerate 32-bit multiply and divide operations. This allows efficient numerical code without software routines. For example:
    MUL R1, R2, R3        // R1 = R2 * R3 (low 32 bits)
    SDIV R4, R5, R6       // R4 = R5 / R6 (signed)
    UDIV R7, R8, R9       // R7 = R8 / R9 (unsigned)
The MUL instruction returns only the low 32 bits of the product. When the full 64-bit result is needed, UMULL (unsigned) and SMULL (signed) write it to a pair of registers; for instance, UMULL R0, R1, R2, R3 places the low word in R0 and the high word in R1.
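The results of these instructions can be modelled in portable C. The sketch below uses hypothetical function names and assumes the default configuration in which divide-by-zero trapping is disabled, so a division by zero returns 0. It also handles the one signed overflow case, the most negative value divided by -1, which returns the most negative value.

```c
#include <stdint.h>

/* MUL: low 32 bits of the product. */
uint32_t mul32(uint32_t a, uint32_t b)  { return a * b; }

/* UMULL / SMULL: full 64-bit product. */
uint64_t umull(uint32_t a, uint32_t b)  { return (uint64_t)a * b; }
int64_t  smull(int32_t a, int32_t b)    { return (int64_t)a * b; }

/* UDIV: unsigned divide; yields 0 on divide-by-zero when trapping is off. */
uint32_t udiv(uint32_t n, uint32_t d)
{
    return (d == 0u) ? 0u : n / d;
}

/* SDIV: signed divide, rounding toward zero. */
int32_t sdiv(int32_t n, int32_t d)
{
    if (d == 0)
        return 0;                        /* trapping assumed disabled    */
    if (n == INT32_MIN && d == -1)
        return INT32_MIN;                /* overflow case wraps          */
    return n / d;
}
```

Multiplying 0x10000 by 0x10000 shows the difference between the two multiply forms: `mul32` returns 0, while `umull` returns 0x100000000.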
Summary
In summary, the Thumb-2 instruction set used in Cortex-M3 provides:
- High density 16-bit and high performance 32-bit instructions
- Unified ARM and Thumb syntax for easier migration
- Conditional execution of up to four instructions at a time through IT blocks
- Efficient access to stack frame using register-relative addressing
- Load/store architecture with byte/halfword/word memory access
- PC-relative addressing for position independence
- Faster multiply and divide using hardware acceleration
These capabilities allow Thumb-2 to achieve good code density while delivering high performance on the Cortex-M3 processor. The variable-length encoding makes it well-suited for embedded microcontrollers needing a compact yet full-featured instruction set architecture.