Data Processing Instructions in the ARM Cortex-M3

The ARM Cortex-M3 is a 32-bit processor core licensed by Arm Holdings. It is part of the Cortex-M series of microcontroller cores and is designed for embedded applications that need a low-power CPU with good performance. The Cortex-M3 has a 3-stage pipeline and includes the Thumb-2 instruction set, a Nested Vectored Interrupt Controller (NVIC), and an optional Memory Protection Unit (MPU). It does not include SIMD instructions; those belong to the DSP extension introduced with the Cortex-M4.

Contents

Data Processing Instructions
Arithmetic Instructions
Logical Instructions
Shift and Rotate Instructions
Move Instructions
Compare Instructions
Bit Field Instructions
Addressing Modes
Register Mode
Immediate Mode
Label Mode
Scaled Register Mode
Condition Flags
Pipelines and Performance
Instruction Set Encoding
Instruction Set Optimization
Instruction Latency and Throughput
Instruction Set Summary
Conclusion

The Cortex-M3 implements the ARMv7-M architecture, which includes the Thumb-2 instruction set. Thumb-2 extends the earlier Thumb (Thumb-1) instruction set with additional 32-bit instructions while retaining all the existing 16-bit Thumb-1 instructions. This lets Thumb-2 code reach performance close to that of ARM code while keeping higher code density. The 16-bit and 32-bit instructions can be freely intermixed in Thumb-2 code.

Data Processing Instructions

The Thumb-2 instruction set on the Cortex-M3 provides various data processing instructions that operate on registers and constants. These include:

Arithmetic instructions like ADD, SUB, MUL, etc.
Logical instructions like AND, ORR, EOR, etc.
Shift and rotate instructions like LSL, LSR, ASR, ROR, etc.
Move instructions like MOV, MVN
Compare and test instructions like CMP, CMN, TST
Bit field instructions like BFI, BFC

These data processing instructions allow efficient manipulation of data stored in registers. The instructions can take register operands, constant immediate operands or both. Next we’ll look at some examples of using these data processing instructions.

Arithmetic Instructions

Arithmetic instructions like ADD and SUB perform addition and subtraction on register and immediate operands. For example:

ADD R1, R2, R3 //R1 = R2 + R3
SUB R5, R3, #10 //R5 = R3 – 10

The MUL instruction multiplies two 32-bit registers and keeps the low 32 bits of the product, which are the same for signed and unsigned operands:

MUL R2, R1, R3 //R2 = R1 * R3 (low 32 bits)

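Multiply-accumulate is provided by MLA, and SMULL produces a full signed 64-bit product (SMLAL adds a 64-bit accumulate). The halfword multiplies SMULxy and SMLAxy belong to the DSP extension and are not available on the Cortex-M3.

MLA R4, R1, R3, R4 //R4 = R4 + R1 * R3
SMULL R6, R7, R2, R9 //R7:R6 = R2 * R9 (signed 64-bit result)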
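
As a rough illustration of what the arithmetic and multiply instructions compute, here is a minimal C sketch. The function names (arm_add, arm_mla and so on) are hypothetical helpers invented for this example; they model only the register result, not the flags or the timing.

```c
#include <stdint.h>

/* Hypothetical helpers that model the result register only. */

/* ADD Rd, Rn, Op2: 32-bit addition, wrapping modulo 2^32. */
static uint32_t arm_add(uint32_t rn, uint32_t op2) { return rn + op2; }

/* SUB Rd, Rn, Op2: 32-bit subtraction, wrapping modulo 2^32. */
static uint32_t arm_sub(uint32_t rn, uint32_t op2) { return rn - op2; }

/* MUL Rd, Rn, Rm: low 32 bits of the product. The same bit pattern
   results whether the operands are read as signed or unsigned. */
static uint32_t arm_mul(uint32_t rn, uint32_t rm) { return rn * rm; }

/* MLA Rd, Rn, Rm, Ra: multiply, then add the accumulator Ra. */
static uint32_t arm_mla(uint32_t rn, uint32_t rm, uint32_t ra) {
    return ra + rn * rm;
}

/* SMULL RdLo, RdHi, Rn, Rm: full signed 64-bit product. */
static int64_t arm_smull(int32_t rn, int32_t rm) {
    return (int64_t)rn * (int64_t)rm;
}
```

Because MUL keeps only the low word, a product that overflows 32 bits is silently truncated; SMULL is the instruction to reach for when the full signed result matters.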

Logical Instructions

Logical AND, OR, and XOR operations on registers are performed with the AND, ORR, and EOR instructions. For example:

AND R2, R5, #0xF //R2 = R5 AND 0xF
ORR R4, R1, R8 //R4 = R1 OR R8
EOR R7, R3, R9 //R7 = R3 XOR R9

The flag-setting forms ANDS, ORRS, and EORS update the N and Z flags from the result. The C flag may also change, depending on the second operand.

Bitwise NOT is performed with the MVN (Move NOT) instruction:

MVN R6, R8 //R6 = NOT R8
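
The logical instructions map directly onto C's bitwise operators. The sketch below uses hypothetical helper names and models only the result plus the N and Z flags that the S-suffixed forms would set; the C flag is left out as a simplification.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers modelling the logical instructions. */
static uint32_t arm_and(uint32_t rn, uint32_t op2) { return rn & op2; }
static uint32_t arm_orr(uint32_t rn, uint32_t op2) { return rn | op2; }
static uint32_t arm_eor(uint32_t rn, uint32_t op2) { return rn ^ op2; }
static uint32_t arm_mvn(uint32_t op2) { return ~op2; }

/* N flag: copy of bit 31 of the result. */
static bool flag_n(uint32_t result) { return (result >> 31) != 0; }

/* Z flag: set when the result is zero. */
static bool flag_z(uint32_t result) { return result == 0; }
```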

Shift and Rotate Instructions

The shift and rotate instructions move the bits of a register left or right, either by an immediate amount or by an amount held in another register.

Logical Shift Left (LSL) and Logical Shift Right (LSR) shift in zeros. Arithmetic Shift Right (ASR) shifts in copies of the sign bit, so it preserves the sign of a signed value.

LSL R5, R3, #2 //R5 = R3 << 2
LSR R2, R7, #5 //R2 = R7 >> 5
ASR R4, R1, #3 //Arithmetic shift right R1 by 3

Rotate Right (ROR) rotates the bits by the specified amount. Rotate Right with Extend (RRX) always rotates by one bit through the carry flag: the old carry moves into bit 31, and with RRXS bit 0 moves into the carry.

ROR R8, R2, #8 //Rotate R2 right by 8 bits
RRX R9, R1 //Rotate R1 right by 1 bit through the carry flag
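
The differences between the shift and rotate operations are easier to see in code. The following C sketch uses hypothetical helper names and assumes a shift amount between 1 and 31; it models the result register only and ignores the carry-out.

```c
#include <stdint.h>

/* Hypothetical helpers; n is assumed to be in the range 1..31. */

/* LSL: zeros enter from the right. */
static uint32_t arm_lsl(uint32_t x, unsigned n) { return x << n; }

/* LSR: zeros enter from the left. */
static uint32_t arm_lsr(uint32_t x, unsigned n) { return x >> n; }

/* ASR: copies of the sign bit enter from the left. Written with
   unsigned arithmetic so it does not depend on how the compiler
   shifts negative signed values. */
static uint32_t arm_asr(uint32_t x, unsigned n) {
    uint32_t shifted = x >> n;
    if (x & 0x80000000u) {
        shifted |= ~(0xFFFFFFFFu >> n); /* fill the top n bits with ones */
    }
    return shifted;
}

/* ROR: bits leaving on the right re-enter on the left. */
static uint32_t arm_ror(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* RRX: rotate right by one bit, with the carry flag entering bit 31. */
static uint32_t arm_rrx(uint32_t x, unsigned carry_in) {
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```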

Move Instructions

The MOV instruction copies a value from one register to another register. For example:

MOV R4, R8 //R4 = R8

It can also move an immediate constant value into a register.

MOV R2, #0x55 //R2 = 0x55

The MVN instruction performs a bitwise NOT operation during the move. For example:

MVN R6, R8 //R6 = NOT R8

Compare Instructions

Compare and test instructions like CMP, CMN, and TST compare two operands. They update the status flags but do not store the result in a register.

CMP performs a subtraction, CMN performs an addition, and TST performs a logical AND:

CMP R1, #5 //Set flags from R1 – 5
CMN R3, R7 //Set flags from R3 + R7
TST R2, R4 //Set flags from R2 AND R4

These instructions help to perform conditional testing and branching.
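
To make the flag behaviour concrete, here is a minimal C sketch of how CMP, CMN, and TST could be modelled. The type flags_t and the function names are hypothetical. For TST the C and V flags are simply passed through unchanged, which is a simplification.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the four condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* CMP Rn, Op2: flags from Rn - Op2; the result itself is discarded. */
static flags_t arm_cmp(uint32_t rn, uint32_t op2) {
    uint32_t result = rn - op2;
    flags_t f;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (rn >= op2); /* C set means "no borrow" */
    /* Signed overflow: operand signs differ and the result's sign
       differs from Rn's sign. */
    f.v = (((rn ^ op2) & (rn ^ result)) >> 31) != 0;
    return f;
}

/* CMN Rn, Op2: flags from Rn + Op2. */
static flags_t arm_cmn(uint32_t rn, uint32_t op2) {
    uint32_t result = rn + op2;
    flags_t f;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (result < rn); /* unsigned carry out of bit 31 */
    /* Signed overflow: operand signs match but the result's differs. */
    f.v = ((~(rn ^ op2) & (rn ^ result)) >> 31) != 0;
    return f;
}

/* TST Rn, Op2: N and Z from Rn AND Op2; C and V kept as they were. */
static flags_t arm_tst(uint32_t rn, uint32_t op2, flags_t before) {
    uint32_t result = rn & op2;
    flags_t f = before;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    return f;
}
```

Note that after a CMP of equal values both Z and C are set, because a subtraction without borrow sets the carry flag on ARM.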

Bit Field Instructions

Bit field instructions allow access to and manipulation of a specific bit field within a register. For example:

BFI R5, R8, #3, #5 //Insert the low 5 bits of R8 into R5, starting at bit 3
BFC R4, #7, #3 //Clear 3 bits in R4, starting at bit 7

This allows bit masks and bit flags to be created and maintained in registers for efficient bit manipulation.
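
The mask-and-merge work that BFI and BFC do in a single instruction can be sketched in C as follows. The helper names are hypothetical, and the sketch assumes a width of at least 1 with lsb + width no greater than 32.

```c
#include <stdint.h>

/* Hypothetical helper: `width` one bits starting at bit `lsb`.
   Assumes width >= 1 and lsb + width <= 32. */
static uint32_t field_mask(unsigned lsb, unsigned width) {
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFC Rd, #lsb, #width: clear the field, keep every other bit. */
static uint32_t arm_bfc(uint32_t rd, unsigned lsb, unsigned width) {
    return rd & ~field_mask(lsb, width);
}

/* BFI Rd, Rn, #lsb, #width: copy the low `width` bits of Rn into
   Rd at position `lsb`, keeping every other bit of Rd. */
static uint32_t arm_bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t mask = field_mask(lsb, width);
    return (rd & ~mask) | ((rn << lsb) & mask);
}
```

Without these instructions the same update needs a separate mask, shift, clear, and OR, so a single BFI replaces several instructions.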

Addressing Modes

The ARM Cortex-M3 data processing instructions support several addressing modes for specifying the operands. This includes using registers, constants and labels for the instructions.

Register Mode

In register addressing mode, a register is specified as an operand. This allows operations directly between CPU registers.

ADD R1, R2, R3 //Register operands
CMP R4, R8 //Register operands

Immediate Mode

In immediate addressing mode, a constant value is specified as an operand. This is useful for simple constant operations.

ADD R1, R2, #10 //Immediate constant operand
CMP R4, #0xF //Compare with immediate value

Label Mode

Branch instructions use PC-relative addressing through labels. The label names a target address, and the assembler encodes it as an offset from the current PC.

BNE loop //Branch to label ‘loop’ if the Z flag is clear
CBZ R1, begin //Branch to ‘begin’ if R1 is 0 (forward branches only)

Scaled Register Mode

Load and store instructions like LDR and STR support a scaled register offset. The offset register is shifted left by 0 to 3 bits before being added to the base register.

LDR R5, [R2, R1, LSL #3] //Load from R2 + (R1 << 3), so R1 is scaled by 8
STR R8, [R4, R6, LSL #2] //Store to R4 + (R6 << 2), so R6 is scaled by 4

This helps index arrays and structured data by eliminating extra shift instructions.
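
The address arithmetic behind scaled register addressing is the same calculation a C compiler performs for array indexing. The sketch below uses hypothetical function names and assumes 32-bit addresses. Whether a compiler actually emits a scaled LDR for load_word depends on the compiler and its settings.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of [Rn, Rm, LSL #shift]. */
static uint32_t scaled_address(uint32_t base, uint32_t index, unsigned shift) {
    return base + (index << shift);
}

/* Indexing an array of 32-bit words uses the same arithmetic:
   the element address is table + (i << 2). */
static uint32_t load_word(const uint32_t *table, uint32_t i) {
    return table[i];
}
```

For example, scaled_address(0x20000000, 3, 2) gives 0x2000000C, the address of the fourth word in an array that starts at 0x20000000.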

Condition Flags

Data processing instructions with the S suffix, together with the compare and test instructions, update the 4 condition flags in the Application Program Status Register (APSR) based on the result:

N – Negative flag
Z – Zero flag
C – Carry flag
V – Overflow flag

These flags can be tested by conditional branch instructions like BNE, BEQ, BMI, etc. Some examples:

CMP R1, R2 //Set flags from R1 – R2
BGT label //Branch if R1 > R2, signed (tests Z, N and V flags)
SUBS R3, R3, R4 //R3 = R3 – R4, updating the flags
BLT label //Branch if R3 was less than R4, signed (tests N and V flags)

This allows code execution to be conditional based on results of previous arithmetic or logical instructions.
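
Each condition code is a fixed Boolean function of the N, Z, C, and V flags. The C sketch below spells out a few of them; the type apsr_t and the function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the APSR condition flags. */
typedef struct { bool n, z, c, v; } apsr_t;

static bool cond_eq(apsr_t f) { return f.z; }                 /* equal */
static bool cond_ne(apsr_t f) { return !f.z; }                /* not equal */
static bool cond_mi(apsr_t f) { return f.n; }                 /* negative */
static bool cond_hi(apsr_t f) { return f.c && !f.z; }         /* unsigned > */
static bool cond_ge(apsr_t f) { return f.n == f.v; }          /* signed >= */
static bool cond_lt(apsr_t f) { return f.n != f.v; }          /* signed < */
static bool cond_gt(apsr_t f) { return !f.z && f.n == f.v; }  /* signed > */
static bool cond_le(apsr_t f) { return f.z || f.n != f.v; }   /* signed <= */
```

The signed conditions compare N with V rather than testing N alone, which keeps the comparison correct even when the subtraction overflows.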

Pipelines and Performance

The Cortex-M3 uses a 3 stage pipeline – Fetch, Decode and Execute. This enables some basic parallelism and increases performance compared to sequential non-pipelined execution.

While one instruction executes (Execute stage), the next instruction can be decoded (Decode stage) and another fetched (Fetch stage). The stages overlap, so in the ideal case one instruction completes every cycle even though each instruction passes through all three stages.

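The Cortex-M3 has no dynamic branch predictor. Instead it uses branch speculation: when a branch reaches the Decode stage, the core speculatively fetches from the branch target, which shortens the refill when the branch is taken. The optional Memory Protection Unit (MPU) does not speed up memory accesses; it enforces access permissions and memory attributes for regions of the address space.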

Overall the Cortex-M3 pipeline along with the Thumb-2 instruction set provides good performance for embedded applications while minimizing energy consumption.

Instruction Set Encoding

The Thumb-2 instruction set uses both 16-bit and 32-bit instruction encodings. Many instructions are available in both formats.

16-bit:
MOVS R5, #100 //Move an 8-bit immediate value to R5
CMP R1, R2 //Compare R1 and R2

32-bit:
MOVW R8, #1000 //Move a 16-bit immediate value to R8
SUBS R8, R9, R10 //Subtract using high registers, with status flag update

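The 16-bit format provides higher code density. The 32-bit format allows larger immediate constants, use of the high registers R8 to R12 in most instructions, shifted second operands, and a choice of whether the status flags are updated. Most 16-bit data processing instructions always update the flags when used outside an IT block.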

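Most instruction classes, including branches, loads and stores, and MUL, have both 16-bit and 32-bit encodings. A few instructions like CBZ, CBNZ, and IT exist only in 16-bit format, while others like the bit field instructions (BFI, BFC), MLA, the long multiplies, and SDIV/UDIV exist only in 32-bit format.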

The unified 16-bit and 32-bit encoding allows Thumb-2 to achieve good performance and code density – making it very suitable for embedded applications.
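
A decoder can tell the two encodings apart from the first halfword alone: in Thumb-2, a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 is the first half of a 32-bit instruction, and anything else is a complete 16-bit instruction. The C sketch below shows that check; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true if `hw` is the first halfword of a
   32-bit Thumb-2 instruction. */
static bool is_32bit_first_halfword(uint16_t hw) {
    unsigned top5 = (unsigned)hw >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Size in bytes of the instruction that starts with `hw`. */
static unsigned instruction_size_bytes(uint16_t hw) {
    return is_32bit_first_halfword(hw) ? 4u : 2u;
}
```

For instance, 0x4770 (BX LR) and 0xBF00 (NOP) are 16-bit instructions, while a halfword of the form 0xF000 starts a 32-bit BL.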

Instruction Set Optimization

Here are some tips for optimizing code to make best use of the Cortex-M3/Thumb-2 instruction set architecture:

Use 16-bit instructions whenever possible for better code density
Minimize taken branches to avoid pipeline refills
Use conditional execution in IT blocks instead of short branches if possible
Combine addition/subtraction with status flag update (ADDS, SUBS) to eliminate CMP
Use scaled register offset addressing mode to avoid extra shifts
Do not plan around SIMD instructions: the Cortex-M3 has none, so that kind of parallel arithmetic needs a core with the DSP extension such as the Cortex-M4
Load 32-bit constants from a literal pool (LDR Rd, =value) or with a MOVW/MOVT pair
Fold a shift into the second operand of a 32-bit data processing instruction (for example ADD R0, R1, R2, LSL #2) instead of issuing a separate LSL/LSR

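Proper register allocation, efficient bit manipulation, and keeping the pipeline full also help improve performance. The Cortex-M3 core itself has no cache, so memory behaviour depends on the flash and bus implementation of the specific microcontroller. Modern compilers apply many of these optimizations automatically.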

Instruction Latency and Throughput

Instructions have different latencies and throughputs depending on their type and pipeline implementation.

Latency is the number of cycles needed to get the result of an instruction. Simple ALU instructions have just 1 cycle latency.

Throughput is the number of instructions that can complete per cycle. The Cortex-M3 is a single-issue core, so pipelining lets it approach, but never exceed, one instruction per cycle.

On the Cortex-M3, most ALU operations take just 1 cycle. This includes:

Arithmetic – ADD, SUB, ADC, SBC etc.
Logical – AND, ORR, EOR, BIC
Move – MOV, MVN
Compare – CMP, CMN, TST
Shift/Rotate – LSL, LSR, ASR, ROR

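A 32-bit MUL also completes in 1 cycle, while multiply-accumulates like MLA take 2 cycles. Long multiplies with a 64-bit result take longer: SMULL and UMULL take 3 to 5 cycles, SMLAL and UMLAL take 4 to 7 cycles, and the divide instructions SDIV and UDIV take 2 to 12 cycles depending on the operands.

A single load or store normally takes 2 cycles, although back-to-back loads and stores can be pipelined so that each additional one adds only 1 cycle. A conditional branch that is not taken costs 1 cycle. A taken branch costs 1 cycle plus a pipeline refill of 1 to 3 cycles.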

Knowing the instruction timings allows proper scheduling to avoid potential pipeline stalls. This helps achieve maximum performance.
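
As a rough aid for this kind of scheduling, the per-instruction figures can be added up for a straight-line sequence. The C sketch below is only a first-order estimate under stated assumptions: it takes 1 cycle for simple ALU operations and MUL, and 2 cycles for MLA and for a single load or store. It ignores memory wait states, load/store pipelining, and branch refills. The enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical instruction classes for a first-order estimate. */
typedef enum { OP_ALU, OP_MUL, OP_MLA, OP_LOAD_STORE } op_kind_t;

/* Assumed base cost in cycles for each class (see the lead-in). */
static unsigned base_cycles(op_kind_t kind) {
    switch (kind) {
    case OP_ALU:        return 1u;
    case OP_MUL:        return 1u;
    case OP_MLA:        return 2u;
    case OP_LOAD_STORE: return 2u;
    }
    return 0u; /* unreachable for valid enum values */
}

/* Sum the assumed base costs over a straight-line sequence. */
static unsigned estimate_cycles(const op_kind_t *ops, size_t count) {
    unsigned total = 0u;
    for (size_t i = 0; i < count; i++) {
        total += base_cycles(ops[i]);
    }
    return total;
}
```

An estimate like this is useful for comparing two candidate instruction sequences, but measured cycle counts on the target device remain the reference.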

Instruction Set Summary

To summarize, the Thumb-2 instruction set provides a versatile set of data processing, memory access and flow control instructions for the Cortex-M3 CPU.

Key highlights:

High performance 32-bit instructions mixed with compact 16-bit instructions
Flexible arithmetic, logical, comparison, bit-field, and move instructions
Efficient shift and rotate instructions
Load/store instructions with scaled offset addressing
Conditional branches and conditional execution in IT blocks
Literal pools and PC-relative branch addressing
Pipelined implementation that overlaps fetch, decode, and execute

Overall, Thumb-2 provides an excellent instruction set architecture that balances performance, code density and power efficiency for embedded system development using the Cortex-M3 processor.

Conclusion

In this article, we looked at how data processing instructions work on the ARM Cortex-M3 CPU. We covered the various arithmetic, logical, move, compare, shift/rotate instructions and their addressing modes. We also saw how the condition flags help implement conditional execution after an ALU operation. Techniques for optimizing instructions were discussed along with details about pipelines and instruction timings.

The Thumb-2 instruction set with its mix of 16-bit and 32-bit instructions provides a great combination of high performance, good code density and low power consumption. Developers can leverage the capabilities of the instruction set architecture effectively to build efficient embedded applications using the Cortex-M3 processor.