The ARM Cortex-M0 is a 32-bit processor core designed for microcontroller applications. It is one of the smallest and lowest-power ARM processor cores available, making it well suited to IoT and wearable devices. Understanding instruction execution times on the Cortex-M0 is important for optimizing performance in time-critical applications.
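In general, most data-processing instructions on the Cortex-M0 take just a single clock cycle to execute. This includes simple arithmetic, logical, and register-to-register move instructions. Loads, stores, stack operations, and taken branches take multiple cycles, and a multiply takes either 1 or 32 cycles depending on which multiplier the chip vendor chose. The Cortex-M0 has no hardware divide instruction, so division is carried out by a software routine. All cycle counts quoted here assume zero-wait-state memory.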
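Load and store instructions access memory. On the Cortex-M0 a single load or store takes 2 cycles regardless of the addressing mode, assuming zero-wait-state memory:
- Register offset addressing, e.g. LDR R1, [R2, R3] – 2 cycles
- Immediate offset addressing, e.g. LDR R1, [R2, #8] – 2 cycles
- PC-relative (literal pool) addressing, e.g. LDR R1, [PC, #16] – 2 cycles
For example, LDR R1, [R2, #8] takes 2 cycles to load from the address in R2 offset by 8 bytes. The size of the offset does not affect the timing, but it is limited by the instruction encoding: a word load from a general-purpose base register accepts only offsets of 0 to 124 bytes in multiples of 4, so LDR R1, [R2, #256] cannot be encoded and the offset must first be placed in a register. Load and store multiple instructions (LDM/STM) take 1+N cycles for N registers.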
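Simple arithmetic and logic instructions such as ADDS, SUBS, CMP, ANDS, and ORRS all take just 1 cycle to execute. Multiplication and division are different:
- Multiply (MULS) – 1 cycle with the fast multiplier option, 32 cycles with the small iterative multiplier
- Multiply-accumulate (MLA) – not available on the Cortex-M0; use MULS followed by ADDS
- Divide (SDIV/UDIV) – not available on the Cortex-M0; division runs in software
The multiplier is selected by the chip vendor when the core is implemented, so check the device datasheet to find out which one a given part has. MULS returns only the low 32 bits of the product. Because there is no divide instruction, compilers call a runtime library routine (for example __aeabi_uidiv or __aeabi_idiv under the ARM EABI), which costs many more cycles than a single instruction and varies with the operand values.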
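Since the Cortex-M0 (ARMv6-M) has no divide instruction, it can help to see why software division is expensive. The following is a minimal sketch of unsigned shift-and-subtract division in C; the names udiv_result and udiv32 are hypothetical, and real runtime libraries use more heavily optimized routines than this one.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder returned together (hypothetical helper type). */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv_result;

/* Unsigned restoring division: produces one quotient bit per iteration,
 * so the loop always runs 32 times. Each iteration costs several
 * single-cycle instructions plus a branch. */
udiv_result udiv32(uint32_t num, uint32_t den)
{
    assert(den != 0u);                 /* division by zero is not handled */
    udiv_result r = { 0u, 0u };
    for (int bit = 31; bit >= 0; --bit) {
        /* Bring down the next bit of the numerator. */
        r.rem = (r.rem << 1) | ((num >> bit) & 1u);
        if (r.rem >= den) {            /* divisor fits: subtract, set bit */
            r.rem -= den;
            r.quot |= 1u << bit;
        }
    }
    return r;
}
```

For instance, udiv32(100, 7) yields a quotient of 14 and a remainder of 2. With 32 iterations of several instructions each, a routine of this shape runs well over a hundred cycles, which is why avoiding division in hot paths matters on this core.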
Unconditional branches always change the program counter: B, BX, and BLX take 3 cycles, and BL takes 4 cycles. Conditional branches (B<cond>, such as BEQ or BNE) cost less when they fall through:
- Taken branch – 3 cycles
- Not taken branch – 1 cycle
The longer taken-branch time comes from refilling the three-stage pipeline with instructions fetched from the branch target address.
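To make the branch cost concrete, here is a small sketch that estimates the cycle count of a count-down loop ending in SUBS followed by BNE. The function name loop_cycles is hypothetical, and the figures assume zero-wait-state memory, a 1-cycle SUBS, and a conditional branch costing 3 cycles when taken and 1 cycle when not taken.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M0 costs with zero-wait-state memory. */
enum {
    CYC_SUBS = 1,             /* loop counter decrement       */
    CYC_BRANCH_TAKEN = 3,     /* BNE that jumps back          */
    CYC_BRANCH_NOT_TAKEN = 1  /* final BNE that falls through */
};

/* Estimated cycles for:  loop: <body> ; SUBS Rn, #1 ; BNE loop
 * The branch is taken on every iteration except the last one. */
uint32_t loop_cycles(uint32_t iterations, uint32_t body_cycles)
{
    assert(iterations >= 1u);
    uint32_t work = iterations * (body_cycles + CYC_SUBS);
    uint32_t branches = (iterations - 1u) * CYC_BRANCH_TAKEN
                      + CYC_BRANCH_NOT_TAKEN;
    return work + branches;
}
```

With a 2-cycle body and 10 iterations this gives 10 × 3 + 9 × 3 + 1 = 58 cycles, of which 28 are spent on the branch alone. That ratio is the reason short loops benefit from doing more work per iteration.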
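Instructions that manipulate the stack, such as PUSH and POP, take 1+N cycles, where N is the number of registers in the list:
- PUSH/POP single register – 2 cycles
- PUSH/POP multiple registers – 1 cycle plus 1 cycle per register
- POP that includes PC (function return) – 4+N cycles
So pushing 3 registers takes 1+3 = 4 cycles total. The extra cycle is paid once per instruction regardless of the list length, which is why one PUSH of several registers is cheaper than several single-register PUSH instructions.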
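As a worked example, the stack figures can be combined into a rough estimate of function call overhead. This sketch assumes the Cortex-M0 Technical Reference Manual costs of 1+N cycles for PUSH and POP, 4+N cycles for a POP that includes PC (with N counting PC itself), and 4 cycles for BL; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PUSH {reglist}: 1 + N cycles; the list holds at most R0-R7 plus LR. */
uint32_t push_cycles(uint32_t n_regs)
{
    assert(n_regs >= 1u && n_regs <= 9u);
    return 1u + n_regs;
}

/* POP {reglist, PC}: 4 + N cycles, N assumed to include PC. */
uint32_t pop_return_cycles(uint32_t n_regs_incl_pc)
{
    assert(n_regs_incl_pc >= 1u && n_regs_incl_pc <= 9u);
    return 4u + n_regs_incl_pc;
}

/* Overhead of BL + PUSH {..., LR} + POP {..., PC} for a function that
 * saves n_saved callee-saved registers in addition to LR/PC. */
uint32_t call_overhead_cycles(uint32_t n_saved)
{
    const uint32_t bl_cycles = 4u;   /* assumed cost of BL */
    return bl_cycles + push_cycles(n_saved + 1u)
                     + pop_return_cycles(n_saved + 1u);
}
```

A function that saves R4-R6 along with LR would cost about 4 + 5 + 8 = 17 cycles of call and return overhead under these assumptions, which is worth knowing before splitting a hot loop body into small helper functions.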
Here are some execution times for other common instructions:
- MOV – 1 cycle
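- CBZ/CBNZ – not available on the Cortex-M0; use CMP followed by BEQ or BNE
- Conditional branch (e.g. BEQ, BNE) – 1 cycle not taken, 3 cycles taken
- BLX – 3 cycles
- BX – 3 cycles
Again, simple register moves take just 1 cycle, although a MOV that writes PC acts as a branch and takes 3 cycles. CBZ and CBNZ are not part of the ARMv6-M instruction set that the Cortex-M0 implements. The equivalent compare-and-branch sequence costs 1 cycle for the CMP plus 1 cycle if the branch is not taken or 3 cycles if it is.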
When an interrupt occurs on the Cortex-M0, it takes 16 cycles before the first instruction of the interrupt handler executes, assuming zero-wait-state memory. This includes stacking eight registers (R0-R3, R12, LR, the return address, and xPSR), fetching the handler address from the vector table, and jumping to the handler.
Thus, the total interrupt latency is 16 cycles. A short and predictable interrupt response time makes it easier to meet real-time deadlines.
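Cycle counts only become a response time once the core clock is known. The helper below is a minimal sketch that converts a cycle count to nanoseconds; the name cycles_to_ns is hypothetical, the result is truncated to a whole nanosecond, and the 16-cycle latency used in the example is the Cortex-M0 figure for zero-wait-state memory.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds at the given core clock.
 * 64-bit arithmetic avoids overflow in cycles * 1e9. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz != 0u);
    return ((uint64_t)cycles * 1000000000ull) / clock_hz;
}
```

For example, cycles_to_ns(16, 48000000) gives 333, so a 16-cycle interrupt latency corresponds to roughly a third of a microsecond at 48 MHz, and to exactly 1000 ns at 16 MHz.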
Cycle Counting Methods
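To measure instruction cycle counts on the Cortex-M0, you can:
- Use the SysTick timer, where the device implements it – a 24-bit down-counter that can be clocked from the processor clock
- Set up a peripheral timer in free-running mode – acts as a cycle counter when clocked at the core frequency
- Use a debugger to set breakpoints, run, and read the timer value before and after the code under test
The Cortex-M0 does not provide the DWT cycle counter (CYCCNT) found on the Cortex-M3 and later cores. SysTick works well for counting a small sequence of instructions, provided you subtract the overhead of reading the counter. For blocks of code that run longer than 2^24 cycles, a wider free-running peripheral timer or the debugger works better.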
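One detail that is easy to get wrong with a SysTick-style measurement is that the counter is 24 bits wide and counts down, so the elapsed time is start minus end, taken modulo 2^24. The following sketch shows that arithmetic in isolation. The name systick_elapsed is hypothetical, and it assumes the reload value is set to 0x00FFFFFF and that fewer than 2^24 cycles pass between the two reads.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_MAX 0x00FFFFFFu   /* SysTick is a 24-bit down-counter */

/* Cycles elapsed between two reads of a down-counter that reloads to
 * SYSTICK_MAX. Masking to 24 bits handles a single wrap through zero. */
uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    assert(start <= SYSTICK_MAX && end <= SYSTICK_MAX);
    return (start - end) & SYSTICK_MAX;
}
```

On a target using the CMSIS headers, start and end would typically be two reads of SysTick->VAL taken around the code under test, with SysTick->LOAD set to 0x00FFFFFF and the processor clock selected as the source. Measuring an empty sequence first gives the read overhead to subtract from later results.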
Here are some tips for optimizing cycle counts on the Cortex M0 using the instruction timing knowledge:
- Minimize loads and stores by keeping values in registers
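- Use shift operations instead of multiplies/divides when possible
- Keep time-critical code and data in zero-wait-state memory to avoid stalls on instruction fetches and data accesses
- Unroll short, hot loops where code size allows, so that fewer 3-cycle taken branches execute
- Arrange conditional branches so that the common case falls through; the Cortex-M0 has no conditional execution (IT blocks), so a not-taken branch at 1 cycle is the cheapest option
Keeping working values in registers and moving several words at once with LDM/STM (1+N cycles instead of 2 cycles per word) can provide significant speedups. Minimizing taken branches also avoids the 3-cycle penalty.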
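The shift-for-divide tip is most valuable on a core without hardware division. Below is a minimal sketch for unsigned values and power-of-two divisors; the names div_pow2 and mod_pow2 are hypothetical. Compilers usually apply this transformation on their own when the divisor is a compile-time constant, so the helpers matter mainly when the divisor is only known at run time as a shift amount.

```c
#include <assert.h>
#include <stdint.h>

/* x / 2^shift for unsigned x: a single shift instead of a library call. */
uint32_t div_pow2(uint32_t x, unsigned shift)
{
    assert(shift < 32u);
    return x >> shift;
}

/* x % 2^shift for unsigned x: mask off the low bits. */
uint32_t mod_pow2(uint32_t x, unsigned shift)
{
    assert(shift < 32u);
    return x & ((1u << shift) - 1u);
}
```

For example, div_pow2(100, 3) is 12 and mod_pow2(100, 3) is 4, matching 100 / 8 and 100 % 8. The same shortcut does not apply directly to signed values, because C division rounds toward zero while an arithmetic right shift rounds toward negative infinity.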
Instruction Timing Summary
In summary, the key ARM Cortex-M0 instruction execution times, assuming zero-wait-state memory, are:
- Arithmetic and logic ops – 1 cycle (MULS takes 1 or 32 cycles; there is no hardware divide)
- Loads/stores – 2 cycles each; LDM/STM take 1+N cycles
- Branches – 1 cycle not taken, 3 cycles taken (BL takes 4)
- Stack ops – 1+N cycles for N registers (4+N for a POP that includes PC)
- Interrupts – 16-cycle latency
Understanding these basics helps with writing efficient code for Cortex-M0 microcontrollers. Optimizing hot code paths and loops using the timing knowledge is key to maximizing performance. Check the instruction timing table in the ARM Cortex-M0 Technical Reference Manual for more specific cycle details, and the device datasheet for the multiplier option and memory wait states.