ARM Cortex M0 Instruction Execution Time

Graham Kruk
Last updated: October 5, 2023 9:58 am

The ARM Cortex M0 is a 32-bit processor core designed for microcontroller applications. It is one of the smallest and lowest power ARM processor cores available, making it well-suited for IoT and wearable devices. Understanding the instruction execution times for the Cortex M0 is important for optimizing performance in time-critical applications.

Contents

  • Load/Store Instructions
  • Arithmetic Instructions
  • Branch Instructions
  • Stack Operations
  • Other Instructions
  • Interrupt Latency
  • Cycle Counting Methods
  • Optimization Tips
  • Instruction Timing Summary

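In general, most data-processing instructions on the Cortex M0 (register-to-register arithmetic, logical operations, shifts, and moves) take just a single clock cycle to execute, assuming zero-wait-state memory. Loads and stores, taken branches, and multi-register transfers take more than one cycle, and a multiply takes either 1 or 32 cycles depending on which multiplier the chip vendor selected. The ARMv6-M instruction set used by the Cortex M0 has no hardware divide instruction, so division runs as a software routine.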

Load/Store Instructions

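Load and store instructions access memory. On the Cortex M0, a single load or store takes 2 cycles regardless of addressing mode, assuming zero-wait-state memory:

  • Register offset addressing (LDR R1, [R2, R3]) – 2 cycles
  • Immediate offset addressing (LDR R1, [R2, #8]) – 2 cycles
  • PC-relative and SP-relative addressing (LDR R1, [PC, #8]) – 2 cycles
  • Load/store multiple (LDM/STM) – 1 cycle plus 1 cycle per register

For example, LDR R1, [R2, #8] takes 2 cycles to load from the address in R2 offset by 8 bytes. The size of the offset does not change the timing, but it is limited by the encoding: a word load from a low register accepts offsets of 0 to 124 bytes in multiples of 4, so an offset such as #256 must first be placed in a register and used with register offset addressing. Wait states on flash or peripheral buses add to these figures.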
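To make the cost of memory traffic concrete, here is a minimal C sketch that compares copying a few words with individual LDR/STR pairs against one LDM plus one STM. It assumes the Cortex-M0 Technical Reference Manual figures for zero-wait-state memory (2 cycles per LDR or STR, 1 + N cycles per LDM or STM); the function names are hypothetical and the model ignores loop overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M0 costs with zero-wait-state memory. */
enum { LDR_STR_CYCLES = 2, LDM_STM_BASE_CYCLES = 1 };

/* Cycles to move n_words using one LDR and one STR per word. */
uint32_t copy_cycles_single(uint32_t n_words)
{
    return n_words * (2u * LDR_STR_CYCLES);
}

/* Cycles to move n_words using one LDM and one STM (1 + N cycles each).
 * A Thumb LDM/STM on ARMv6-M can name at most the 8 low registers. */
uint32_t copy_cycles_multiple(uint32_t n_words)
{
    assert(n_words >= 1u && n_words <= 8u);
    return 2u * (LDM_STM_BASE_CYCLES + n_words);
}
```

Under these assumptions, moving 4 words costs 16 cycles with LDR/STR pairs and 10 cycles with LDM/STM.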

Arithmetic Instructions

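Simple arithmetic and logical instructions like ADD, SUB, CMP, AND, ORR all take just 1 cycle to execute. Multiplication and division are different:

  • Multiply (MUL) – 1 cycle with the fast multiplier, 32 cycles with the small multiplier
  • Multiply-accumulate (MLA) – not available on the Cortex M0; use a multiply followed by an add
  • Divide (SDIV/UDIV) – not available on the Cortex M0; division is done by a software library routine

The multiplier is an implementation option chosen by the chip vendor: the fast option completes a 32-bit multiply in 1 cycle, while the area-optimized option takes 32 cycles. Because there is no hardware divider, a division compiles to a call to a runtime routine, which costs far more than a single instruction and varies with the routine and the values being divided. Check the device datasheet to see which multiplier your part uses.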

Branch Instructions

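Branch timing depends on the instruction and, for conditional branches, on whether the branch is taken:

  • B (unconditional) – 3 cycles
  • Conditional branch such as BEQ or BNE, taken – 3 cycles
  • Conditional branch, not taken – 1 cycle
  • BL – 4 cycles
  • BX, BLX – 3 cycles

The longer taken branch time is due to refilling the three-stage pipeline: the instructions already fetched after the branch are discarded and fetching restarts at the branch target address. BL is a 32-bit instruction and takes one cycle more.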
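As a worked example of how the taken and not-taken costs add up, the sketch below estimates the cycle count of a countdown loop that ends with SUBS followed by BNE. It assumes 1 cycle for SUBS, 3 cycles for a taken conditional branch, and 1 cycle for the final not-taken branch; `countdown_loop_cycles` and `body_cycles` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Loop shape assumed:
 *   loop:  <body>            ; body_cycles per iteration
 *          SUBS R0, R0, #1   ; 1 cycle
 *          BNE  loop         ; 3 cycles taken, 1 cycle not taken
 * The branch is taken (n - 1) times and falls through once. */
uint32_t countdown_loop_cycles(uint32_t n, uint32_t body_cycles)
{
    assert(n >= 1u);
    return n * (body_cycles + 1u)  /* body plus SUBS, every iteration */
         + (n - 1u) * 3u           /* taken branches */
         + 1u;                     /* final not-taken branch */
}
```

With an empty body, 10 iterations cost 4 × 10 − 2 = 38 cycles, so the branch accounts for most of the loop overhead.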

Stack Operations

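Instructions that manipulate the stack like pushes and pops take a fixed base cost plus 1 cycle per register:

  • PUSH/POP single register – 2 cycles
  • PUSH/POP of N registers – 1 + N cycles
  • POP of N low registers plus PC (function return) – 4 + N cycles

So pushing 3 registers would take 1+3 = 4 cycles total. The extra cycle is a fixed per-instruction overhead on top of the memory transfers, and a POP that loads PC also pays the pipeline refill of a taken branch.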
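The stack costs can be turned into a rough estimate of function entry and exit overhead. This sketch assumes the Cortex-M0 Technical Reference Manual figures (PUSH and POP of N registers cost 1 + N cycles, and a POP of N low registers plus PC costs 4 + N cycles); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PUSH {reglist}: 1 + N cycles. Up to 8 low registers plus LR. */
uint32_t push_cycles(uint32_t n_regs)
{
    assert(n_regs >= 1u && n_regs <= 9u);
    return 1u + n_regs;
}

/* POP {reglist} without PC: 1 + N cycles. */
uint32_t pop_cycles(uint32_t n_regs)
{
    assert(n_regs >= 1u && n_regs <= 8u);
    return 1u + n_regs;
}

/* POP {low regs, PC}: 4 + N cycles, N counting only the low registers
 * (assumption based on the TRM table; verify for your core revision). */
uint32_t pop_return_cycles(uint32_t n_low_regs)
{
    assert(n_low_regs <= 8u);
    return 4u + n_low_regs;
}
```

For a function that begins with PUSH {R4, R5, LR} and ends with POP {R4, R5, PC}, this gives 4 cycles on entry and 6 cycles on exit.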

Other Instructions

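Here are some execution times for other common instructions:

  • MOV – 1 cycle (3 cycles when the destination is PC)
  • CBZ/CBNZ – not available on the Cortex M0; use CMP followed by a conditional branch
  • BLX – 3 cycles
  • BX – 3 cycles

Again, simple register moves take just 1 cycle. CBZ and CBNZ belong to the larger instruction set of the Cortex M3 and M4, so on the Cortex M0 a compare-and-branch is a CMP (1 cycle) plus a conditional branch, for a total of 2 cycles if not taken and 4 if taken.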

Interrupt Latency

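When an interrupt occurs on the Cortex M0, it takes 16 cycles before the first instruction of the interrupt handler executes, assuming zero-wait-state memory. During this time the processor stacks eight registers (R0-R3, R12, LR, the return address, and xPSR), fetches the handler address from the vector table, and refills the pipeline.

Thus, the total interrupt latency is 16 cycles, and memory wait states add to it. A short, predictable latency makes worst-case response times easier to bound in real-time code.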
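A latency in cycles only becomes meaningful once it is converted to time at a given clock rate. The sketch below does that conversion in integer picoseconds to avoid floating point; the 16-cycle figure is the Cortex-M0 value for zero-wait-state memory, and `cycles_to_ps` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to picoseconds at the given core clock.
 * The result is truncated toward zero by the integer division. */
uint64_t cycles_to_ps(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0u);
    return ((uint64_t)cycles * 1000000000000ULL) / clock_hz;
}
```

For example, 16 cycles at a 48 MHz core clock is about 333 ns, and at 16 MHz it is exactly 1 µs.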

Cycle Counting Methods

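The Cortex M0 does not include the DWT cycle counter (CYCCNT) found on the Cortex M3 and M4, so cycle counts have to be measured another way. You can:

  • Use the SysTick timer clocked from the core clock – a 24-bit down-counter that decrements each clock cycle
  • Set up a timer peripheral in free-running mode – acts as a cycle counter
  • Use a debugger or simulator that reports cycle counts between breakpoints

The SysTick method is good for counting a small sequence of instructions: read the counter before and after, then subtract the cost of the reads themselves. For larger blocks of code, a wider free-running timer or the debugger works better.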
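The free-running timer approach can be sketched with the SysTick timer, which is a 24-bit down-counter. The helper below computes the elapsed cycles from two counter readings and handles a single wrap; it assumes the reload value is 0x00FFFFFF, that the timer runs from the core clock, and that fewer than 2^24 cycles elapse between readings. The function names are hypothetical, and on real hardware the two readings would come from the SysTick current value register.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu  /* SysTick counter is 24 bits wide */

/* Elapsed cycles between two readings of a down-counter that wraps
 * from 0 back to 0x00FFFFFF. start_val is read first, end_val second. */
uint32_t systick_elapsed(uint32_t start_val, uint32_t end_val)
{
    assert(start_val <= SYSTICK_MASK && end_val <= SYSTICK_MASK);
    return (start_val - end_val) & SYSTICK_MASK;
}

/* Subtract the measurement overhead, found by timing an empty region. */
uint32_t net_cycles(uint32_t elapsed, uint32_t overhead)
{
    assert(elapsed >= overhead);
    return elapsed - overhead;
}
```

Timing an empty region first gives the overhead of the two reads, which `net_cycles` then removes from the measurement of the code under test.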

Optimization Tips

Here are some tips for optimizing cycle counts on the Cortex M0 using the instruction timing knowledge:

  • Minimize loads and stores by keeping values in registers
  • Use shift operations instead of multiplies/divides by powers of two when possible
  • Arrange conditional branches so that the common case falls through (not taken)
  • Unroll small, hot loops so the taken-branch penalty is paid less often
  • Replace short branches with branch-free arithmetic where practical, since the Cortex M0 has no conditional execution (IT blocks)

Every single load or store costs 2 cycles, so keeping hot values in registers and moving groups of registers with LDM/STM can provide big speedups. Also minimizing taken branches helps reduce 3 cycle penalties.
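The shift-based rewrites mentioned above look like this in C for unsigned values. This is an illustrative sketch with hypothetical function names; it matters most for division and remainder, which have no hardware instruction on the Cortex M0, and for multiplication only on parts with the 32-cycle multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* x / 8 for unsigned x: one logical shift right instead of a divide call. */
uint32_t div8(uint32_t x)
{
    return x >> 3;
}

/* x % 16 for unsigned x: one AND instead of a remainder call. */
uint32_t mod16(uint32_t x)
{
    return x & 15u;
}

/* x * 10 as (x * 8) + (x * 2): two shifts and an add.
 * Only worthwhile when the multiplier is the slow 32-cycle option. */
uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Compilers typically apply the power-of-two rewrites on their own when the divisor is a compile-time constant, so the manual form is mainly useful when reading or writing assembly.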

Instruction Timing Summary

In summary, the key ARM Cortex M0 instruction execution times are:

  • Arithmetic and logic ops – 1 cycle (multiply 1 or 32 cycles, divide in software)
  • Loads/stores – 2 cycles, or 1 + N cycles for LDM/STM of N registers
  • Branches – 3 cycles taken, 1 cycle not taken, 4 cycles for BL
  • Stack ops – 1 + N cycles, or 4 + N cycles for a POP that loads PC
  • Interrupts – 16 cycle latency

These figures assume zero-wait-state memory. Understanding these basics helps with writing efficient code for Cortex M0 microcontrollers. Optimizing hot code paths and loops using the timing knowledge is key to maximizing performance. Check the Cortex-M0 Technical Reference Manual and your device datasheet for more specific instruction cycle details.
