How to delay an ARM Cortex M0+ for n cycles, without a timer?

Scott Allen
Last updated: October 5, 2023 9:58 am

The ARM Cortex-M0+ is one of the smallest and simplest ARM processor cores, aimed at low-cost, low-power embedded applications. It has no floating point unit or cache, and its memory protection unit is optional. One common need in embedded code is a short delay or wait state, for example while waiting for an external peripheral or sensor. The core's SysTick timer is an optional feature, and a device's timer peripherals may be unavailable or already in use, so it is worth knowing how to create delays in software alone.

Contents

  • Using NOPs for Short Delays
  • Busy Loop Delays
  • Optimizing Busy Loop Delays
  • Cycle-Accurate Delays
  • Delay Routine in Assembly

Using NOPs for Short Delays

One simple way to create a short delay is by inserting NOP (no operation) instructions. The processor will waste cycles executing these NOPs, creating the desired delay. For example:


@ Delay of roughly 10 cycles (one cycle per NOP on the Cortex-M0+)
NOP
NOP
NOP
NOP  
NOP
NOP
NOP
NOP
NOP
NOP

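This is very straightforward, but has limited utility: every cycle of delay costs one instruction of code memory, so only short delays are practical. The delay is fixed in cycles, so the time it represents scales with the CPU clock period, and memory wait states or interrupts can lengthen it. ARM also documents that a NOP is not guaranteed to consume time on every implementation. For very short delays of a few cycles, though, NOPs can be useful.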

Busy Loop Delays

A more flexible approach is to execute a busy loop for the desired number of cycles. This allows creating longer delays, up to billions of cycles if needed. Here is an example busy loop using a volatile counter variable:


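// Busy-wait for n loop iterations (each iteration costs several CPU cycles)
void delay(int n) {
  volatile int i;  // volatile: every read and write of i must really happen
  for (i = 0; i < n; i++) {
    // empty body: the loop itself is the delay
  }
}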

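This simple loop iterates n times, wasting cycles, before program execution continues. The volatile keyword forces the compiler to perform every access to the counter, so the loop cannot be optimized away. Note that n counts iterations, not cycles: each iteration takes several cycles, because the volatile counter is loaded, incremented, stored, and compared every time round. The time per cycle depends on the CPU frequency. At 1 MHz, 1000 cycles take 1 millisecond.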
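The relation between time, clock frequency, and cycles can be written down as a small helper. This is a minimal sketch: the name `us_to_cycles` and the `cpu_hz` parameter are illustrative assumptions, not part of any vendor library, and the caller is assumed to know the actual core clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a delay in microseconds to CPU cycles.
 * cpu_hz is the core clock in Hz, supplied by the caller (an assumption). */
static uint32_t us_to_cycles(uint32_t us, uint32_t cpu_hz)
{
    /* Use a 64-bit intermediate so us * cpu_hz cannot overflow. */
    uint64_t cycles = ((uint64_t)us * cpu_hz) / 1000000u;

    /* The result must still fit in the 32-bit counter used by the loops. */
    assert(cycles <= UINT32_MAX);
    return (uint32_t)cycles;
}
```

For example, `us_to_cycles(1000, 1000000)` gives 1000 cycles for 1 ms at 1 MHz. The cycle count still has to be divided by the cost of one loop iteration before it is passed to a busy loop.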

Optimizing Busy Loop Delays

The busy loop method can be improved in several ways:

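  • Use an unsigned 32-bit integer for the counter: this doubles the maximum count compared with a signed int
  • Load the counter once before the loop and count down to zero: no separate comparison against n is needed in each iteration
  • Use nested loops to increase the maximum delay
  • Pad the loop body with several NOPs: the loop overhead becomes a smaller share of each iteration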

Here is an optimized busy loop approach:


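// Higher max delay, reduced overhead
#include <stdint.h>

void delay(uint32_t n) {

  // Count down from n to zero
  while (n > 0) {

    // Four NOPs pad each iteration; volatile keeps the compiler
    // from deleting or merging them
    asm volatile(
      "nop\n\t"
      "nop\n\t"
      "nop\n\t"
      "nop\n\t"
    );

    // Decrement in C so the compiler picks a valid Thumb encoding
    // and tracks the condition flags itself
    n--;
  }
}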

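This delays for n iterations, not n cycles: each iteration executes four NOPs plus the decrement and the loop branch, so it costs several cycles. With the 32-bit counter, the loop can run for up to about 4.29 billion iterations. Even at one cycle per iteration that would be about 71 minutes at 1 MHz, and the real maximum is several times longer. The NOP padding makes the loop overhead a smaller fraction of each iteration.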
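Because the loop counts iterations rather than cycles, a requested cycle count has to be divided by the cost of one iteration. The sketch below shows that arithmetic. The function names are hypothetical, and `cycles_per_iter` is an assumed input that has to come from counting instructions or from measurement on the actual target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of loop iterations needed to wait at least
 * `cycles` CPU cycles, given the (assumed, measured) cost of one iteration. */
static uint32_t cycles_to_iterations(uint32_t cycles, uint32_t cycles_per_iter)
{
    assert(cycles_per_iter > 0);
    /* Round up so the delay is never shorter than requested. */
    return cycles / cycles_per_iter + (cycles % cycles_per_iter != 0);
}

/* Hypothetical helper: longest delay, in whole seconds, that a 32-bit
 * iteration counter can produce at a given clock frequency. */
static uint64_t max_delay_seconds(uint32_t cycles_per_iter, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    uint64_t max_cycles = (uint64_t)UINT32_MAX * cycles_per_iter;
    return max_cycles / cpu_hz;
}
```

With one cycle per iteration at 1 MHz, `max_delay_seconds(1, 1000000)` is 4294 seconds, which matches the figure of about 71 minutes. A loop that costs 7 cycles per iteration would need `cycles_to_iterations(1000, 7)`, or 143 iterations, to cover 1000 cycles.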

Cycle-Accurate Delays

The previous busy loop methods are specified in loop iterations, but they do not account for the actual cycles used by each iteration. For example, one iteration may take 5 cycles instead of the assumed 1 cycle. The delay will then be 5x longer than expected.

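To get cycle-accurate delays, we need to measure how many cycles each loop iteration actually costs and scale the iteration count accordingly. This calibration needs a reference on the target system for the one-off measurement, for example SysTick where the device implements it, or a GPIO pin toggled around the loop and timed with an oscilloscope: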

  1. Initialize the loop counter and start the reference counter
  2. Execute the busy loop for n iterations
  3. Stop the reference counter and read the elapsed cycles
  4. Calculate the cycles used per loop iteration
  5. Scale requested delays by the measured cycles per iteration

Here is example code to perform this calibration process:


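// Calibrate delay loop
#include <stdint.h>

// Assumed to be provided elsewhere: start and stop a reference cycle
// counter (for example one built on SysTick); stop returns elapsed cycles
void start_cycle_count(void);
uint32_t stop_cycle_count(void);

// Measured cost of one loop iteration, in CPU cycles
uint32_t cycles_per_iteration;

void calibrate_delay(void) {

  // Known n loop iterations
  const uint32_t n = 1000;

  // Start cycle counter
  start_cycle_count();

  // Execute test loop
  for (uint32_t i = 0; i < n; i++) {
    asm volatile(
      "nop\n\t"
    );
  }

  // Stop cycle counter
  uint32_t elapsed = stop_cycle_count();

  // Average cycles per iteration (includes the loop's own overhead)
  cycles_per_iteration = elapsed / n;
}

// Convert a requested delay in cycles to loop iterations
// (call calibrate_delay() first so the divisor is non-zero)
uint32_t delay_iterations(uint32_t delay_cycles) {
  return delay_cycles / cycles_per_iteration;
}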

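By measuring the actual elapsed cycles for a fixed number of loop iterations, we can calculate the average cycles per iteration. A requested delay in cycles is then divided by this figure to get the iteration count, which gives higher accuracy. The calibration is only valid for the same loop code, compiler settings, and memory timing that are used for the real delays.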
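A single measurement mixes the fixed cost of entering and leaving the loop with the per-iteration cost. One possible refinement, sketched below, is to take two measurements with different iteration counts and fit a straight line through them. This is an illustrative variant rather than the method described above; the type `loop_cal_t` and both function names are hypothetical, and the elapsed-cycle values are assumed to come from whatever reference counter the target provides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical calibration result: elapsed = fixed_overhead + n * cycles_per_iter */
typedef struct {
    uint32_t cycles_per_iter;  /* cost of one loop iteration */
    uint32_t fixed_overhead;   /* call, setup, and return cost */
} loop_cal_t;

/* Fit the model from two measurements: n1 iterations took e1 cycles,
 * n2 iterations took e2 cycles (measured values are assumed inputs). */
static loop_cal_t fit_loop_cost(uint32_t n1, uint32_t e1,
                                uint32_t n2, uint32_t e2)
{
    assert(n2 > n1 && e2 >= e1);
    loop_cal_t cal;
    cal.cycles_per_iter = (e2 - e1) / (n2 - n1);
    uint64_t loop_part = (uint64_t)cal.cycles_per_iter * n1;
    /* Guard against noisy data producing a negative overhead. */
    cal.fixed_overhead = (loop_part <= e1) ? (uint32_t)(e1 - loop_part) : 0;
    return cal;
}

/* Iterations needed for a target delay, after removing the fixed part. */
static uint32_t iterations_for_cycles(loop_cal_t cal, uint32_t cycles)
{
    assert(cal.cycles_per_iter > 0);
    if (cycles <= cal.fixed_overhead) {
        return 0;  /* the call overhead alone already covers the delay */
    }
    return (cycles - cal.fixed_overhead) / cal.cycles_per_iter;
}
```

If 100 iterations measured 412 cycles and 1000 iterations measured 4012 cycles, the fit gives 4 cycles per iteration and 12 cycles of fixed overhead, so a 1012-cycle delay needs 250 iterations.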

Delay Routine in Assembly

For the most predictable timing, the delay loop can be hand-coded in assembly language. The instruction sequence is then fixed and no longer depends on the compiler or its optimization settings, which gives full control over the loop behavior and overhead.

Here is an example delay routine in ARM Thumb assembly:


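.syntax unified
.thumb

.global delay

.thumb_func
delay:

  @ Iteration count in r0 (must be at least 1)

loop:
  nop               @ 1 cycle
  subs r0, r0, #1   @ 1 cycle; sets the Z flag when r0 reaches zero
  bne loop          @ 2 cycles when taken, 1 when it falls through

  bx lr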

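The delay parameter, a number of loop iterations, is passed in register r0. The loop is very tight, with only one NOP instruction inside it. Each iteration costs a small, fixed number of cycles that can be counted from the instruction timings: on the Cortex-M0+, one cycle each for the NOP and the arithmetic instructions and two cycles for the taken branch, assuming zero-wait-state memory and no interrupts. Pass a count of at least 1, because with 0 the counter wraps around and the loop runs about 4.29 billion times. The maximum delay depends on the counter size: 32 bits allow over 4 billion iterations.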

In summary, several techniques exist for creating delays on the Cortex M0+:

  • NOP instructions for very short delays
  • Busy loop written in C, adjustable duration
  • Busy loop in assembly for high performance
  • Calibrating loops for maximum accuracy

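Delays allow simple integration of wait states into code flow for peripherals, sensors, or other external events. The resolution of a software delay is bounded by the clock period, for example about 21 ns at 48 MHz, and interrupts or memory wait states can lengthen any individual delay.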
