Cortex M0+ delay routine without timers

Delays and timed operations are a common need in embedded systems programming. The Cortex-M0+ is one of ARM’s most widely used microcontroller cores, known for its low cost and low power consumption. However, the core itself includes no general-purpose timers: the only timing hardware it defines is the optional SysTick counter, and any further timers are peripherals added by the chip vendor. This presents a challenge for developers who need delays or timing functions on a Cortex-M0+ when those peripherals are unavailable or already in use.

Contents

CPU Cycle Count
Nested Loops
Non-Busy Delays
Blocking Delay Routines
Achieving Accurate Delays
Example Code
Conclusion

Fortunately, there are techniques that allow implementing delays on the Cortex-M0+ even without hardware timers. By using the CPU cycle count and/or nested loops, reasonably accurate software-based delays can be achieved. In this article, we will explore different methods for implementing delay routines on the Cortex-M0+ without relying on any timers.

CPU Cycle Count

The first approach is to use the CPU cycle count. Most Cortex-M0+ devices include SysTick, an optional 24-bit counter that decrements once per clock cycle (when clocked from the processor clock) and reloads from the SysTick Reload Value Register (SYST_RVR) when it reaches zero. Polling this counter in a busy loop yields a delay of a fixed number of cycles.

To use the cycle count for delays:

Set the reload value in SYST_RVR (0xFFFFFF gives the full 24-bit range) and enable the counter by writing to the SysTick Control and Status Register (SYST_CSR).
Read the start value from the SysTick Current Value Register (SYST_CVR).
Execute a busy loop that rereads SYST_CVR on each pass.
Exit the loop once the difference between the start value and the current value reaches the required number of cycles.

Inside the loop, you can execute NOPs or other safe instructions that do not affect application state. Because SysTick counts down, the elapsed cycle count is the start value minus the current value, masked to 24 bits so that a wrap through zero is handled (this assumes a reload value of 0xFFFFFF). Comparing that figure against the target on each pass gives the desired delay period.
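The elapsed-cycle calculation can be sketched as a small helper. This is a minimal illustration rather than a vendor API: the name systick_elapsed is hypothetical, and the code assumes SYST_RVR has been set to 0xFFFFFF so that the counter uses its full 24-bit range.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu  /* SysTick is a 24-bit counter */

/* Hypothetical helper: cycles elapsed between two SYST_CVR readings.
 * SysTick counts DOWN, so the earlier reading is the larger one unless
 * the counter wrapped through zero in between. Masking to 24 bits keeps
 * the subtraction correct across one wrap, assuming SYST_RVR == 0xFFFFFF. */
static uint32_t systick_elapsed(uint32_t start, uint32_t now)
{
    return (start - now) & SYSTICK_MASK;
}
```

For example, a start reading of 1000 and a current reading of 400 gives 600 elapsed cycles. The result is only meaningful if fewer than 2^24 cycles passed between the two readings.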

However, there are some limitations to this approach:

The 24-bit SysTick counter wraps frequently at higher CPU speeds (about every 0.35 s at 48 MHz), so a single measurement must stay below 2^24 cycles and wraparound must be accounted for.
Busy looping is not power efficient as it keeps the CPU active for the entire delay period.
The delay duration depends on CPU speed. Changing the clock speed will change the delays unless the cycle counts are adjusted.

Overall, the cycle count approach provides simple delays without requiring any peripheral timers. With careful calibration, delays from microseconds to seconds are feasible, provided that delays longer than one counter period are split into shorter waits. But the limitations mean it may not be suitable for all applications.
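The dependence on clock speed and the 24-bit limit can both be handled with a little arithmetic. The sketch below is one possible way to do it, not a standard API: us_to_cycles, next_chunk, and SYSTICK_CHUNK_CYCLES are hypothetical names, and the chunk size of half the counter range is an assumption chosen to leave margin for polling latency.

```c
#include <stdint.h>

/* ASSUMPTION: each individual wait is limited to half the 24-bit range so
 * that a late poll cannot miss the target and see the counter wrap. */
#define SYSTICK_CHUNK_CYCLES 0x00800000u

/* Convert microseconds to CPU cycles for a given core clock in Hz.
 * 64-bit arithmetic avoids overflow for long delays at high clock rates. */
static uint64_t us_to_cycles(uint32_t us, uint32_t cpu_hz)
{
    return ((uint64_t)us * cpu_hz) / 1000000u;
}

/* Hypothetical helper: size of the next wait when a long delay is split
 * into pieces that each fit comfortably within one counter period. */
static uint32_t next_chunk(uint64_t remaining_cycles)
{
    return (remaining_cycles > SYSTICK_CHUNK_CYCLES)
               ? SYSTICK_CHUNK_CYCLES
               : (uint32_t)remaining_cycles;
}
```

At 48 MHz, a 10 μs delay is 480 cycles, while a 1 s delay is 48,000,000 cycles and would be split into five full chunks plus a remainder. A caller would loop, waiting for next_chunk() cycles and subtracting that amount, until nothing remains.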

Nested Loops

An alternative to the cycle count is to create delays using nested loops. This simply involves an outer loop that runs a variable number of times, with an inner loop providing a fixed delay period. By tuning the outer loop count, different overall delay durations can be achieved.

For example:


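// Inner loop provides approx 10 us delay.
// DELAY_10US_COUNT must be calibrated for the target clock and compiler settings.
void delay_10us(void) {
  for (int i = 0; i < DELAY_10US_COUNT; i++) {
    __NOP(); // CMSIS intrinsic (volatile asm), so the loop is not optimized away
  }
}

// Outer loop calls the inner loop 100 times per millisecond
void delay_xms(int ms) {
  for (int i = 0; i < ms * 100; i++) {
    delay_10us(); // 10 us inner loop
  }
}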

The inner delay loop is calibrated to provide a small unit delay, like 10 μs as shown above. The outer loop then runs multiple iterations of the inner loop to achieve longer delays. The benefit of this approach is that the outer loop count gives a linear and intuitive control over the delay duration.
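A starting value for the inner loop count can be estimated from the clock frequency before fine-tuning on hardware. The helper below is a rough sketch under stated assumptions: loop_count_for_us is a hypothetical name, and the cycles-per-iteration figure is not fixed by the architecture alone. It depends on the instructions the compiler emits and on flash wait states, so it should be checked against the disassembly or measured.

```c
#include <stdint.h>

/* Hypothetical helper: first estimate of the inner loop count.
 * cycles_per_iter is an ASSUMPTION about the compiled loop body and must
 * be non-zero; verify it on the target before relying on the result. */
static uint32_t loop_count_for_us(uint32_t us, uint32_t cpu_hz,
                                  uint32_t cycles_per_iter)
{
    uint64_t cycles = ((uint64_t)us * cpu_hz) / 1000000u;
    return (uint32_t)(cycles / cycles_per_iter);
}
```

Assuming four cycles per iteration, a 10 μs delay works out to 120 iterations at 48 MHz and 20 iterations at 8 MHz. The estimate ignores call and return overhead, so short delays will run slightly long.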

Nested loops have some pros and cons compared to the cycle count method:

Pros:
Don’t require SysTick configuration or any special hardware.
Easy to create longer delays by changing the outer loop count.

Cons:
Delay duration depends directly on CPU speed, so the inner loop must be recalibrated whenever the clock changes.
Still uses busy looping so not power efficient.
Inner loop calibration needs careful characterization.

Overall, nested loops provide a lightweight way to create multiple software delays. With careful inner loop calibration across operating conditions that shift the clock frequency, such as voltage and temperature when running from an internal oscillator, reasonably accurate delays are possible.

Non-Busy Delays

The previous two methods rely on busy looping within the delay routine. This keeps the CPU actively spinning and consumes more power. For low-power applications, an alternative is to use sleep modes instead of busy loops:


void sleep_delay_ms(int ms)
{
  for(int i = 0; i < ms; i++) {
    __WFI(); // Sleep until the next interrupt
    // Approx 1 ms per pass, assuming a periodic 1 ms interrupt is the only wake-up source
  }
}

Here instead of a busy loop, we enter a low-power sleep mode using the __WFI() intrinsic. This puts the Cortex-M0+ CPU into sleep until an interrupt occurs. By sleeping for short intervals in a loop, delays can be generated without constantly running the CPU.

The downside is that an independent timer or interrupt source is needed to wake up from sleep at the desired intervals, so this method is not fully timer-free. This slightly increases hardware complexity. Any other interrupt also wakes the CPU, which shortens the delay unless the loop checks the elapsed time. Wake-up from sleep may also have some variable latency, which impacts delay accuracy. But overall, sleep delays are a good option for low-power and battery-operated devices.
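One way to make a sleep delay robust against unrelated interrupts is to count ticks instead of wake-ups. The following is a minimal sketch under stated assumptions: g_ms_ticks, tick_isr, and WAIT_FOR_INTERRUPT are hypothetical names, and tick_isr is assumed to be called from a periodic 1 ms interrupt handler. On a non-ARM host the macro simulates one tick so that the logic can be exercised on a PC.

```c
#include <stdint.h>

/* Millisecond tick count. ASSUMPTION: tick_isr() is called from a periodic
 * 1 ms interrupt handler (for example the SysTick handler). */
static volatile uint32_t g_ms_ticks;

void tick_isr(void)
{
    g_ms_ticks++;
}

#if defined(__ARM_ARCH_6M__)
#define WAIT_FOR_INTERRUPT() __asm volatile ("wfi")  /* sleep until any interrupt */
#else
#define WAIT_FOR_INTERRUPT() tick_isr()              /* host stand-in: simulate one tick */
#endif

/* Sleep until at least `ms` ticks have passed. Waking on an unrelated
 * interrupt only causes another pass through the loop. */
void sleep_delay_ms(uint32_t ms)
{
    uint32_t start = g_ms_ticks;
    while ((uint32_t)(g_ms_ticks - start) < ms) {
        WAIT_FOR_INTERRUPT();
    }
}
```

Because the first tick may arrive at any point within the first millisecond, the actual delay lies between ms − 1 and ms milliseconds.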

Blocking Delay Routines

The delays discussed above are all blocking, meaning they pause program execution on the CPU for the entire delay duration. The CPU is not available to run other tasks during this time. In some cases, this simple blocking behavior is sufficient.

However, for more complex programs with multiple concurrent tasks, non-blocking delays may be preferred. These allow delay operations to be started but do not occupy the CPU for the full delay period. Non-blocking delays are created using the following approaches:

Timers – Hardware timers run independently of the CPU and can trigger interrupts on expiry for implementing non-blocking delays.
RTOS delays – RTOS APIs such as the FreeRTOS function vTaskDelayUntil suspend a task until a fixed time in the future.
Thread sleep – In multi-threaded firmware, threads can sleep to create a delay without blocking the entire program.

With additional software complexity, these methods allow implementing delays while still remaining responsive to other events in the system. The trade-off is between simplicity of blocking delays and flexibility of non-blocking delays.
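For firmware without an RTOS, a lightweight variant of the timer approach is to record a start time and poll for expiry from the main loop. The sketch below is illustrative only: soft_timer_t and its functions are hypothetical names, and the now_ms argument is assumed to come from a free-running millisecond tick counter maintained by a timer interrupt.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical non-blocking delay: records when it started and how long it lasts. */
typedef struct {
    uint32_t start_ms;
    uint32_t duration_ms;
} soft_timer_t;

static void soft_timer_start(soft_timer_t *t, uint32_t now_ms, uint32_t duration_ms)
{
    t->start_ms = now_ms;
    t->duration_ms = duration_ms;
}

/* Unsigned subtraction keeps the comparison correct when the
 * millisecond tick counter wraps around. */
static bool soft_timer_expired(const soft_timer_t *t, uint32_t now_ms)
{
    return (uint32_t)(now_ms - t->start_ms) >= t->duration_ms;
}
```

The main loop calls soft_timer_expired() on each pass and carries on with other work until it returns true, so no task is held up while the delay runs.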

Achieving Accurate Delays

The accuracy of software-based delays can be affected by several factors:

Clock frequency – Higher clocks speed up instruction execution leading to shorter delays. So delays must be tuned accordingly.
Optimization – Compiler settings like -O3 can heavily optimize nested loops, or remove a loop with no side effects entirely, and alter delays.
Interrupt latency – Unpredictable ISR overhead affects delay accuracy.
Environment – Temperature and voltage variation shifts the frequency of the clock source, especially an internal RC oscillator, which changes the real duration of a fixed cycle count.

Some ways to improve delay accuracy include:

Characterize delays across the expected operating conditions.
Limit optimization of the delay code with compiler-specific controls, such as #pragma optimize=low in IAR, or declare loop counters volatile.
Store calibration values where software can adjust them, rather than hard-coding loop counts.
For critical delays, use hardware timers if possible.

Reasonable accuracy can be obtained across voltage, temperature and some clock speed changes with careful characterization. But hardware timers are recommended for very precise delays in critical timing applications.
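A portable way to keep a delay loop from being optimized away is a volatile loop counter. This is a minimal sketch with a hypothetical function name, spin. It only guarantees that the loop executes, and the cycles per iteration still vary with compiler and optimization level, so calibration remains necessary.

```c
#include <stdint.h>

/* Busy-wait for `count` iterations and return how many were executed.
 * The volatile qualifier forces a real load and store of the counter on
 * every pass, so the compiler cannot delete or collapse the loop. */
static uint32_t spin(uint32_t count)
{
    volatile uint32_t i;
    for (i = 0; i < count; i++) {
        /* empty on purpose */
    }
    return i;
}
```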

Example Code

Here is some sample code that demonstrates different Cortex-M0+ delay routines without timers:


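// SysTick cycle count delay
// Assumes SysTick is enabled with SYST_RVR = 0xFFFFFF and cycles < 2^24

void delay_cycles(uint32_t cycles)
{
  uint32_t start, now;
  start = SYST_CVR; // Get start cycle count

  while(1)
  {
    now = SYST_CVR; // Read current cycle count
    // SysTick counts down, so elapsed = start - now, masked to 24 bits
    if(((start - now) & 0x00FFFFFF) >= cycles) break;
  }
}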

// Nested loop delay

void delay_inner(void)
{
  for(int i = 0; i < INNER_COUNT; i++) // Inner loop, calibrated for approx 1 ms
  {
    __NOP(); // CMSIS no-operation intrinsic
  }
}  

void delay_ms(int ms)
{
  for(int i = 0; i < ms; i++)
  {
    delay_inner(); // Inner nested loop
  }
}

// Non-busy sleep delay 

void sleep_delay(int ms)
{
  for(int i = 0; i < ms; i++)
  {
     __WFI(); // Sleep until interrupt
  }
}

These examples demonstrate the core concepts discussed in this article for implementing blocking delays on Cortex-M0+ without hardware timers. The actual delay duration and accuracy will depend on the target system and require tuning of the loop counts and sleep intervals accordingly.

Conclusion

Delays and timing functions are critical for embedded and IoT applications. Because the Cortex-M0+ core provides no timers beyond the optional SysTick counter, software-based delay routines are often essential. By utilizing the SysTick cycle counter, nested loops, and sleep modes, reasonably accurate delays can be implemented without peripheral timers. Careful calibration is required to account for CPU speed, optimizations, and other factors affecting delay duration. Overall, the techniques covered in this article should provide embedded developers with a toolkit of software delay options for systems based on the low-cost Cortex-M0+ MCU.
