The ARM Cortex-M0+ processor is an ultra low power 32-bit RISC CPU core designed for microcontroller applications. It is part of ARM’s Cortex-M series of cores, and is their smallest and most energy efficient implementation, making it well suited for IoT edge devices and other battery powered applications where cost and power consumption are critical factors.
The Cortex-M0+ implements the ARMv6-M architecture, which has no hardware integer divide instructions (SDIV and UDIV are available only on larger cores such as the Cortex-M3 and Cortex-M4). Instead, divisions are synthesized in software using repeated subtraction or, more commonly, shifting and subtracting. This article explains how integer division is implemented on the Cortex-M0+ and provides some example code.
Integer Division Background
Integer division is the process of dividing two integer values to produce an integer quotient. For example: 15 / 4 = 3 remainder 3
The quotient is the integer result of the division (3 in this case), while the remainder is what is left over (3).
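The relationship between the two results can be checked directly in C. The following is a minimal sketch: the helper name check_division_identity is hypothetical, and it assumes a nonzero divisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns true when the quotient and remainder satisfy the defining identity
 *     dividend == quotient * divisor + remainder, with remainder < divisor.
 * Assumes divisor != 0. */
bool check_division_identity(uint32_t dividend, uint32_t divisor)
{
    uint32_t quotient = dividend / divisor;
    uint32_t remainder = dividend % divisor;
    return (dividend == quotient * divisor + remainder) && (remainder < divisor);
}
```

For 15 and 4 this evaluates 15 == 3 * 4 + 3, which holds.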
Division can be implemented in hardware using circuitry that shifts and subtracts iteratively. Without dedicated hardware, division must be synthesized in software via an algorithm. The two main methods are:
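- Repeated Subtraction: Repeatedly subtract the divisor from the dividend until what remains is smaller than the divisor. The number of subtractions is the quotient, and what remains is the remainder.
- Bit Shifting: Binary long division. Shift the divisor to align it with the dividend, then subtract it at each bit position where it fits, producing one quotient bit per step.
For resource constrained microcontrollers like the Cortex-M0+, the bit shifting approach is generally preferred. It needs at most 32 steps for 32-bit operands, whereas the number of steps in repeated subtraction grows with the size of the quotient.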
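To make the first method concrete, here is a minimal C sketch of unsigned division by repeated subtraction. The function name udiv_repeated_subtract is hypothetical, and the sketch assumes the caller never passes a zero divisor.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned division by repeated subtraction.
 * Returns the quotient and stores the remainder through *remainder.
 * Assumes divisor != 0; with a zero divisor the loop would never end. */
uint32_t udiv_repeated_subtract(uint32_t dividend, uint32_t divisor,
                                uint32_t *remainder)
{
    uint32_t quotient = 0;

    /* Stop once what is left is smaller than the divisor. */
    while (dividend >= divisor) {
        dividend -= divisor;
        quotient++;
    }

    *remainder = dividend;
    return quotient;
}
```

The loop runs once per unit of the quotient: 15 / 4 takes 3 iterations, but dividing a large value by 1 takes as many iterations as the value itself, which is why this method is rarely used in practice.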
Division Implementation on Cortex-M0+
The Cortex-M0+ does not include integer division instructions in its Thumb instruction set. Therefore, division must be implemented as a software routine. For better performance, this routine is usually written in assembly language rather than C/C++.
In practice the routine is supplied by the toolchain's runtime library (for example libgcc with GCC) rather than by the application or by CMSIS. The ARM run-time ABI defines the helper names: __aeabi_uidiv and __aeabi_idiv return the quotient of an unsigned or signed division, while __aeabi_uidivmod and __aeabi_idivmod return both the quotient and the remainder. Implementations typically use the bit shifting approach.
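Here is a walkthrough of how a typical shift-and-subtract routine works:
- Arguments arrive in registers according to the ARM calling convention, with the dividend in r0 and the divisor in r1.
- The algorithm shifts the divisor left until it is aligned with the most significant bits of the dividend.
- At each bit position, if the shifted divisor is less than or equal to what remains of the dividend, it is subtracted and the corresponding quotient bit is set to 1. Otherwise the bit stays 0.
- The divisor is then shifted right by one bit, and the step repeats until the divisor is back in its original position.
- The remainder is what is left of the dividend after the last step.
- The quotient is returned in r0. The divmod helpers also return the remainder in r1.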
This algorithm suits the Cortex-M0+ because it needs only shift, compare, and subtract instructions, each of which executes in a single cycle on this core.
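A minimal C sketch of shift-and-subtract division follows. It is an illustration of the technique, not the source of any particular runtime library. The function name udiv_shift_subtract is hypothetical, and returning a quotient of 0 for a zero divisor is an assumption made here to keep the sketch well defined.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by shift and subtract.
 * Returns the quotient and stores the remainder through *remainder. */
uint32_t udiv_shift_subtract(uint32_t dividend, uint32_t divisor,
                             uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint32_t bit = 1;   /* quotient bit that matches the current shift */

    /* Assumption for this sketch: a zero divisor yields quotient 0 and
     * leaves the dividend as the remainder. Real libraries differ. */
    if (divisor == 0) {
        *remainder = dividend;
        return 0;
    }

    /* Align: shift the divisor left until it reaches the dividend
     * or its top bit is set (shifting further would lose that bit). */
    while (divisor < dividend && (divisor & 0x80000000u) == 0) {
        divisor <<= 1;
        bit <<= 1;
    }

    /* Long division: one quotient bit per step, from high to low. */
    while (bit != 0) {
        if (dividend >= divisor) {
            dividend -= divisor;
            quotient |= bit;
        }
        divisor >>= 1;
        bit >>= 1;
    }

    *remainder = dividend;
    return quotient;
}
```

For 15 / 4 the divisor is aligned to 16, and the steps at 16, 8 and 4 produce the quotient bits 0, 1 and 1, giving a quotient of 3 with 3 left over as the remainder.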
Example Division Routine
Here is an abbreviated outline of an unsigned division routine in the style of a runtime library helper. It is pseudocode written as assembly comments, not the source of any actual library:
    /* r0 = dividend, r1 = divisor */
    udiv:
        // quotient = 0, bit = 1
    align:
        // while divisor < dividend and top bit of divisor is clear:
        //     left shift divisor and bit by 1
    loop:
        // if dividend >= divisor: dividend -= divisor, quotient |= bit
        // right shift divisor and bit by 1
        // repeat until bit == 0
    done:
        // r0 = quotient
        // r1 = remainder (what is left of dividend)
        // return
This demonstrates the essential bit shifting approach. Production implementations typically add further optimizations and a check for division by zero.
Calling the Division Routine
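The division routine does not need to be called by name. Using the / and % operators in C/C++ is enough: when targeting the Cortex-M0+, the compiler emits calls to the runtime helpers. For example:
    uint32_t quotient, remainder;
    uint32_t divisor = 4;
    uint32_t dividend = 15;
    quotient = dividend / divisor;   // typically a call to __aeabi_uidiv
    remainder = dividend % divisor;  // typically a call to __aeabi_uidivmod
This performs unsigned integer division and division with remainder. __aeabi_uidiv returns only the quotient. __aeabi_uidivmod returns the quotient in r0 and the remainder in r1, a register pair that an ordinary C function prototype cannot describe, so these helpers are best left to the compiler. When both results are needed, compilers can often merge the two operations into a single __aeabi_uidivmod call.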
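When both the quotient and the remainder are needed, portable C can obtain them with a single division by deriving the remainder from the quotient. The sketch below assumes a nonzero divisor. The type udivmod_result and the function udivmod_u32 are hypothetical names, not part of any library.

```c
#include <stdint.h>

/* Hypothetical result type holding both outputs of a division. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod_result;

/* Hypothetical helper: one division, then a multiply and a subtract.
 * Assumes divisor != 0. */
udivmod_result udivmod_u32(uint32_t dividend, uint32_t divisor)
{
    udivmod_result result;

    result.quot = dividend / divisor;               /* one software division */
    result.rem = dividend - result.quot * divisor;  /* cannot overflow: quot * divisor <= dividend */
    return result;
}
```

Whether this is faster than letting the compiler handle / and % together depends on the toolchain and on how fast the multiplier in a given chip is, so it is worth checking the generated code.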
Division By Constant
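One optimization is to replace division by a constant with cheaper operations. When the divisor is a power of two, an unsigned division becomes a single right shift. For example:
    // Unsigned integer divide by 8
    uint32_t val = 100;
    uint32_t divBy8 = val >> 3;  // 100 / 8 = 12
This works because shifting an unsigned value right by 3 bits is equivalent to dividing it by 2^3 = 8. It does not work for 10, which is not a power of two: 100 >> 3 is 12, not 10. For such divisors the usual technique is to multiply by a precomputed reciprocal and then shift. Compilers replace division by a constant power of two with a shift automatically. Whether they apply the reciprocal technique on the Cortex-M0+, which has no instruction producing a 64-bit product, depends on the compiler and optimization settings.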
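For a divisor such as 10, which is not a power of two, a shift alone is not enough. A common technique is to multiply by a scaled reciprocal and shift the product back down. The sketch below uses the constant 0xCCCCCCCD, which is 2^35 / 10 rounded up. The function name div10_reciprocal is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned divide by 10 without a division.
 * 0xCCCCCCCD == ceil(2^35 / 10), so (n * 0xCCCCCCCD) >> 35 equals n / 10
 * for every 32-bit n. The product needs 64 bits. */
uint32_t div10_reciprocal(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

On the Cortex-M0+ the 64-bit product has to be assembled from several 32-bit multiplies, so this is a sketch of the idea rather than a guaranteed speedup. Measuring on the target is the only reliable way to compare it with the library routine.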
Division Behavior
It is important to understand Cortex-M0+ division behavior for different data types and operands:
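- Signed vs Unsigned: Signed and unsigned division use different helpers (__aeabi_idiv and __aeabi_uidiv), and the same bit pattern can give different results depending on the operand type.
- Truncation: The quotient is truncated towards zero, so -7 / 2 is -3, and the remainder takes the sign of the dividend, so -7 % 2 is -1.
- Zero Divisor: Division by zero is undefined behavior in C/C++. The Cortex-M0+ has no hardware divider and therefore no hardware divide-by-zero fault. The runtime helpers typically call __aeabi_idiv0 instead, whose behavior depends on the toolchain and may be to simply return.
Code that performs division should account for these behaviors, in particular by checking for a zero divisor before dividing.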
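One way to account for these cases is to wrap signed division in a helper that refuses operands with no well defined result. This is a minimal sketch, and the name safe_sdiv is hypothetical. Besides a zero divisor, it also rejects INT32_MIN / -1, whose mathematical result does not fit in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: signed division with explicit error reporting.
 * Returns false and leaves *quotient untouched when the result is undefined. */
bool safe_sdiv(int32_t dividend, int32_t divisor, int32_t *quotient)
{
    if (divisor == 0) {
        return false;                 /* division by zero */
    }
    if (dividend == INT32_MIN && divisor == -1) {
        return false;                 /* result would overflow int32_t */
    }
    *quotient = dividend / divisor;   /* truncates towards zero */
    return true;
}
```

With this helper, safe_sdiv(-7, 2, &q) succeeds and sets q to -3, while a zero divisor is reported to the caller instead of reaching the library routine.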
Division vs Multiplication
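Integer multiplication is generally faster than division on the Cortex-M0+, since the hardware includes a multiplier but not a divider. Depending on how the chip vendor configured the core, a 32-bit multiply takes either 1 cycle or 32 cycles, while a software division can take tens to hundreds of cycles depending on the operands and the library. With the single cycle multiplier, multiplication costs the same as addition, subtraction, or shifting. It is best to minimize divisions in performance critical code.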
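A frequent source of avoidable divisions is the modulo operator used to wrap an index. The sketch below shows the same wrap written with a comparison instead. BUFFER_SIZE and both function names are hypothetical, and the comparison version assumes the incoming index is already less than BUFFER_SIZE.

```c
#include <stdint.h>

#define BUFFER_SIZE 100u   /* hypothetical size, deliberately not a power of two */

/* Wrap using the modulo operator, which may cost a software division. */
uint32_t next_index_modulo(uint32_t index)
{
    return (index + 1u) % BUFFER_SIZE;
}

/* Same result using a compare and a reset, with no division.
 * Assumes index < BUFFER_SIZE on entry. */
uint32_t next_index_compare(uint32_t index)
{
    index++;
    if (index >= BUFFER_SIZE) {
        index = 0;
    }
    return index;
}
```

If the buffer size can be chosen freely, making it a power of two is another option, since the wrap then reduces to a bitwise AND with the size minus one.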
Summary
Key points on Cortex-M0+ integer division:
- Implemented in software using a bit shifting approach.
- The compiler's runtime library provides the routines (__aeabi_uidiv and related helpers).
- Synthesized with repeated shifts and subtracts of the divisor.
- Quotient built one bit per step, remainder is what is left of the dividend.
- Can optimize for constant divisors.
- Understand division behavior for data types.
- Minimize divisions for best performance.
Consult your toolchain's runtime library documentation and the ARM run-time ABI for more details on the division helpers. Proper utilization will enable efficient integer math on the Cortex-M0+ and other ARM cores without a hardware divider.