What are Divide instructions (32-bit quotient) in Arm Cortex-M series?

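The Arm Cortex-M series of processor cores is designed for embedded and IoT applications. Cores that implement the ARMv7-M architecture (such as Cortex-M3, Cortex-M4 and Cortex-M7) and the ARMv8-M architecture (such as Cortex-M23 and Cortex-M33) include two hardware divide instructions: SDIV for signed integers and UDIV for unsigned integers. Each one divides a 32-bit register by another 32-bit register and writes a 32-bit quotient to a destination register. No remainder is produced. The ARMv6-M cores (Cortex-M0, Cortex-M0+ and Cortex-M1) have no hardware divide.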

Contents

Signed Divide in Cortex-M
Unsigned Divide in Cortex-M
Divide by Zero
Quotient Range and Overflow
Divide Instruction Encoding
Performance and Repeat Instructions
Divide in Thumb and Thumb-2 Instruction Sets
Software Implementation
Division Optimization Techniques
Divide Usage Examples
Conclusion

Signed Divide in Cortex-M

The SDIV instruction performs a 32-bit by 32-bit signed divide and produces a 32-bit quotient. Its syntax is SDIV Rd, Rn, Rm. The dividend is in Rn, the divisor is in Rm, and the quotient is written to Rd. SDIV does not compute the remainder. When a remainder is needed, it can be derived from the quotient with an MLS (multiply and subtract) instruction.

For example: SDIV R0, R1, R2

This divides the signed value in R1 by the signed value in R2 and writes the 32-bit quotient to R0. R1 and R2 are left unchanged.

Both operands are treated as signed 32-bit two's complement values, so a register with bit 31 set holds a negative number. The divisor can be positive or negative. The quotient is rounded toward zero, so -7 / 2 gives -3, not -4.
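C99 and later also define integer division to truncate toward zero. This is why a compiler targeting a core with hardware divide can typically implement a 32-bit signed `/` with a single SDIV. The sketch below uses a hypothetical helper, `sdiv_remainder`, to show this rounding. It also computes the remainder the way an MLS after SDIV would, as r = n - q * d. The sketch assumes the divisor is nonzero and excludes the INT32_MIN / -1 case, which is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: remainder computed from the quotient, as an
 * SDIV followed by MLS would do it. Preconditions: d != 0 and the
 * pair is not (INT32_MIN, -1), whose quotient overflows. */
static int32_t sdiv_remainder(int32_t n, int32_t d)
{
    assert(d != 0);
    assert(!(n == INT32_MIN && d == -1));
    int32_t q = n / d;   /* truncates toward zero, like SDIV */
    return n - q * d;    /* same value as n % d; sign follows n */
}
```

Consider (-7) / 2. The quotient is -3 and the remainder is -1. For 7 / -2, the quotient is also -3 but the remainder is 1. The remainder always takes the sign of the dividend.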

Unsigned Divide in Cortex-M

The UDIV instruction performs an unsigned 32-bit by 32-bit divide. Its syntax is UDIV Rd, Rn, Rm. It divides the unsigned value in Rn by the unsigned value in Rm and writes the 32-bit quotient to Rd. As with SDIV, no remainder is produced.

For example: UDIV R0, R1, R2

This divides the unsigned value in R1 by the unsigned value in R2 and writes the 32-bit quotient to R0.

Because both operands are unsigned, the quotient is always rounded down. For non-negative values this is the same as rounding toward zero, so 7 / 2 gives 3.
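Choosing between SDIV and UDIV matters because the same 32-bit pattern represents different numbers to each instruction. The sketch below models both on a host with two hypothetical helpers, `as_udiv` and `as_sdiv`. `as_sdiv` reinterprets the bits as a two's complement `int32_t` using `memcpy`. This is well defined because `int32_t` is required to use two's complement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: what UDIV computes for these register bits. */
static uint32_t as_udiv(uint32_t n_bits, uint32_t d_bits)
{
    assert(d_bits != 0u);
    return n_bits / d_bits;
}

/* Hypothetical helper: what SDIV computes for the same register bits. */
static int32_t as_sdiv(uint32_t n_bits, uint32_t d_bits)
{
    int32_t n, d;
    memcpy(&n, &n_bits, sizeof n);   /* reinterpret bits as two's complement */
    memcpy(&d, &d_bits, sizeof d);
    assert(d != 0);
    assert(!(n == INT32_MIN && d == -1));
    return n / d;                    /* truncates toward zero */
}
```

Take 0xFFFFFFFF / 2 as an example. `as_udiv` returns 0x7FFFFFFF (4294967295 / 2). `as_sdiv` returns 0, because it reads the dividend as -1 and computes -1 / 2.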

Divide by Zero

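Division by zero does not fault by default. On ARMv7-M cores such as Cortex-M3 and Cortex-M4, an SDIV or UDIV with a zero divisor simply writes 0 to the destination register. Software can change this by setting the DIV_0_TRP bit (bit 4) of the Configuration and Control Register (CCR, address 0xE000ED14). With that bit set, a zero divisor raises a UsageFault and sets the DIVBYZERO bit in the UsageFault Status Register. If the UsageFault exception is not enabled, the fault escalates to HardFault. Code that must not rely on either behavior should test the divisor before dividing.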
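The behavior described above can be modeled on a host for testing. In this sketch, `udiv_model` and the `CCR_DIV_0_TRP` macro are hypothetical names. The function only mimics the documented outcome and does not touch real hardware. On a real ARMv7-M target using CMSIS, the trap is typically enabled with `SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk;`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask: DIV_0_TRP is bit 4 of the CCR at 0xE000ED14. */
#define CCR_DIV_0_TRP (UINT32_C(1) << 4)

/* Hypothetical host-side model of UDIV on an ARMv7-M core.
 * Returns true if the divide would raise a UsageFault; otherwise
 * stores the value UDIV writes to its destination register in *q. */
static bool udiv_model(uint32_t n, uint32_t d, uint32_t ccr, uint32_t *q)
{
    assert(q != NULL);
    if (d == 0u) {
        if (ccr & CCR_DIV_0_TRP) {
            return true;   /* fault taken; the divide does not complete */
        }
        *q = 0u;           /* trap disabled: the quotient is 0 */
        return false;
    }
    *q = n / d;
    return false;
}
```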

Quotient Range and Overflow

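A 32-bit quotient overflows only if the true mathematical result does not fit in 32 bits. Both operands are 32 bits wide, and a nonzero divisor has a magnitude of at least 1. As a result, the magnitude of the quotient can never exceed that of the dividend, so UDIV cannot overflow.

SDIV has exactly one overflow case: 0x80000000 (-2147483648) divided by -1. The true result is +2147483648, which is one larger than the largest signed 32-bit value (2147483647). The hardware does not fault in this case, and the result wraps to 0x80000000.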

The processor does not report this overflow, so software that must detect it should check the operands before dividing. The divisor must be nonzero, and for signed division the pair must not be INT32_MIN and -1. The same checks are required in portable C, where both cases are undefined behavior.
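A minimal pre-check in C might look like the sketch below. `checked_sdiv` is a hypothetical name. It returns false for the two cases the hardware cannot report as errors when the divide-by-zero trap is disabled. Otherwise, it stores the quotient and returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rejects the operands that SDIV cannot report
 * (with the trap disabled) and that C leaves undefined. */
static bool checked_sdiv(int32_t n, int32_t d, int32_t *q)
{
    assert(q != NULL);
    if (d == 0) {
        return false;                 /* division by zero */
    }
    if (n == INT32_MIN && d == -1) {
        return false;                 /* true quotient +2^31 does not fit */
    }
    *q = n / d;                       /* safe: rounds toward zero, like SDIV */
    return true;
}
```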

Divide Instruction Encoding

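SDIV and UDIV exist only as 32-bit Thumb-2 instructions (encoding T1). Each instruction is stored as two 16-bit halfwords:

First halfword (bits 15-0): 1 1 1 1 1 0 1 1 1 0 U 1 | Rn
Second halfword (bits 15-0): 1 1 1 1 | Rd | 1 1 1 1 | Rm

Where:

U – 0 for SDIV, 1 for UDIV
Rn – Specifies the dividend register
Rm – Specifies the divisor register
Rd – Specifies the destination register for the quotient

There is no condition field. Like other Thumb instructions, SDIV and UDIV are made conditional by placing them in an IT block. The only difference between the two encodings is bit 5 of the first halfword, which is 0 for SDIV and 1 for UDIV. Using SP or PC as an operand is UNPREDICTABLE.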
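The sketch below assembles both halfwords from register numbers to make the bit layout concrete. `encode_div` and `usable_reg` are hypothetical helpers written for illustration and are not part of any toolchain. In memory, the first halfword is stored at the lower address, and each halfword is little-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SP (13) and PC (15) are UNPREDICTABLE as SDIV/UDIV operands. */
static bool usable_reg(unsigned r)
{
    return r < 16u && r != 13u && r != 15u;
}

/* Hypothetical helper: builds the T1 encoding of
 * SDIV/UDIV Rd, Rn, Rm as two halfwords. */
static void encode_div(bool is_unsigned, unsigned rd, unsigned rn,
                       unsigned rm, uint16_t *hw1, uint16_t *hw2)
{
    assert(usable_reg(rd) && usable_reg(rn) && usable_reg(rm));
    /* 11111011 10U1 Rn: U (bit 5) is 0 for SDIV, 1 for UDIV */
    *hw1 = (uint16_t)(0xFB90u | (is_unsigned ? 0x20u : 0u) | rn);
    /* 1111 Rd 1111 Rm */
    *hw2 = (uint16_t)(0xF0F0u | (rd << 8) | rm);
}
```

For example, SDIV R0, R1, R2 encodes as the halfword pair 0xFB91, 0xF0F2. UDIV R0, R0, R1 encodes as 0xFBB0, 0xF0F1.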

Performance and Repeat Instructions

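SDIV and UDIV take several cycles, while most simple arithmetic instructions execute in a single cycle. On Cortex-M3 and Cortex-M4, a divide completes in 2 to 12 cycles. The divider terminates early when the operand values allow it, so the exact count depends on the data. Timings differ on other cores, so check the Technical Reference Manual for the core in use.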

Cortex-M cores do not provide repeat forms of SDIV or UDIV. Writing the same divide twice simply performs the same division twice. The way to speed up code with many back-to-back divides is to issue fewer of them. For example, a loop-invariant quotient can be computed once outside the loop, and a remainder can be derived from an existing quotient instead of dividing again.

Divide in Thumb and Thumb-2 Instruction Sets

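Cortex-M processors execute only the Thumb instruction set. SDIV and UDIV are available only on cores whose Thumb implementation includes the relevant 32-bit Thumb-2 encodings:

There are no 16-bit encodings of SDIV or UDIV; both are 32-bit Thumb-2 instructions.
ARMv7-M and ARMv8-M cores (for example Cortex-M3, M4, M7, M23 and M33) implement them.
ARMv6-M cores (Cortex-M0, M0+ and M1) implement only a small subset of 32-bit Thumb-2 instructions and have no hardware divide.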

Software Implementation

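The SDIV and UDIV instructions are not available on ARMv6-M cores such as Cortex-M0 and Cortex-M0+. On these cores, division is implemented in software. Compilers call run-time helper functions defined by the Arm run-time ABI, such as __aeabi_uidiv and __aeabi_idiv. When the remainder is also needed, they call __aeabi_uidivmod or __aeabi_idivmod instead.

The simplest algorithm, successive subtraction, repeatedly subtracts the divisor from the dividend until what is left is smaller than the divisor. The number of subtractions is the quotient, and what is left is the remainder. Its running time grows with the size of the quotient, so practical routines use shift-and-subtract (binary long division) instead. That method needs one step per dividend bit, so at most 32 steps for a 32-bit value.

Even so, software divide routines cost many more cycles and more code space than a single hardware divide instruction.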
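The shift-and-subtract method can be sketched in C as follows. `soft_udivmod` is a hypothetical name, and this is a plain, unoptimized version of the algorithm. It is not the code of any particular run-time library. The partial remainder is kept in a 64-bit variable so that the left shift cannot lose a bit when the divisor is close to 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned restoring division, one dividend bit
 * per iteration, most significant bit first. Returns the quotient and
 * stores the remainder in *rem. Caller must ensure d != 0. */
static uint32_t soft_udivmod(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0u);
    uint32_t q = 0;
    uint64_t r = 0;                          /* partial remainder, < d after each step */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);      /* bring down the next dividend bit */
        if (r >= d) {
            r -= d;                          /* subtract when the divisor fits */
            q |= UINT32_C(1) << i;           /* and set this quotient bit */
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```

This loop always runs 32 times. Optimized versions can skip the iterations that correspond to leading zero bits of the dividend.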

Division Optimization Techniques

There are various techniques Cortex-M developers can use to optimize code involving divides:

Use the hardware divide instructions instead of software routines when the core supports them
Minimize unnecessary divides through algebraic transformations
Use shifts to divide by powers of two instead of a general divide; for signed values, adjust negative dividends first so the result still rounds toward zero
Derive a remainder from the existing quotient (one MLS) instead of issuing a second divide
Check for a zero divisor and the INT32_MIN / -1 case before dividing
Move divides outside tight loops where possible
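Dividing by a power of two with a shift needs care for signed values. An arithmetic right shift rounds toward minus infinity, while SDIV and C's `/` round toward zero. The sketch below uses a hypothetical helper, `sdiv_pow2`, that adds 2^k - 1 to negative dividends before shifting to restore round-toward-zero. It assumes that `>>` on a negative `int32_t` is an arithmetic shift. C leaves this implementation-defined, but GCC and Clang both implement it that way. `udiv_pow2` and `umod_pow2` are hypothetical unsigned counterparts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: n / 2^k rounded toward zero, using a shift.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static int32_t sdiv_pow2(int32_t n, unsigned k)
{
    assert(k < 31u);
    /* bias is 2^k - 1 for negative n, 0 otherwise; n + bias cannot overflow */
    int32_t bias = (n < 0) ? (int32_t)((UINT32_C(1) << k) - 1u) : 0;
    return (n + bias) >> k;
}

/* For unsigned values a plain shift and mask are exact. */
static uint32_t udiv_pow2(uint32_t n, unsigned k)
{
    assert(k < 32u);
    return n >> k;
}

static uint32_t umod_pow2(uint32_t n, unsigned k)
{
    assert(k < 32u);
    return n & ((UINT32_C(1) << k) - 1u);
}
```

Without the bias, shifting -7 right by 1 would give -4, whereas -7 / 2 is -3.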

Proper utilization of the Cortex-M divide instructions and related optimization tactics can produce faster and more efficient code.

Divide Usage Examples

Some examples of using the SDIV and UDIV instructions in Cortex-M code (GNU assembler syntax, where @ starts a comment):

@ Signed divide: R0 = R1 / R2
SDIV R0, R1, R2

@ Unsigned divide of R1 by the 16-bit unsigned value in the lower half of R2
UXTH R2, R2
UDIV R0, R1, R2

@ Signed divide using other registers: R3 = R5 / R7
SDIV R3, R5, R7

@ Unsigned quotient and remainder: R1 = R3 / R8, then R2 = R3 - R1 * R8
UDIV R1, R3, R8
MLS R2, R1, R8, R3

Conclusion

The SDIV and UDIV instructions are simple yet powerful additions to the Cortex-M architecture. They perform 32-bit by 32-bit division natively in hardware, which is much faster than the software routines required on cores without them. Developers can get the most from them by understanding how they round and how they handle division by zero and overflow. Avoiding unnecessary divides also helps tap into the performance capabilities of the Cortex-M processor family.
