The Arm Cortex-M series of processor cores is designed for embedded and IoT applications. The cores are 32-bit, and those based on the Armv7-M and Armv8-M architectures (for example Cortex-M3, M4, M7, and M33) provide hardware divide instructions that calculate a 32-bit quotient from a 32-bit dividend and a 32-bit divisor. The divide instructions operate on single 32-bit registers and can divide signed or unsigned integers.

## Signed Divide in Cortex-M

The SDIV instruction performs a 32-bit by 32-bit signed divide, producing a 32-bit quotient. Its syntax is SDIV Rd, Rn, Rm: the dividend in Rn is divided by the divisor in Rm, and the quotient is written to Rd. No remainder is produced; when one is needed, it can be computed from the quotient with a multiply-subtract (MLS) instruction.

For example: SDIV R0, R1, R2

Divides the 32-bit dividend in R1 by the 32-bit divisor in R2 and writes the 32-bit quotient to R0. R1 and R2 are left unchanged.

Both operands are treated as signed values in 2’s complement form, so a register with bit 31 set represents a negative number. Either operand can be positive or negative. The quotient is rounded towards 0, so -7 divided by 2 gives -3.
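The rounding rule can be checked on a host machine with a small C model. This is only a sketch: `sdiv_result` and `sdiv_with_remainder` are hypothetical names, and the function assumes the divisor is non-zero and that the operands are not -2147483648 and -1. The remainder line uses the same arithmetic that an MLS instruction placed after an SDIV would perform.

```c
#include <stdint.h>

/* Quotient as a 32-bit signed divide produces it, plus a remainder
   derived from that quotient (what a following MLS would compute). */
typedef struct {
    int32_t quot;
    int32_t rem;
} sdiv_result;

/* Hypothetical helper. Assumes m != 0 and not (n == INT32_MIN && m == -1). */
sdiv_result sdiv_with_remainder(int32_t n, int32_t m)
{
    sdiv_result r;
    r.quot = n / m;           /* C99 '/' truncates towards zero, like SDIV */
    r.rem  = n - r.quot * m;  /* multiply-subtract: n - quot * m */
    return r;
}
```

With this definition, dividing -7 by 2 gives a quotient of -3 and a remainder of -1, and the remainder always takes the sign of the dividend.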

## Unsigned Divide in Cortex-M

The UDIV instruction performs an unsigned 32-bit by 32-bit divide. Its syntax is UDIV Rd, Rn, Rm: the unsigned dividend in Rn is divided by the unsigned divisor in Rm, and the 32-bit quotient is written to Rd. As with SDIV, no remainder is produced.

For example: UDIV R0, R1, R2

Divides the unsigned 32-bit dividend in R1 by the unsigned 32-bit divisor in R2 and writes the 32-bit quotient to R0.

Because the dividend and divisor are both treated as unsigned, the quotient is never negative and is always rounded down towards 0.
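The difference between the signed and unsigned forms only shows up when bit 31 of an operand is set. The following host-side sketch makes that visible; `udiv_model` and `sdiv_model` are hypothetical names, and both assume a non-zero divisor.

```c
#include <stdint.h>

/* Hypothetical model of an unsigned 32-bit divide. Assumes m != 0. */
uint32_t udiv_model(uint32_t n, uint32_t m)
{
    return n / m;
}

/* Hypothetical model of a signed 32-bit divide.
   Assumes m != 0 and not (n == INT32_MIN && m == -1). */
int32_t sdiv_model(int32_t n, int32_t m)
{
    return n / m;
}
```

The bit pattern 0xFFFFFFF6 is 4294967286 when read as unsigned and -10 when read as signed. Divided by 2, the unsigned model returns 0x7FFFFFFB (2147483643) while the signed model returns -5, so choosing the wrong instruction for the data type silently produces a very different quotient.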

## Divide by Zero

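The result of a divide by 0 depends on the DIV_0_TRP bit in the Configuration and Control Register (CCR). When DIV_0_TRP is set, SDIV or UDIV with a 0 divisor raises a UsageFault and sets the DIVBYZERO flag in the UsageFault Status Register; if the UsageFault handler is not enabled, the fault escalates to HardFault. When DIV_0_TRP is clear, which is the reset default, no exception is taken and the instruction writes a quotient of 0. Code that relies on trapping must therefore set the bit explicitly.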
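On Armv7-M, whether a divide by zero traps is controlled by the DIV_0_TRP bit (bit 4) of the CCR at address 0xE000ED14. The sketch below shows one way to set it, plus a host-side model of the non-trapping result. The names `enable_div0_trap` and `udiv_no_trap` are hypothetical, and the register is passed as a pointer so that the code can be exercised off-target.

```c
#include <stdint.h>

#define SCB_CCR_ADDR   0xE000ED14u  /* Configuration and Control Register (Armv7-M) */
#define CCR_DIV_0_TRP  (1u << 4)    /* trap on divide by zero when set */

/* Set DIV_0_TRP in the register that ccr points to, leaving other bits alone. */
void enable_div0_trap(volatile uint32_t *ccr)
{
    *ccr |= CCR_DIV_0_TRP;
}

/* Hypothetical host-side model of UDIV with trapping disabled:
   a zero divisor yields a quotient of 0 instead of a fault. */
uint32_t udiv_no_trap(uint32_t n, uint32_t m)
{
    return (m == 0u) ? 0u : n / m;
}
```

On a real target the call would be `enable_div0_trap((volatile uint32_t *)SCB_CCR_ADDR);`. Projects that use CMSIS headers would more likely write `SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk;` instead.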

## Quotient Range and Overflow

Because the dividend and divisor are both 32 bits wide, the quotient almost always fits in 32 bits. The single exception is the signed divide of -2147483648 (0x80000000) by -1: the true result, 2147483648, cannot be represented as a signed 32-bit value, so SDIV writes 0x80000000 and raises no exception. UDIV cannot overflow, since an unsigned quotient is never larger than the dividend.

Software that must detect this case should test for it before executing the divide, by comparing the dividend with -2147483648 and the divisor with -1.
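With 32-bit operands, the only signed quotient that cannot be represented is -2147483648 / -1, and in C that expression is undefined behavior regardless of what the hardware returns. A pre-divide check might look like the sketch below; `checked_sdiv` is a hypothetical wrapper, not a library function, and it also rejects a zero divisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: returns false, leaving *quot untouched, when the
   divisor is zero or the quotient would not fit in int32_t. */
bool checked_sdiv(int32_t n, int32_t m, int32_t *quot)
{
    if (m == 0) {
        return false;               /* divide by zero */
    }
    if (n == INT32_MIN && m == -1) {
        return false;               /* +2147483648 is not representable */
    }
    *quot = n / m;
    return true;
}
```

Callers can then decide whether to saturate, report an error, or skip the operation when the function returns false.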

## Divide Instruction Encoding

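SDIV and UDIV share one 32-bit Thumb encoding, stored as two 16-bit halfwords:

| Halfword | Bits 15:12 | Bits 11:8 | Bits 7:4 | Bits 3:0 |
| --- | --- | --- | --- | --- |
| First, SDIV | 1111 | 1011 | 1001 | Rn |
| First, UDIV | 1111 | 1011 | 1011 | Rn |
| Second | 1111 | Rd | 1111 | Rm |

Where:

- Rn – Specifies the dividend register
- Rm – Specifies the divisor register
- Rd – Specifies the destination register for the quotient

The encoding has no condition field; on cores that support it, conditional execution is achieved by placing the instruction in an IT block. The only difference between the two instructions is bit 5 of the first halfword, which is 0 for SDIV and 1 for UDIV.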
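As a cross-check against a disassembler listing, the 32-bit Thumb encoding of the two instructions (encoding T1 in the Armv7-M Architecture Reference Manual) can be assembled by hand. The sketch below uses a hypothetical helper, `encode_div`, and packs the first halfword into the upper 16 bits of the return value purely for convenience.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: builds the two halfwords of SDIV/UDIV Rd, Rn, Rm.
   Register numbers are masked to 4 bits. Returns (first << 16) | second. */
uint32_t encode_div(bool is_unsigned, unsigned rd, unsigned rn, unsigned rm)
{
    /* First halfword: 1111 1011 10U1 Rn, where U = 1 selects UDIV. */
    uint32_t first = (is_unsigned ? 0xFBB0u : 0xFB90u) | (rn & 0xFu);

    /* Second halfword: 1111 Rd 1111 Rm. */
    uint32_t second = 0xF0F0u | ((rd & 0xFu) << 8) | (rm & 0xFu);

    return (first << 16) | second;
}
```

For SDIV R0, R1, R2 this yields the halfwords 0xFB91 and 0xF0F2, and for UDIV R0, R1, R2 it yields 0xFBB1 and 0xF0F2. The two results differ in a single bit.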

## Performance and Back-to-Back Divides

Divide instructions have a variable latency, 2 to 12 cycles on Cortex-M3 and Cortex-M4, and cycle counts differ on other cores. The divider terminates early depending on the operand values, so the exact count is data dependent. Either way, divides are slower than simple arithmetic instructions, which often execute in 1 cycle.

There are no dedicated repeat forms of SDIV or UDIV. Code that needs several divides in a row simply issues ordinary divide instructions one after another, and each one pays its own latency:

    SDIV R0, R1, R2
    SDIV R3, R4, R5    // A second, independent SDIV

## Divide in Thumb and Thumb-2 Instruction Sets

Cortex-M cores execute only Thumb instructions, and the divide instructions belong to the 32-bit Thumb-2 part of that instruction set:

- SDIV and UDIV have no 16-bit Thumb encoding.
- Each instruction has a single 32-bit Thumb-2 encoding, available on Armv7-M and Armv8-M cores.

## Software Implementation

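The SDIV and UDIV instructions are not available on Armv6-M cores such as Cortex-M0 and Cortex-M0+. On those cores, division is implemented in software, typically through compiler runtime routines such as __aeabi_idiv and __aeabi_uidiv. The simplest algorithm is successive subtraction.

This approach subtracts the divisor from the dividend repeatedly, tracking the number of subtractions performed. The count gives the final quotient, and whatever is left of the dividend is the remainder. Because the run time grows with the size of the quotient, practical routines usually use shift-and-subtract (binary long division) instead, which needs at most 32 iterations for 32-bit operands.

However, software divide routines take significantly more time and code size than the hardware divide instructions.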
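A shift-and-subtract routine of the kind a core without hardware divide might use can be sketched as follows. This is an illustrative version, not the code of any particular runtime library: `soft_udiv32` is a hypothetical name, and returning a quotient of 0 for a zero divisor is a policy chosen here for simplicity.

```c
#include <stdint.h>

/* Hypothetical unsigned 32-bit divide using binary long division.
   Returns the quotient and stores the remainder in *rem.
   For d == 0 it returns 0 and stores n as the remainder. */
uint32_t soft_udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0u;
    uint32_t r = 0u;

    if (d == 0u) {
        *rem = n;
        return 0u;
    }

    /* Bring down one dividend bit per iteration, most significant first. */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {        /* divisor fits: subtract and set quotient bit */
            r -= d;
            q |= 1u << i;
        }
    }

    *rem = r;
    return q;
}
```

The loop always runs 32 times, so the cost is bounded regardless of the operand values, unlike successive subtraction, where dividing 4294967295 by 1 would take billions of iterations.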

## Division Optimization Techniques

There are various techniques Cortex-M developers can use to optimize code involving divides:

- Utilize hardware divides over software routines when supported
- Minimize unnecessary divides through algebraic transformations
- Employ shift operations to divide unsigned values by powers of two instead of a general divide
- Compute a remainder from the quotient with a multiply-subtract instead of issuing a second divide
- Guard against a zero divisor and the signed -2147483648 / -1 case wherever they can occur
- Move divides outside tight loops if possible

Proper utilization of the Cortex-M divide instructions and related optimization tactics can produce faster and more efficient code.
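Three of the techniques listed above can be sketched in C. The function names are hypothetical, the divisors are assumed to be non-zero, and an optimizing compiler will often apply these transformations on its own, so the code is meant to show the idea rather than to be copied verbatim.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned divide by 2^k done with a shift. Assumes k is in 0..31. */
uint32_t div_pow2(uint32_t n, unsigned k)
{
    return n >> k;
}

/* Quotient and remainder from a single divide: the remainder is derived
   with a multiply and subtract instead of a second divide. Assumes d != 0. */
void divmod_once(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{
    *q = n / d;
    *r = n - (*q) * d;
}

/* Loop-invariant divide hoisted out of the loop: num / den is computed
   once and reused for every element. Assumes den != 0. */
void scale_all(uint32_t *v, size_t len, uint32_t num, uint32_t den)
{
    uint32_t scale = num / den;   /* one divide, outside the loop */

    for (size_t i = 0; i < len; i++) {
        v[i] *= scale;            /* only a multiply inside the loop */
    }
}
```

The shift shortcut is shown for unsigned values only. A plain arithmetic shift of a negative signed value rounds towards negative infinity rather than towards 0, so signed division by a power of two needs an extra correction step.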

## Divide Usage Examples

Some examples of using the SDIV and UDIV instructions in Cortex-M code:

    // Signed divide: R0 = R1 / R2
    SDIV R0, R1, R2

    // Unsigned divide of R1 by the 16-bit unsigned value in the lower half of R2
    UXTH R2, R2           // Zero-extend the low halfword of R2
    UDIV R0, R1, R2       // R0 = R1 / R2

    // Signed divide with other registers (all Cortex-M code is Thumb): R3 = R5 / R7
    SDIV R3, R5, R7

    // Unsigned quotient and remainder of R3 / R8
    UDIV R1, R3, R8       // R1 = R3 / R8
    MLS  R4, R1, R8, R3   // R4 = R3 - R1 * R8 (remainder)

## Conclusion

The SDIV and UDIV instructions are simple yet powerful additions to the Cortex-M architecture. They enable efficient 32-bit by 32-bit division natively in hardware. Optimizing their usage can result in faster divide throughput compared to software routines. Understanding the divide instructions provides another tool for developers to tap into the performance capabilities of the Cortex-M processor family.