Integer division on ARM processors that implement a hardware divider is done using the SDIV and UDIV instructions, for signed and unsigned division respectively. Here is a detailed guide on how integer division works on ARM and how to use the division instructions.
Overview of Integer Division on ARM
On ARM processors that include one, integer division is performed by a hardware divider in the integer execution pipeline, alongside the circuitry for basic arithmetic and logical operations. When a division instruction is executed, the two integer operands are passed to this unit, which performs the division.
ARM has two division instructions:
- SDIV: Signed Divide – Performs signed integer division, taking into account the sign of the operands.
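- UDIV: Unsigned Divide – Performs unsigned integer division, treating the operands as non-negative values.
In the 32-bit instruction sets, both instructions divide a 32-bit operand (the numerator) by a 32-bit operand (the denominator) and return a 32-bit result. AArch64 also provides 64-bit forms. The remainder is not returned; it can be recovered with a multiply-subtract (MLS) of the quotient and the denominator.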
Some key points regarding integer division on ARM:
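- On many cores the execution time of SDIV and UDIV is variable and depends on the values of the operands. For example, Cortex-M3 and Cortex-M4 document 2 to 12 cycles, finishing early when the quotient has few significant bits.
- Division by zero does not give an unpredictable result on cores that implement SDIV and UDIV. By default the result is 0; M-profile and R-profile cores can optionally be configured to raise an exception instead. It should still be avoided through proper input validation.
- The ARM architecture does not mandate a particular division algorithm; each implementation chooses its own. Restoring division, which produces one bit of the quotient per iteration, is a classic example.
- Many ARM processors (for example ARM7TDMI, Cortex-M0, Cortex-A8, and Cortex-A9) do not have a hardware divider and use a software library routine such as __aeabi_idiv or __aeabi_uidiv for division. This leads to much higher division latencies.
- Other ARM processors, including Cortex-M3, Cortex-M4, Cortex-M7, Cortex-A7, Cortex-A15, and all ARMv8-A cores, have an integer divider unit for improved performance. Division is still slower than other arithmetic operations.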
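The one-bit-per-iteration idea behind restoring division can be sketched in C. This is only an illustrative model of the classic shift-and-subtract algorithm, not a description of how any particular ARM divider is built. The function name restoring_udiv and the choice to return 0 for a zero divisor are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical model of restoring division on 32-bit unsigned values.
 * Produces one quotient bit per loop iteration, most significant first.
 * The remainder is written to *rem when rem is not NULL. */
uint32_t restoring_udiv(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint64_t r = 0;   /* partial remainder; 64 bits so the shift cannot overflow */
    uint32_t q = 0;

    if (d == 0) {     /* assumption: mirror the non-trapping result of 0 */
        if (rem)
            *rem = n;
        return 0;
    }

    for (int i = 31; i >= 0; i--) {
        /* Shift the next numerator bit into the partial remainder. */
        r = (r << 1) | ((n >> i) & 1u);
        /* Comparing first is equivalent to subtracting and then
         * restoring when the result would have been negative. */
        if (r >= d) {
            r -= d;
            q |= (1u << i);
        }
    }

    if (rem)
        *rem = (uint32_t)r;
    return q;
}
```

A fixed 32-iteration loop like this also shows why software division is slow and why hardware dividers that skip leading zero bits have operand-dependent timing.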
Signed Division (SDIV)
The SDIV instruction performs signed integer division, taking into account the signs of the two operands. The instruction divides the 32-bit numerator (N) by the 32-bit denominator (D) and returns a 32-bit result (R).
The pseudocode for SDIV is:
    if D == 0 then
        if divide-by-zero trapping is enabled then raise exception;
        else R = 0;
    else if (N == INT_MIN) and (D == -1) then R = INT_MIN;
    else R = N / D;   // rounded toward zero
The key behaviors are:
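- If the denominator is 0, the result is 0, unless the core supports divide-by-zero trapping and it is enabled, in which case an exception is raised.
- If N is -2^31 and D is -1, the true quotient +2^31 does not fit in 32 bits. SDIV defines the result as INT_MIN and gives no overflow indication.
- In other cases, a standard signed division is performed, with the quotient rounded toward zero.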
For positive operands, SDIV gives the same result as unsigned division. In general, the sign of the result follows these rules:
- If N and D have the same sign, R >= 0
- If N and D have different signs, R <= 0
Some examples:
    SDIV R0, R1, R2
    R1 = 15, R2 = 4 -> R0 = 3
    R1 = -15, R2 = 4 -> R0 = -3
    R1 = 4, R2 = -3 -> R0 = -1
    R1 = -4, R2 = -3 -> R0 = 1
    R1 = -2147483648, R2 = -1 -> R0 = -2147483648 (INT_MIN / -1)
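The SDIV results listed above can be reproduced with a small C model. This is a sketch under stated assumptions: the helper name sdiv_model is hypothetical, and it models the non-trapping case, where the architecture defines a zero divisor to give 0. Both special cases are handled explicitly because they are undefined behavior for the C / operator.

```c
#include <stdint.h>

/* Hypothetical helper that models the 32-bit SDIV result in portable C. */
int32_t sdiv_model(int32_t n, int32_t d)
{
    if (d == 0)
        return 0;            /* non-trapping divide by zero yields 0 */
    if (n == INT32_MIN && d == -1)
        return INT32_MIN;    /* +2^31 is not representable in 32 bits */
    return n / d;            /* C99 and later truncate toward zero, like SDIV */
}
```

For instance, sdiv_model(-15, 4) gives -3 and sdiv_model(4, -3) gives -1, matching the table of examples.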
Unsigned Division (UDIV)
The UDIV instruction performs unsigned integer division, treating both operands as unsigned (non-negative) values. As with SDIV, the 32-bit numerator (N) is divided by the 32-bit denominator (D) to produce a 32-bit result (R).
The pseudocode for UDIV is:
    if D == 0 then
        if divide-by-zero trapping is enabled then raise exception;
        else R = 0;
    else R = N / D;
Again, divide by 0 returns 0 unless trapping is enabled. For non-zero D, standard unsigned division is performed, equivalent to: R = floor(N / D)
Some examples:
    UDIV R0, R1, R2
    R1 = 15, R2 = 4 -> R0 = 3
    R1 = 4, R2 = 15 -> R0 = 0
    R1 = 0x80000000, R2 = 0x40000000 -> R0 = 0x2
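To illustrate how the unsigned interpretation changes the result, here is a minimal C model of UDIV. The name udiv_model is hypothetical, and returning 0 for a zero divisor reflects the non-trapping behavior. The interesting case is a bit pattern that would be negative if read as signed.

```c
#include <stdint.h>

/* Hypothetical helper that models the 32-bit UDIV result in portable C. */
uint32_t udiv_model(uint32_t n, uint32_t d)
{
    /* Unsigned division cannot overflow, so zero is the only special case. */
    return (d == 0) ? 0u : n / d;
}

/* The bit pattern 0xFFFFFFF1 is -15 when read as signed. Read as unsigned
 * it is 4294967281, so udiv_model(0xFFFFFFF1u, 4u) is 0x3FFFFFFC,
 * whereas a signed divide of -15 by 4 would give -3. */
```

This is why picking the right instruction (or the right C type) matters whenever the top bit of an operand may be set.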
Performing Division in ARM Assembly
To perform integer division in ARM Assembly, the SDIV and UDIV instructions are used. Here are some examples:
    // Signed division
    MOV R0, #15
    MOV R1, #4
    SDIV R2, R0, R1    // R2 = 15 / 4 = 3
    // Unsigned division
    MOV R3, #0x80000000
    MOV R4, #0x40000000
    UDIV R5, R3, R4    // R5 = 0x80000000 / 0x40000000 = 0x2
Some key points on using SDIV/UDIV in assembly:
- Operands must be in registers, not immediates.
- Results go to a register, not memory.
- Remainder is discarded, just the quotient is produced.
- Condition flags (N, Z, C, V) are not updated by SDIV/UDIV; there is no flag-setting form.
- Slower than other arithmetic ops; depending on the core and the operands, a divide may take over 10 cycles.
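Because only the quotient is produced, the remainder has to be derived from it. A common pattern is SDIV followed by a multiply-subtract (MLS), which the following C sketch mirrors. The type and function names are hypothetical, and the caller is assumed to have excluded a zero divisor and the INT_MIN / -1 case.

```c
#include <stdint.h>

/* Hypothetical result pair for a combined divide and remainder. */
typedef struct {
    int32_t quot;
    int32_t rem;
} divmod_s32_t;

/* Assumes d != 0 and not (n == INT32_MIN && d == -1). */
divmod_s32_t divmod_s32(int32_t n, int32_t d)
{
    divmod_s32_t out;
    out.quot = n / d;              /* corresponds to: SDIV quot, n, d      */
    out.rem  = n - out.quot * d;   /* corresponds to: MLS  rem, quot, d, n */
    return out;
}
```

With truncating division the remainder takes the sign of the numerator, so divmod_s32(-15, 4) yields a quotient of -3 and a remainder of -3.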
Improving Division Performance on ARM
Integer division is generally costly on ARM processors. Here are some techniques to improve division performance in ARM programs:
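- Use a core with hardware divide – for example Cortex-M3/M4/M7, Cortex-A7, Cortex-A15, or any ARMv8-A core, rather than Cortex-M0 or Cortex-A8/A9.
- Replace divisions by powers of two with shifts – e.g. x/4 => x>>2 for unsigned x. For negative signed x the shift rounds toward minus infinity, so the two results differ.
- Use reciprocal multiplication instead of division by a constant where possible. Optimizing compilers usually do this automatically.
- Avoid divisions inside tight loops. Try moving them outside the loop.
- Unroll or parallelize loops with multiple divisions so that independent work can overlap the divide latency.
- On Cortex-M cores without a hardware divider, rely on the toolchain's optimized runtime division routines, and check whether a library such as CMSIS-DSP covers your use case.
- If integer-only is okay, use fixed-point math instead of floating point division.
- For compilers, use speed optimization flags like -O3, and select the target core (e.g. with -mcpu) so that hardware divide instructions are emitted where available.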
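As a concrete illustration of the shift and reciprocal-multiplication techniques from the list above, here is a minimal sketch for unsigned 32-bit values. The function names are hypothetical. The constant 0xCCCCCCCD with a shift of 35 is the widely used magic number for dividing a 32-bit unsigned value by 10; other divisors need their own constant and shift.

```c
#include <stdint.h>

/* Division by a power of two: exact for unsigned values. */
uint32_t div4_u32(uint32_t x)
{
    return x >> 2;
}

/* Division by 10 via reciprocal multiplication.
 * 0xCCCCCCCD is ceil(2^35 / 10), so the high bits of the 64-bit
 * product hold floor(x / 10) for every 32-bit x. */
uint32_t div10_u32(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

On 32-bit ARM the widening multiply maps to UMULL, which is typically much cheaper than a divide. In practice it is usually best to write x / 10 and let the compiler generate this sequence, keeping the manual form for cases where profiling shows it is needed.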
Doing Division in C/C++ on ARM
For C/C++ code running on ARM, integer division can be performed using the / and % operators, along with typecasting.
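Examples:
    int a = 15, b = 4;
    int c = a / b;              // Signed division: 15 / 4 = 3
    unsigned int x = 0x80000000, y = 0x40000000;
    unsigned int z = x / y;     // Unsigned division: result is 2
    int p = -15;
    int q = 4;
    int r = p / q;              // Signed division: -15 / 4 = -3
    // INT_MIN / -1 overflows and is undefined behavior in C, even though
    // SDIV itself returns INT_MIN, so guard it explicitly
    int n = INT_MIN, d = -1;    // INT_MIN comes from <limits.h>
    int s = (n == INT_MIN && d == -1) ? INT_MIN : n / d;
The % operator is used to obtain the remainder, which takes the sign of the dividend. Typecasting can be used to choose between signed and unsigned division:
    int i = 15 % 4;             // Remainder = 3
    unsigned u = -1;            // Wraps to UINT_MAX
    int t = (int)u;             // Typecast to signed (typically -1); dividing t uses signed division
The compiler selects SDIV or UDIV based on the operand types when the target core supports them; otherwise it emits a call to a runtime library routine. Optimization flags like -O3 can be used to enhance division performance.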
Handling Division by Zero
With SDIV or UDIV, dividing by zero returns 0 by default, or raises an exception on cores configured to trap it. In C and C++, however, integer division by zero is undefined behavior regardless of what the hardware does, so the compiler may assume it never happens. This could result in a crash, exception, or other unexpected results.
To avoid this, input validation is necessary in code that performs division. Here are some ways to check for zero before dividing in C/C++ on ARM:
    if (y == 0) {
        // Handle divide by zero
    } else {
        z = x / y;  // Proceed with divide
    }
    // Use ternary operator
    z = (y == 0) ? 0 : x / y;
    // Use preprocessor macro (note: evaluates y twice)
    #define DIV(x, y) ((y) == 0 ? 0 : (x) / (y))
    int r = DIV(x, y);
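The checks above silently substitute 0, which can hide errors. One alternative, sketched here with a hypothetical function name, is a helper that reports failure to the caller and also rejects the INT_MIN / -1 overflow case, since both are undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checked signed division.
 * Returns true and stores the quotient in *out on success.
 * Returns false, leaving *out untouched, when the division is not
 * representable: a zero divisor or INT32_MIN / -1. */
bool checked_div_s32(int32_t n, int32_t d, int32_t *out)
{
    if (d == 0)
        return false;
    if (n == INT32_MIN && d == -1)
        return false;
    *out = n / d;
    return true;
}
```

A caller would then write something like: if (!checked_div_s32(x, y, &z)) { /* report or recover */ }. Unlike the macro form, the arguments are evaluated exactly once.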
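On M-profile cores with hardware divide, such as Cortex-M3 and Cortex-M4, the processor can also be configured to trigger an exception on divide by zero: setting the DIV_0_TRP bit in the Configuration and Control Register (CCR) makes SDIV and UDIV raise a UsageFault instead of returning 0. The IDE/debugger can then break execution when the exception occurs.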
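On Cortex-M3 and Cortex-M4, divide-by-zero trapping is controlled by the DIV_0_TRP bit (bit 4) of the Configuration and Control Register, located at address 0xE000ED14. The sketch below sets that bit. The function name is hypothetical, and the register is passed in as a pointer so that the logic can also be exercised off-target.

```c
#include <stdint.h>

#define CCR_DIV_0_TRP (1u << 4)   /* DIV_0_TRP is bit 4 of the CCR */

/* Hypothetical helper: enable the divide-by-zero UsageFault.
 * On a Cortex-M3/M4 target, pass (volatile uint32_t *)0xE000ED14u. */
void enable_div0_trap(volatile uint32_t *ccr)
{
    *ccr |= CCR_DIV_0_TRP;   /* read-modify-write keeps the other CCR bits */
}
```

Projects that use the CMSIS device headers would normally write SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk; instead. Without a dedicated UsageFault handler enabled, the fault escalates to HardFault.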
Division Operations in ARM NEON
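ARM NEON (Advanced SIMD) is a SIMD instruction set extension for ARM Cortex-A series processors. It does not provide integer vector division instructions, so there is no vdivq_s32 intrinsic.
The divide instructions available alongside NEON operate on floating-point values:
- VDIV – Scalar floating-point divide (VDIV.F32, VDIV.F64) from the VFP extension on 32-bit ARM.
- FDIV (vector) – Floating-point vector divide on AArch64, exposed as the intrinsics vdiv_f32 and vdivq_f32.
For example, on AArch64:
    float32x4_t num = {15.0f, -15.0f, 4.0f, -4.0f};
    float32x4_t den = {4.0f, 4.0f, -3.0f, -3.0f};
    float32x4_t res = vdivq_f32(num, den); // res = {3.75, -3.75, -1.333..., 1.333...}
On 32-bit NEON, vector division is usually approximated with a reciprocal estimate (VRECPE) refined by VRECPS steps and followed by a multiply. Integer lanes have to be divided one at a time with SDIV/UDIV, or converted to floating point first. Vector division has lower throughput than multiplication, but it enables parallel division of multiple values, accelerating workloads like video processing.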
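Since NEON has no integer vector divide, four 32-bit integer lanes are typically divided with a plain scalar loop, which the compiler lowers to one SDIV per lane on cores that have the instruction. The following sketch uses a hypothetical function name and assumes the caller has already excluded zero divisors and the INT_MIN / -1 case.

```c
#include <stdint.h>

/* Hypothetical lane-wise signed division of four 32-bit integers.
 * Assumes every den[i] is non-zero and no lane is INT32_MIN / -1. */
void div_lanes_s32(const int32_t num[4], const int32_t den[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = num[i] / den[i];   /* one scalar divide per lane */
}
```

With num = {15, -15, 4, -4} and den = {4, 4, -3, -3}, the output is {3, -3, -1, 1}, the same values as the scalar SDIV examples.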
Summary
Here are the key points on integer division in ARM processors:
- SDIV and UDIV instructions are used for signed and unsigned division.
- Division takes multiple cycles and is slower than other arithmetic.
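- Hardware divide units, present on cores such as Cortex-M3/M4/M7, Cortex-A7, Cortex-A15, and ARMv8-A, improve performance.
- Divide by zero returns 0 by default or traps where enabled, and is undefined behavior in C/C++, so it must be avoided.
- Techniques like avoiding division, reciprocal multiplication, and moving divisions out of loops help increase speed.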
- C/C++ relies on the compiler to use SDIV/UDIV based on operand types.
- Input validation is necessary to watch for divide by zero.
Understanding the division instructions, behavior, and performance is key to utilizing this operation efficiently in ARM programs.