Cortex-M0 Multiply Cycles

Javier Massey
Last updated: September 15, 2023 8:24 am

The Cortex-M0 is an ultra-low-power 32-bit ARM processor core designed for microcontroller applications. It is optimized for energy efficiency and minimal silicon area in embedded systems. Its hardware integer multiplier performs a 32×32 multiply and returns the low 32 bits of the product. The multiplier is a synthesis-time option: the chip designer picks either a fast single-cycle multiplier or a small iterative multiplier that takes 32 cycles, so the multiply cycle count depends on the specific microcontroller.

Contents
  • Cortex-M0 Architecture Overview
  • Integer Multiplier
  • Multiply Instruction
  • Signed and Unsigned Behavior
  • Signed Multiply Behavior
  • Overflow Detection
  • Multiply-Accumulate
  • Multiply-Subtract
  • Long Multiplies
  • Summary of Multiply Cycles

Cortex-M0 Architecture Overview

The Cortex-M0 is a 3-stage scalar pipeline processor with a 32-bit ALU and a 32-bit hardware multiplier. Bit-banding, saturating arithmetic, and multiply-accumulate hardware are features of the larger Cortex-M3 and Cortex-M4 cores, not of the Cortex-M0. The core itself contains no memory; flash and SRAM are added by the chip vendor, and their sizes vary from device to device. The Cortex-M0 implements the ARMv6-M Thumb instruction set, which consists almost entirely of 16-bit instructions plus a handful of 32-bit ones (BL, MRS, MSR, DMB, DSB, and ISB).

Integer Multiplier

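The integer multiplier in the Cortex-M0 comes in two configurations, chosen by the chip designer when the core is synthesized: a fast multiplier that completes a 32×32 multiply in a single cycle, and a small iterative multiplier that takes 32 cycles but uses less silicon area. In both cases the multiplier takes two 32-bit operands and writes the low 32 bits of the product to a general-purpose register; there is no dedicated product register and no hardware multiply-accumulate. Operands are always 32 bits wide, so 8-bit and 16-bit values are first widened with UXTB, SXTB, UXTH, or SXTH. The device datasheet or reference manual states which multiplier a given microcontroller uses.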

Multiply Instruction

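On the Cortex-M0, the only multiply instruction is the 16-bit Thumb instruction MULS. It multiplies two 32-bit register operands and stores the low 32 bits of the product in the destination register. The syntax is:

    MULS Rd, Rn, Rd

Where:

  • S – Always present: the N and Z flags are updated from the result, while C and V are left unchanged
  • Rd – Destination register, which also supplies the second operand
  • Rn – First operand
  • Rd and Rn must both be low registers (R0 to R7)

For example:

    MULS R1, R2, R1 ; R1 = R2 * R1

This multiplies the values in R2 and R1 and stores the low 32 bits of the product in R1, overwriting the second operand. The N and Z flags are updated. With the fast multiplier option this MULS takes 1 cycle; with the small multiplier option it takes 32 cycles. In both cases the timing does not depend on the operand values.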
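As a minimal sketch in C of what the multiply computes (the helper name mul32 is illustrative and not part of any library): an unsigned 32-bit multiplication in C keeps only the low 32 bits of the product, which is the operation a compiler targeting the Cortex-M0 can map onto a single multiply instruction.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32 multiply that keeps only the low 32 bits,
 * i.e. the product modulo 2^32. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    return a * b; /* unsigned arithmetic wraps modulo 2^32 */
}
```

For instance, mul32(0x10000, 0x10000) returns 0, because the true product 2^32 does not fit in 32 bits.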

Signed and Unsigned Behavior

The low 32 bits of a product are the same whether the operands are read as signed or unsigned values, so MULS has no separate signed and unsigned forms. How the result is interpreted depends on the instructions that use it. For example:

    MULS R3, R2, R3 ; R3 = low 32 bits of R2 * R3
    CMP R3, #0 ; Compare R3 against 0
    BLT negative ; Signed test: taken if R3 < 0

Here BLT treats R3 as a signed 2’s complement value. But if we do:

    MULS R3, R2, R3
    CMP R3, R4 ; Compare R3 against R4
    BHI larger ; Unsigned test: taken if R3 > R4

Then R3 is treated as an unsigned 32-bit value. CMP itself is neither signed nor unsigned; the condition code on the branch selects the interpretation. So the same MULS result can be used in both signed and unsigned contexts.
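The following C sketch illustrates why one multiply serves both interpretations. It computes the low 32 bits of the product twice, once with the operands read as unsigned and once as signed, by widening to 64 bits and truncating. The helper names low32_unsigned and low32_signed are made up for this example.

```c
#include <stdint.h>

/* Low 32 bits of the product, operands read as unsigned values. */
static uint32_t low32_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* Low 32 bits of the product, operands read as signed values.
 * The 64-bit product cannot overflow, and the conversion to an
 * unsigned type is defined as reduction modulo 2^64. */
static uint32_t low32_signed(int32_t a, int32_t b)
{
    return (uint32_t)(uint64_t)((int64_t)a * (int64_t)b);
}
```

Both functions return the same bit pattern for the same input bits: multiplying -3 by 7 gives 0xFFFFFFEB either way, which reads as -21 when signed and as 4294967275 when unsigned.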

Signed Multiply Behavior

When the MULS result is used in a signed context, it is the correct 2’s complement product as long as the true product fits in 32 bits. This means:

  • Negative numbers are represented in 2’s complement form
  • No sign extension is needed, because only the low 32 bits of the product are kept
  • The signed result is modulo 2^32

MULS takes only register operands, so a constant must be loaded into a register first. For example:

    LDR R1, =0x80000000 ; R1 = -2147483648
    MULS R1, R2, R1 ; R1 = -2147483648 * R2, modulo 2^32

Because the result wraps modulo 2^32, R1 ends up as 0 when R2 is even and as 0x80000000 when R2 is odd.

Overflow Detection

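The MULS instruction updates only the N and Z flags. It leaves the carry and overflow flags unchanged, and the upper 32 bits of the product are discarded, so overflow cannot be detected from the flags after the multiply. A following MOVS does not help either, because it also updates only N and Z.

Instead, overflow has to be ruled out before the multiply or detected in software. Two common approaches are to check the operand ranges first (for example, two unsigned values below 2^16 can never overflow a 32-bit product), or to compute the full 64-bit product in software and check that its upper half is just the zero or sign extension of the lower half.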
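One way to detect overflow in software, sketched in C under the assumption that the compiler provides 64-bit integer arithmetic (on a Cortex-M0 the widened multiply is a software sequence, so this check is not free). The function names umul32_checked and smul32_checked are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned multiply; returns false if the product does not fit in 32 bits.
 * *out always receives the low 32 bits. */
static bool umul32_checked(uint32_t a, uint32_t b, uint32_t *out)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    *out = (uint32_t)wide;
    return wide <= UINT32_MAX;
}

/* Signed multiply; returns false (leaving *out untouched) if the
 * product falls outside the int32_t range. */
static bool smul32_checked(int32_t a, int32_t b, int32_t *out)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX) {
        return false;
    }
    *out = (int32_t)wide;
    return true;
}
```

When the operand ranges are known in advance, a range check before the multiply is usually cheaper than widening every product.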

Multiply-Accumulate

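The Cortex-M0 has no multiply-accumulate instruction. MLA belongs to the ARMv7-M instruction set of the Cortex-M3 and later cores, where its syntax is: MLA Rd, Rn, Rm, Ra

MLA multiplies Rn and Rm, adds the accumulate value Ra, and stores the result in Rd. On the Cortex-M0, the same operation is a MULS followed by an ADDS. For example:

    MULS R3, R2, R3 ; R3 = R2 * R3
    ADDS R1, R3, R4 ; R1 = R2 * R3 + R4

This sequence takes 2 cycles with the fast multiplier and 33 cycles with the small multiplier. The flags set by ADDS describe only the addition, so they do not reveal an overflow in the multiply itself.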
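A multiply-accumulate in C is simply a multiply followed by an add, which is also how it executes on a core without an MLA instruction. The sketch below uses unsigned arithmetic so that wraparound modulo 2^32 is well defined; mac32 and dot32 are illustrative names, not library functions.

```c
#include <stddef.h>
#include <stdint.h>

/* acc + a * b, wrapping modulo 2^32: one multiply plus one add. */
static uint32_t mac32(uint32_t acc, uint32_t a, uint32_t b)
{
    return acc + a * b;
}

/* Dot product of two n-element vectors built from repeated
 * multiply-accumulate steps. */
static uint32_t dot32(const uint32_t *x, const uint32_t *y, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = mac32(acc, x[i], y[i]);
    }
    return acc;
}
```

In a loop like this the loads and loop overhead add to the cost of each step, so the multiply and add are only part of the per-element cycle count.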

Multiply-Subtract

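Similarly, the multiply-subtract instruction MLS is an ARMv7-M instruction and is not available on the Cortex-M0. On cores that have it, the syntax is: MLS Rd, Rn, Rm, Ra

MLS multiplies Rn and Rm, subtracts the product from Ra, and stores the result in Rd. On the Cortex-M0, the equivalent is a MULS followed by a SUBS. For example:

    MULS R3, R2, R3 ; R3 = R2 * R3
    SUBS R1, R4, R3 ; R1 = R4 - R2 * R3

This sequence takes 2 cycles with the fast multiplier and 33 cycles with the small multiplier.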

Long Multiplies

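The SMULL and UMULL instructions, which produce 64-bit results, are also ARMv7-M instructions and are not available on the Cortex-M0. On cores that have them, the syntax is:

    SMULL RdLo, RdHi, Rn, Rm ; Signed long multiply
    UMULL RdLo, RdHi, Rn, Rm ; Unsigned long multiply

These multiply Rn and Rm as signed or unsigned 32-bit values. The lower 32 bits of the 64-bit result are stored in RdLo and the upper 32 bits are stored in RdHi.

On the Cortex-M0, a 64-bit product has to be built in software, typically by splitting each operand into 16-bit halves, forming four partial products with MULS, and combining them with shifts and adds. Compilers generate such a sequence, or call a runtime library routine such as __aeabi_lmul, when C code multiplies 64-bit values. A long multiply therefore costs several MULS instructions plus the combining arithmetic, not a single cycle.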
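One way to build an unsigned 64-bit product from 32-bit multiplies is sketched below in C. Each operand is split into 16-bit halves so that every partial product fits in 32 bits. The function name umull_soft is hypothetical, and the code a compiler or runtime library actually uses may differ.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64-bit multiply using only 32-bit multiplies.
 * The low word is written to *lo and the high word to *hi. */
static void umull_soft(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four 16x16 partial products; none can exceed 32 bits. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Everything that lands in bits 16..31 of the result; the part
     * above 16 bits is the carry into the high word. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    *lo = (mid << 16) | (ll & 0xFFFFu);
    *hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```

A signed variant can be derived from the unsigned result by subtracting b from the high word when a is negative and a from the high word when b is negative.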

Summary of Multiply Cycles

To summarize the multiply cycle counts on the Cortex-M0:

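  • MULS takes 1 cycle with the fast multiplier option and 32 cycles with the small multiplier option
  • Multiply-accumulate (MULS then ADDS) takes 2 cycles or 33 cycles; there is no MLA instruction
  • Multiply-subtract (MULS then SUBS) takes 2 cycles or 33 cycles; there is no MLS instruction
  • SMULL is not available; a signed 64-bit product is built in software from several MULS instructions
  • UMULL is not available; an unsigned 64-bit product is built in software the same way

The Cortex-M0 integer multiplier delivers a single-cycle 32-bit multiply only when the chip is built with the fast multiplier option. Multiply-accumulate and 64-bit multiplies always need instruction sequences, so digital signal processing and other math-intensive workloads are usually better served by a Cortex-M3 or Cortex-M4, which provide MLA, MLS, SMULL, and UMULL in hardware.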
