© S-O-C.ORG, All Rights Reserved.

Cortex-M7 DSP Instructions

David Moore
Last updated: September 12, 2023 3:17 am

The Arm Cortex-M7 processor implements the Armv7E-M DSP extension, a set of instructions that boost digital signal processing performance. These instructions allow common DSP operations such as FFTs and filters to execute more efficiently. The key benefits of the Cortex-M7 DSP instructions are:

  • Improved performance for DSP algorithms – Most DSP instructions issue in a single cycle, and many perform several arithmetic operations at once, allowing more operations per second.
  • Reduced code size – DSP operations require fewer instructions than the same function written without DSP instructions.
  • Power efficiency – Executing fewer instructions for the same work generally consumes less energy.

DSP Instruction Categories

The Cortex-M7 DSP instructions can be grouped into several categories:

Multiplication

These include single-cycle 16×16 bit multiplications with 32-bit results. This allows faster execution of multiply-accumulate (MAC) operations commonly used in DSP. Some instructions include:

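  • SMULBB – Signed multiply of the bottom 16-bit halfwords of two registers
  • SMULBT – Signed multiply of the bottom halfword of the first operand by the top halfword of the second
  • SMLABB – Signed multiply of the two bottom halfwords, accumulated with a 32-bit value
  • SMLABT – Signed multiply of a bottom halfword by a top halfword, accumulated with a 32-bit value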
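
The halfword selection in these instructions can be sketched in portable C. The function names below (bottom16, top16, smulbb_model, smulbt_model, smlabb_model) are hypothetical host-side models, not CMSIS or compiler APIs, and they ignore the Q flag that the real SMLABB sets on accumulate overflow.

```c
#include <stdint.h>

/* Bottom (bits 15:0) and top (bits 31:16) halfwords, read as signed values.
   The narrowing casts rely on two's-complement conversion, which common
   compilers provide. */
static inline int32_t bottom16(uint32_t x) { return (int16_t)(x & 0xFFFFu); }
static inline int32_t top16(uint32_t x)    { return (int16_t)(x >> 16); }

/* Model of SMULBB: bottom halfword times bottom halfword, 32-bit result. */
static inline int32_t smulbb_model(uint32_t a, uint32_t b) {
    return bottom16(a) * bottom16(b);
}

/* Model of SMULBT: bottom halfword of a times top halfword of b. */
static inline int32_t smulbt_model(uint32_t a, uint32_t b) {
    return bottom16(a) * top16(b);
}

/* Model of SMLABB: bottom-halfword product plus a 32-bit accumulator.
   The add wraps on overflow; the hardware Q flag is not modelled. */
static inline int32_t smlabb_model(uint32_t a, uint32_t b, int32_t acc) {
    return (int32_t)((uint32_t)acc + (uint32_t)smulbb_model(a, b));
}
```

For example, with a = 0x0003FFFE (top 3, bottom -2) and b = 0x00050004 (top 5, bottom 4), the BB product is -8 and the BT product is -10.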

Saturation Arithmetic

Saturation arithmetic limits results to a defined range and is useful for avoiding overflow in DSP algorithms. Instructions include:

  • SSAT – Signed saturate
  • USAT – Unsigned saturate
  • QADD – Saturating addition
  • QDADD – Saturating double and add: doubles one operand with saturation, then adds the other with saturation
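
As a rough illustration of what saturation means here, the following portable C models clamp instead of wrapping. The names ssat_model, qadd_model and qdadd_model are hypothetical; on a Cortex-M7 the same results come from single instructions, and the Q flag they set is not modelled.

```c
#include <stdint.h>

/* Model of SSAT: clamp x to the signed range representable in `bits` bits.
   `bits` is assumed to be in 1..32. */
static inline int32_t ssat_model(int32_t x, unsigned bits) {
    const int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    const int64_t min = -max - 1;
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* Model of QADD: 32-bit signed addition that saturates instead of wrapping. */
static inline int32_t qadd_model(int32_t a, int32_t b) {
    const int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Model of QDADD: saturate(a + saturate(2 * b)). */
static inline int32_t qdadd_model(int32_t a, int32_t b) {
    return qadd_model(a, qadd_model(b, b));
}
```

The 64-bit intermediates keep the models free of signed-overflow undefined behaviour on the host.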

Logical Operations

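The instructions in this group are parallel arithmetic operations on packed halfwords or bytes rather than bitwise logic, and they do not saturate (QSAX and UQSAX are the saturating variants). For example:

  • SSAX – Signed subtract and add with exchange: top halfword difference and bottom halfword sum, with the second operand's halfwords swapped
  • USAX – Unsigned subtract and add with exchange
  • USAD8 – Unsigned sum of absolute differences across four byte lanes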
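
To make the lane behaviour concrete, here is a minimal portable sketch of USAD8 and SSAX. The function names are hypothetical models for host-side experimentation; the SSAX model returns only the packed result and does not model the APSR.GE flags that the instruction also updates.

```c
#include <stdint.h>

/* Model of USAD8: sum of |a_i - b_i| over the four unsigned byte lanes. */
static inline uint32_t usad8_model(uint32_t a, uint32_t b) {
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const uint32_t x = (a >> (8 * lane)) & 0xFFu;
        const uint32_t y = (b >> (8 * lane)) & 0xFFu;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}

/* Model of SSAX:
     result.top    = a.top    - b.bottom
     result.bottom = a.bottom + b.top
   Each lane wraps to 16 bits; there is no saturation. */
static inline uint32_t ssax_model(uint32_t a, uint32_t b) {
    const uint32_t top    = ((a >> 16) - (b & 0xFFFFu)) & 0xFFFFu;
    const uint32_t bottom = ((a & 0xFFFFu) + (b >> 16)) & 0xFFFFu;
    return (top << 16) | bottom;
}
```

USAD8 is typically useful for block matching on 8-bit data, while the exchange forms suit complex arithmetic where real and imaginary parts share a register.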

Packing and Unpacking

Packing condenses data into smaller bit widths. Unpacking does the reverse. This assists with optimized data storage and transfers. Instructions include:

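  • PKHBT – Pack halfword: combines the bottom halfword of one register with the top halfword of a shifted second register into one 32-bit word
  • SXTB – Sign extend byte to word (32 bits)
  • SXTH – Sign extend halfword to word
  • UXTB – Zero extend byte to word (32 bits)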
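
A small portable sketch of these operations follows. The names pkhbt_model, sxtb_model and uxtb_model are hypothetical, and the shift argument of the PKHBT model is assumed to be in the range 0..31.

```c
#include <stdint.h>

/* Model of PKHBT Rd, Rn, Rm, LSL #shift:
   bottom halfword from rn, top halfword from (rm << shift). */
static inline uint32_t pkhbt_model(uint32_t rn, uint32_t rm, unsigned shift) {
    return (rn & 0x0000FFFFu) | ((rm << shift) & 0xFFFF0000u);
}

/* Model of SXTB: take the low byte and sign extend it to 32 bits.
   The narrowing cast relies on two's-complement conversion. */
static inline int32_t sxtb_model(uint32_t x) {
    return (int8_t)(x & 0xFFu);
}

/* Model of UXTB: take the low byte and zero extend it to 32 bits. */
static inline uint32_t uxtb_model(uint32_t x) {
    return x & 0xFFu;
}
```

Packing two 16-bit samples that each sit in the low half of a register is then pkhbt_model(lo, hi, 16).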

SIMD

SIMD (single instruction, multiple data) performs the same operation on multiple values at once. For example:

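  • SADD8 – Four parallel signed 8-bit additions on the byte lanes of two registers
  • SADD16 – Two parallel signed 16-bit additions on the halfword lanes of two registers
  • SEL – Select each byte from one of two registers according to the APSR.GE flags set by a preceding parallel add or subtract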
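
The lane-wise behaviour can be sketched in portable C as below. These are hypothetical models: the add models return only the packed sums, and sel_model takes the four GE bits as an explicit argument in place of the APSR.GE flags.

```c
#include <stdint.h>

/* Model of SADD16: two independent 16-bit additions, each lane wraps. */
static inline uint32_t sadd16_model(uint32_t a, uint32_t b) {
    const uint32_t lo = (a + b) & 0x0000FFFFu;
    const uint32_t hi = ((a >> 16) + (b >> 16)) << 16;
    return hi | lo;
}

/* Model of SADD8: four independent 8-bit additions, each lane wraps. */
static inline uint32_t sadd8_model(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const uint32_t s = ((a >> (8 * lane)) + (b >> (8 * lane))) & 0xFFu;
        r |= s << (8 * lane);
    }
    return r;
}

/* Model of SEL: byte lane i comes from a when bit i of ge is set,
   otherwise from b. ge stands in for APSR.GE[3:0]. */
static inline uint32_t sel_model(uint32_t a, uint32_t b, unsigned ge) {
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; ++lane) {
        if (ge & (1u << lane)) {
            mask |= 0xFFu << (8 * lane);
        }
    }
    return (a & mask) | (b & ~mask);
}
```

Note that carries never cross a lane boundary, which is the difference from an ordinary 32-bit ADD.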

DSP Extension Instructions

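Beyond the categories above, the Cortex-M7 instruction set covers several other operations that matter in DSP code. Some belong to the DSP extension and others to the base Armv7-M instruction set:

  • Dot product – SMUAD and SMLAD multiply two pairs of 16-bit values and sum the products, optionally adding an accumulator
  • Multiply with accumulate – Combined multiply and accumulate, including 64-bit accumulating forms such as SMLALD
  • Multiply with subtract – Combined multiply and subtract, for example SMUSD and SMLSD
  • Min/max – There is no single integer min/max instruction; a parallel subtract such as SSUB16 followed by SEL picks the larger or smaller lanes
  • Bitfield – Extract and insert bitfields with UBFX, SBFX, BFI and BFC (base instruction set)
  • Bit counting – CLZ counts leading zeros; population count and parity have no dedicated instruction and are computed in software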
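
The dot-product entry can be illustrated with a portable model of SMLAD driving a 16-bit dot product. The names smlad_model, pack16 and dot_q15 are hypothetical, the element count is assumed to be even, and the accumulator is assumed not to overflow 32 bits for the inputs used.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of SMLAD: acc + a.bottom * b.bottom + a.top * b.top.
   The adds wrap on overflow; the hardware Q flag is not modelled. */
static inline int32_t smlad_model(uint32_t a, uint32_t b, int32_t acc) {
    const int32_t lo = (int16_t)(a & 0xFFFFu) * (int16_t)(b & 0xFFFFu);
    const int32_t hi = (int16_t)(a >> 16) * (int16_t)(b >> 16);
    return (int32_t)((uint32_t)acc + (uint32_t)lo + (uint32_t)hi);
}

/* Pack two 16-bit samples into one 32-bit word (lo in bits 15:0). */
static inline uint32_t pack16(int16_t lo, int16_t hi) {
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Dot product of two int16_t arrays, consuming two elements per step.
   n is assumed to be even; a trailing odd element would be ignored. */
int32_t dot_q15(const int16_t *x, const int16_t *y, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        acc = smlad_model(pack16(x[i], x[i + 1]),
                          pack16(y[i], y[i + 1]), acc);
    }
    return acc;
}
```

On the target, samples stored contiguously in memory are already packed, so one 32-bit load typically fetches both operands for a lane pair.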

DSP Instruction Latency and Throughput

Understanding the latency and throughput of instructions is key to maximizing DSP performance. Important notes:

  • Most DSP instructions have a 1-cycle latency, so dependent operations can often issue back-to-back.
  • Pipelining sustains a throughput of one instruction per cycle when there are no hazards or memory stalls.
  • The Cortex-M7 is dual-issue: it can start up to two instructions in the same cycle, subject to pairing restrictions.
  • Certain instructions, such as division, have multi-cycle latency and reduce throughput when a dependent instruction follows immediately.

By scheduling instructions appropriately and maximizing parallel execution, the highest throughput can be achieved.
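
One common way to expose independent work to the pipeline is to split a single accumulation chain into two. The sketch below is only an illustration: mac_chain and mac_split are hypothetical names, the inputs are assumed small enough that the 32-bit sums do not overflow, and whether the split version is actually faster depends on the compiler and must be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: every multiply-add depends on the previous one. */
int32_t mac_chain(const int16_t *x, const int16_t *h, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)x[i] * h[i];
    }
    return acc;
}

/* Two accumulators: the two multiply-adds in each iteration are
   independent, so a dual-issue core has the chance to overlap them. */
int32_t mac_split(const int16_t *x, const int16_t *h, size_t n) {
    int32_t acc0 = 0;
    int32_t acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += (int32_t)x[i] * h[i];
        acc1 += (int32_t)x[i + 1] * h[i + 1];
    }
    if (i < n) {                      /* odd-length tail */
        acc0 += (int32_t)x[i] * h[i];
    }
    return acc0 + acc1;
}
```

Both functions return the same value for the same inputs; only the dependency structure differs.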

Coding Efficient DSP Algorithms

Here are some tips for coding algorithms to take advantage of the Cortex-M7 DSP instructions:

  • Where 16-bit precision is sufficient, use packed 16×16 bit multiplies such as SMLAD, which perform two multiplies per instruction instead of one 32-bit multiply.
  • Minimize data movement – process data in-place where possible.
  • Maximize dual-issued instructions by interleaving independent instructions.
  • Unroll small loops to reduce overhead and maximize parallelism.
  • Use SIMD instructions to exploit data level parallelism.
  • Consider using DSP extension instructions like dot product.
  • Use saturation arithmetic instead of compare-and-branch clamping to avoid branch overhead and mispredictions.
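
Several of these tips can be combined in one small routine. The sketch below scales a buffer in place, unrolls by four, and clamps without branching on the data path in the source. The names clamp_q15 and scale_q12_inplace and the Q12 gain format are assumptions for illustration; the code also relies on arithmetic right shift of negative values, which common compilers provide but the C standard leaves implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range. Compilers targeting Arm
   cores commonly turn this pattern into a saturate instruction, but
   that is not guaranteed. */
static inline int16_t clamp_q15(int32_t v) {
    v = (v > INT16_MAX) ? INT16_MAX : v;
    v = (v < INT16_MIN) ? INT16_MIN : v;
    return (int16_t)v;
}

/* Multiply every sample by gain_q12 / 4096, in place, with saturation.
   The main loop handles four samples per iteration; the second loop
   handles the remaining 0..3 samples. */
void scale_q12_inplace(int16_t *buf, size_t n, int16_t gain_q12) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        buf[i]     = clamp_q15(((int32_t)buf[i]     * gain_q12) >> 12);
        buf[i + 1] = clamp_q15(((int32_t)buf[i + 1] * gain_q12) >> 12);
        buf[i + 2] = clamp_q15(((int32_t)buf[i + 2] * gain_q12) >> 12);
        buf[i + 3] = clamp_q15(((int32_t)buf[i + 3] * gain_q12) >> 12);
    }
    for (; i < n; ++i) {
        buf[i] = clamp_q15(((int32_t)buf[i] * gain_q12) >> 12);
    }
}
```

With a gain of 8192 (2.0 in Q12), a sample of 20000 saturates to 32767 rather than wrapping to a negative value.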

DSP Optimization with Intrinsics

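Compiler intrinsics provide access to DSP instructions without needing to write assembly code. For example, the CMSIS intrinsic __SMLAD works on integers rather than floats: it takes two 32-bit words, each holding a pair of 16-bit values, plus a 32-bit accumulator:

    uint32_t acc = 0;
    const uint32_t *inp, *coeff;   /* each word packs two 16-bit samples */
    /* Dual multiply-accumulate: acc += lo*lo + hi*hi */
    acc = __SMLAD(*inp++, *coeff++, acc);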

Intrinsics allow the compiler to schedule instructions for optimal performance. Key advantages are:

  • Write efficient DSP code in C/C++ instead of assembly
  • Compiler handles instruction scheduling and pipelining
  • Avoid errors from hand-written assembly code
  • Code is portable between Arm cores that implement the DSP extension, such as the Cortex-M4 and Cortex-M7
  • Allows use of high-level language tools/debugging

Arm provides intrinsics for the Cortex-M7 DSP instructions through the CMSIS-Core headers. Using intrinsics combined with the coding techniques outlined earlier allows DSP algorithms to take full advantage of the processor’s capabilities.
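
A pattern that keeps intrinsic-based code testable off-target is to wrap the intrinsic behind a macro with a portable fallback. In the sketch below, USE_CMSIS_DSP_INTRINSICS, DUAL_MAC, dual_mac_fallback and mac_pairs are hypothetical names, and including "cmsis_compiler.h" directly is an assumption; many projects pull in __SMLAD through their device header instead.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(USE_CMSIS_DSP_INTRINSICS)   /* hypothetical opt-in macro */
#include "cmsis_compiler.h"             /* assumption: CMSIS-Core on the include path */
#define DUAL_MAC(a, b, acc) __SMLAD((a), (b), (acc))
#else
/* Portable fallback with the same packed-operand convention as __SMLAD:
   acc + a.bottom * b.bottom + a.top * b.top, wrapping on overflow. */
static inline uint32_t dual_mac_fallback(uint32_t a, uint32_t b, uint32_t acc) {
    const int32_t lo = (int16_t)(a & 0xFFFFu) * (int16_t)(b & 0xFFFFu);
    const int32_t hi = (int16_t)(a >> 16) * (int16_t)(b >> 16);
    return acc + (uint32_t)lo + (uint32_t)hi;
}
#define DUAL_MAC(a, b, acc) dual_mac_fallback((a), (b), (acc))
#endif

/* Multiply-accumulate over arrays of packed 16-bit pairs.
   n_words counts 32-bit words, i.e. pairs of samples. */
uint32_t mac_pairs(const uint32_t *x, const uint32_t *h, size_t n_words) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n_words; ++i) {
        acc = DUAL_MAC(x[i], h[i], acc);
    }
    return acc;
}
```

The fallback lets the same source be unit-tested on a desktop compiler, at the cost of maintaining a second code path that must stay equivalent to the intrinsic.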

DSP Benchmarking

Benchmarking is important to validate the performance gains from using DSP instructions and quantify any improvements. Some tips for effective benchmarking include:

  • Use representative DSP functions like FFTs, filtering, matrix math etc.
  • Compare with and without DSP-optimized implementations.
  • Measure both execution time and number of cycles.
  • Compare code size between versions.
  • Use optimized compiler settings throughout testing.
  • Perform statistics across many iterations for accuracy.
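
A minimal measurement harness along these lines might look as follows. Everything here is a sketch under assumptions: bench_run, bench_stats and the callback types are hypothetical, and the counter is passed in as a function so the code runs on a host. On a Cortex-M7 the callback would typically read a free-running cycle counter such as DWT->CYCCNT, once it has been enabled. fake_now and empty_kernel exist only so the harness can be exercised without hardware.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*cycle_fn)(void);   /* returns a free-running counter value */
typedef void (*kernel_fn)(void);      /* the code under test */

typedef struct {
    uint32_t min;
    uint32_t max;
    double   mean;
} bench_stats;

/* Run the kernel `iterations` times and collect min, max and mean of the
   elapsed counter ticks. Unsigned subtraction keeps each elapsed value
   correct across a single counter wraparound. */
bench_stats bench_run(kernel_fn kernel, cycle_fn now, size_t iterations) {
    bench_stats s = { UINT32_MAX, 0u, 0.0 };
    uint64_t total = 0;
    for (size_t i = 0; i < iterations; ++i) {
        const uint32_t start = now();
        kernel();
        const uint32_t elapsed = now() - start;
        if (elapsed < s.min) s.min = elapsed;
        if (elapsed > s.max) s.max = elapsed;
        total += elapsed;
    }
    if (iterations > 0) {
        s.mean = (double)total / (double)iterations;
    }
    return s;
}

/* Host-side stand-ins: a counter that advances 7 ticks per read,
   and a kernel that does nothing. */
static uint32_t fake_ticks;
uint32_t fake_now(void) { fake_ticks += 7u; return fake_ticks; }
void empty_kernel(void) { }
```

Reporting the minimum alongside the mean helps separate the kernel's own cost from occasional disturbances such as interrupts or cache misses.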

Measuring real-world throughput and efficiency helps choose the best optimizations for a particular application. The Cortex-M7 DSP instructions enable substantial gains but optimal coding is needed to maximize performance.

Conclusion

The Cortex-M7 DSP instructions provide significant benefits for digital signal processing performance compared to conventional microcontroller architectures. By leveraging single-cycle MAC operations, saturation arithmetic, SIMD and the other DSP-oriented instructions, complex algorithms can be made faster and more power efficient. Combined with techniques like loop unrolling, pipelining and dual-issue execution, the DSP instructions enable the Cortex-M7 to address demanding DSP applications. Intrinsic functions give access to these instructions in C/C++ without relying on assembly. Thorough benchmarking is key to validating and quantifying the performance gains. With the DSP instruction set it shares with the Cortex-M4 and its faster dual-issue pipeline, the Cortex-M7 delivers strong DSP capability for a microcontroller-class device.
