What is ARM Cortex-M55?

Elijah Erickson
Last updated: September 7, 2023 12:21 pm

The ARM Cortex-M55 is the first processor in ARM’s Cortex-M series of embedded, IoT and MCU-focused cores to implement the Armv8.1-M architecture and its Helium vector extension. It builds upon the previous generation Cortex-M33 and adds capabilities and performance aimed specifically at AI and ML workloads in embedded and edge devices.

Contents

  • Overview and Target Applications
  • Key Features
  • Microarchitecture
  • Helium Technology
  • DSP and Floating Point
  • Performance
  • Development Tools and Software
  • Licensing and Availability

Overview and Target Applications

The Cortex-M55 is designed for use in AI-enabled embedded and IoT applications where low power and high efficiency are critical. This includes areas such as:

  • Industrial automation and robotics
  • Automotive advanced driver assistance systems (ADAS) and autonomous vehicles
  • Smart homes/buildings/cities
  • Wearables and hearables
  • Retail analytics and surveillance

The Cortex-M55 aims to bring new levels of machine learning capability to resource-constrained edge devices, enabling more responsive and intelligent behavior without having to rely solely on the cloud. ARM quotes up to 5x better DSP performance and up to 15x better ML performance compared to previous Cortex-M generations.

Key Features

Some of the key features and capabilities of the ARM Cortex-M55 processor include:

  • Helium (M-Profile Vector Extension, MVE) – A new SIMD instruction set extension operating on 128-bit vectors, designed for heavily parallel workloads like ML/AI and signal processing. It delivers significant gains on vectorized math operations.
  • DSP Extension – Enhancements to the digital signal processing (DSP) instruction set for improved scalar math performance.
  • M55 Memory System – Optimized system architecture with tightly coupled memory (TCM) to maximize data throughput for ML workloads.
  • Enhanced MPU – Added memory protection unit (MPU) capabilities for improved software isolation and security.
  • TrustZone-M – ARM’s hardware-based security solution for Cortex-M devices is enhanced with even more features.
  • Floating Point Unit – Supports single and double precision floating point calculations.
  • DSP+FP Architectural Pairing – Allows floating point and DSP instructions to be issued simultaneously for improved scalar math performance.
  • Wake-up Interrupt Controller – Lets an interrupt wake the core from deep sleep while most of the processor is powered down, reducing sleep-mode power and wake-up latency.
  • System Error Correction Codes – Detects and corrects single bit errors in memories and bus transactions.
  • Enhanced Debug – Updates to embedded trace macrocell and micro trace buffer for more effective debugging.

Microarchitecture

The Cortex-M55 implements an in-order pipeline with dual-issue capability alongside its vector processing hardware. This allows certain pairs of instructions to issue in the same cycle, including:

  • Issuing a Helium vector instruction with a scalar ALU instruction
  • Issuing a DSP multiply with a scalar ALU operation
  • Issuing a scalar ALU op with a scalar ALU op
  • Issuing a scalar ALU with a load/store
  • Issuing a DSP multiply with a load/store

The microarchitecture incorporates branch prediction and prefetching techniques to optimize instruction throughput. An optional 2-way set-associative instruction cache helps ensure steady code execution, while an optional 4-way set-associative data cache enables fast data access.

The M55 can dynamically adapt between high performance modes and lightweight modes optimized for low power depending on workload. Multiple low power states are available to gate clocks and cut power to unused sections of the chip.

Helium Technology

The headline feature of the Cortex-M55 is the new Helium vector processing technology. The key components of Helium include:

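  • Vector ALUs – SIMD execution units that perform mathematical operations on 128-bit vectors. On the Cortex-M55 each vector instruction is split into four 32-bit “beats”, and the core processes two beats (64 bits) per cycle.
  • Vector Register File – Eight 128-bit registers (Q0 to Q7), shared with the floating point register bank, that hold vector operands and results during processing.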
  • Vector Memory Load/Store Units – Transfers vector data between main memory and the registers.
  • Permutation Unit – Allows re-ordering of vector data elements for flexibility.
  • Reduction Unit – Accumulates partial vector results.

This vector architecture is designed to accelerate ML workloads by enabling more parallel execution on the types of math found in neural networks and signal processing algorithms.
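As a rough illustration of how the vector ALUs and the reduction step work together, the sketch below computes an int8 dot product, the core operation of most neural network layers. The function name `dot_s8` is hypothetical. The Helium path is compiled only when the compiler defines the ACLE macro `__ARM_FEATURE_MVE`; on any other target the same function falls back to plain scalar C, so this is a sketch under those assumptions rather than a tuned kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
#include <arm_mve.h>
#endif

/* Dot product of two int8 arrays, accumulated in 32 bits.
 * dot_s8 is an illustrative name, not part of any ARM library. */
int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;

    assert(a != NULL && b != NULL);

#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
    /* Helium path: one 128-bit vector holds 16 int8 lanes. */
    for (; i + 16 <= n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        /* Multiply lane by lane, sum all products, add to acc. */
        acc = vmladavaq_s8(acc, va, vb);
    }
#endif

    /* Scalar path: the whole array on a host build,
     * or the leftover elements after the vector loop. */
    for (; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

With Helium enabled, each loop iteration replaces sixteen scalar multiply-accumulates with two loads and one reducing multiply-accumulate.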

Helium supports 8, 16 and 32-bit integer formats as well as 16-bit and 32-bit floating point formats for vectors. Widening instructions expand narrow elements into wider lanes so that products and sums do not overflow, and narrowing instructions pack results back into smaller elements.
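To make the widening idea concrete, here is a small scalar model in portable C. It is only a sketch: the function names are hypothetical, and the lane selection (even-numbered lanes) is modeled on the "bottom" form of Helium's widening multiplies. It shows why an int8 × int8 product needs an int16 lane, and how a saturating narrow brings a result back to int8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening multiply (scalar model): the even-numbered int8 lanes of two
 * 16-lane vectors are multiplied into eight int16 lanes. The largest
 * product, (-128) * (-128) = 16384, still fits in int16. */
void widen_mul_even_s8(const int8_t a[16], const int8_t b[16], int16_t out[8])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = (int16_t)(a[2 * i] * b[2 * i]);
    }
}

/* Saturating narrow (scalar model): clamp an int16 value into int8 range
 * instead of letting it wrap around. */
int8_t sat_narrow_s16(int16_t x)
{
    if (x > INT8_MAX) {
        return INT8_MAX;
    }
    if (x < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)x;
}
```

Because a 128-bit vector holds sixteen int8 lanes but only eight int16 lanes, one widening operation consumes half of the input lanes at a time.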

The Helium extension provides a comprehensive set of instructions for ML acceleration, including:

  • Vector arithmetic (add, subtract, multiply, shift, compare, etc.)
  • Vector load and store (aligned/unaligned, with optional post-increment)
  • Vector reduction (sum, minimum, maximum, etc.)
  • Vector shuffling/permutation
  • Vector comparison and thresholding
  • Vector multiplication with scalar
  • Vector widening and narrowing
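Several of these instruction classes can be illustrated with a scalar model of a four-lane 32-bit vector. The types and function names below are hypothetical and exist only for this sketch. The mask uses one bit per lane for simplicity, which is a simplification of how the hardware tracks predicates.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* 128 bits / 32 bits per lane */

/* Hypothetical model of one 128-bit vector of int32 lanes. */
typedef struct {
    int32_t lane[LANES];
} vec_s32;

/* Vector compare: bit i of the result is set when lane i > threshold. */
uint8_t vec_cmp_gt(vec_s32 v, int32_t threshold)
{
    uint8_t mask = 0;
    for (int i = 0; i < LANES; i++) {
        if (v.lane[i] > threshold) {
            mask |= (uint8_t)(1u << i);
        }
    }
    return mask;
}

/* Select: take the lane from if_set where the mask bit is 1,
 * otherwise from if_clear. */
vec_s32 vec_select(uint8_t mask, vec_s32 if_set, vec_s32 if_clear)
{
    vec_s32 r;
    assert((mask >> LANES) == 0);
    for (int i = 0; i < LANES; i++) {
        r.lane[i] = ((mask >> i) & 1u) ? if_set.lane[i] : if_clear.lane[i];
    }
    return r;
}

/* Thresholding built from compare + select: ReLU keeps positive lanes. */
vec_s32 vec_relu(vec_s32 v)
{
    vec_s32 zero = {{0, 0, 0, 0}};
    return vec_select(vec_cmp_gt(v, 0), v, zero);
}

/* Vector multiplication with a scalar; wraps modulo 2^32 on overflow. */
vec_s32 vec_mul_scalar(vec_s32 v, int32_t s)
{
    vec_s32 r;
    for (int i = 0; i < LANES; i++) {
        r.lane[i] = (int32_t)((uint32_t)v.lane[i] * (uint32_t)s);
    }
    return r;
}
```

On real hardware each of these loops corresponds to a single vector instruction, which is where the speedup over scalar code comes from.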

DSP and Floating Point

Alongside Helium, the Cortex-M55 maintains and improves ARM’s DSP and floating point capabilities for Cortex-M class processors. This allows non-vector math to also benefit from greater parallelism and throughput.

The DSP extension provides single-cycle 16×16 and 32×32 bit multiplies with 32-bit and 64-bit accumulators respectively. These are the established Thumb DSP instructions introduced with ARMv7E-M on the Cortex-M4 and carried forward through the Cortex-M33.
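The two multiply-accumulate shapes described above can be modeled in portable C. This is a sketch with hypothetical function names: `dual_mac16` mimics the arithmetic of a dual 16×16 multiply-accumulate on packed halfwords (the style of the SMLAD instruction), and `mac32_64` mimics a 32×32 multiply into a 64-bit accumulator. Whether a compiler maps these to single DSP instructions depends on the toolchain and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dual 16x16 multiply-accumulate on packed halfwords:
 * acc + lo(x) * lo(y) + hi(x) * hi(y), kept in a 32-bit accumulator. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t xl = (int16_t)(x & 0xFFFFu);
    int16_t xh = (int16_t)(x >> 16);
    int16_t yl = (int16_t)(y & 0xFFFFu);
    int16_t yh = (int16_t)(y >> 16);
    int64_t sum = (int64_t)acc + xl * yl + xh * yh;
    return (int32_t)(uint32_t)sum; /* wrap to 32 bits on overflow */
}

/* 16-bit dot product that consumes two samples per step.
 * n must be even in this simplified version. */
int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        uint32_t x = (uint16_t)a[i] | ((uint32_t)(uint16_t)a[i + 1] << 16);
        uint32_t y = (uint16_t)b[i] | ((uint32_t)(uint16_t)b[i + 1] << 16);
        acc = dual_mac16(x, y, acc);
    }
    return acc;
}

/* 32x32 multiply accumulated into 64 bits. */
int64_t mac32_64(int32_t a, int32_t b, int64_t acc)
{
    return acc + (int64_t)a * (int64_t)b;
}
```

The 64-bit accumulator matters for long filters: a single 32×32 product can already need 62 bits, so a 32-bit accumulator would overflow immediately.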

The floating point unit (FPU) has been upgraded to allow simultaneous issue and execution of scalar DSP and floating point instructions – a unique feature called DSP+FP architectural pairing. This boosts performance for algorithms using both types of math.

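The FPU supports half precision (16-bit), single precision (32-bit) and double precision (64-bit) scalar operations. Vector floating point is handled by Helium rather than by the Advanced SIMD (NEON) instructions, which belong to the Cortex-A family and are not available on Cortex-M.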

Performance

ARM claims the Cortex-M55 delivers up to 15x better AI performance than previous Cortex-M class processors like the Cortex-M33 and M4. Exact gains will depend on workload, but on key ML benchmark tests it has shown:

  • 5-15x higher recurrent neural network performance
  • 10-15x faster large convolutional neural networks
  • 6-8x faster small convolutional neural networks
  • 5-20x better deep neural network performance

The dual issue pipeline enables up to 30% better scalar processing performance compared to the Cortex-M33. The M55 also benefits from ARM’s most energy efficient processor design, delivering the highest performance per MHz per mW.

Overall, the advances in the Cortex-M55 promise to enable more localized ML inferencing directly on low power embedded devices rather than relying on the cloud.

Development Tools and Software

To support developers working with the Cortex-M55, ARM offers an enhanced CMSIS-NN software library for neural network workloads. This provides over 100 kernel functions to maximize Helium utilization.
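To give a feel for what such a kernel computes, here is a simplified int8 fully connected layer in portable C. This is not the CMSIS-NN API: the function name, argument order and the power-of-two rescaling are assumptions made for this sketch, and production libraries use a more precise fixed-point multiplier for requantization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit value into the int8 range. */
static int8_t clamp_s8(int32_t v)
{
    if (v > INT8_MAX) {
        return INT8_MAX;
    }
    if (v < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)v;
}

/* Simplified int8 fully connected layer (hypothetical helper):
 *   y[o] = clamp((bias[o] + sum_i w[o][i] * x[i]) / 2^shift)
 * w is stored row-major with out_len rows of in_len weights. */
void fc_s8(const int8_t *x, const int8_t *w, const int32_t *bias,
           int8_t *y, size_t in_len, size_t out_len, int shift)
{
    assert(shift >= 0 && shift < 31);
    for (size_t o = 0; o < out_len; o++) {
        int32_t acc = bias[o];
        for (size_t i = 0; i < in_len; i++) {
            acc += (int32_t)w[o * in_len + i] * (int32_t)x[i];
        }
        /* Rescale the 32-bit accumulator back to int8 and saturate. */
        y[o] = clamp_s8(acc / (int32_t)(1 << shift));
    }
}
```

The inner loop is a dot product, which is the part an optimized library would hand to the Helium vector unit.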

For signal processing, the CMSIS-DSP library provides kernels optimized for Helium on Cortex-M processors. The ARM Compute Library and the NEON SIMD instructions it uses target Cortex-A processors rather than the Cortex-M series.

Development tools include compiler support in ARM Compiler 6, Keil MDK toolkit and IAR Embedded Workbench. Debug and trace capabilities are enabled through ARM CoreSight debug and trace IP.
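When building with GCC or an LLVM-based ARM compiler, the core is typically selected with a flag such as `-mcpu=cortex-m55`, and the compiler then defines the ACLE macro `__ARM_FEATURE_MVE` to describe which Helium variant is enabled. The sketch below, with hypothetical function names, reports that at compile time. It assumes the ACLE convention that bit 0 signals integer vector support and bit 1 signals floating point vector support.

```c
#include <assert.h>

/* Example target build (toolchain and flags are assumptions):
 *   arm-none-eabi-gcc -mcpu=cortex-m55 -mfloat-abi=hard -O2 -c file.c
 *
 * Returns 0 when built without Helium, 1 for integer-only Helium,
 * 2 when floating point vectors are available as well. */
int helium_level(void)
{
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 2)
    return 2;
#elif defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name for a level returned by helium_level(). */
const char *helium_level_name(int level)
{
    assert(level >= 0 && level <= 2);
    switch (level) {
    case 2:
        return "Helium integer + float";
    case 1:
        return "Helium integer";
    default:
        return "scalar only";
    }
}
```

A check like this lets one source file carry both a vectorized path and a scalar fallback, which is useful when the same code also has to build for older Cortex-M parts or for host-side unit tests.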

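To simplify software development across the Cortex-M series, the M55 remains compatible with the instruction sets of previous generations like the Cortex-M33 and M4, so existing application code generally carries over after a rebuild. Device-specific code such as startup files, the memory map and peripheral drivers still has to be adapted to the new chip. This helps accelerate migration to the new architecture.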

Licensing and Availability

The Cortex-M55 processor is available for licensing now directly from ARM. Lead partners and early access customers include NXP Semiconductors, STMicroelectronics and Silicon Labs.

NXP plans to use the M55 in a range of automotive, industrial and IoT applications. STMicroelectronics will combine Helium with their AI accelerator hardware for smart embedded systems. Silicon Labs is developing solutions for battery-powered IoT endpoints.

Expect ARM Cortex-M55 processor IP to start appearing in commercial chips and products over the next year or so as new edge AI capabilities get deployed across a diverse range of markets.
