The ARM Cortex-M55 is the latest and most advanced processor in ARM’s Cortex-M series of embedded, IoT and MCU-focused processor cores. The Cortex-M55 builds upon the previous generation Cortex-M33 processor and brings new capabilities and performance specifically aimed at AI and ML workloads in embedded and edge devices.
Overview and Target Applications
The Cortex-M55 is designed for use in AI-enabled embedded and IoT applications where low power and high efficiency are critical. This includes areas such as:
- Industrial automation and robotics
- Automotive advanced driver assistance systems (ADAS) and autonomous vehicles
- Smart homes/buildings/cities
- Wearables and hearables
- Retail analytics and surveillance
The Cortex-M55 aims to bring new levels of machine learning capability to resource-constrained edge devices, enabling more responsive and intelligent behavior without having to rely solely on the cloud. Its specialized microarchitecture is designed to deliver up to 15x higher ML performance and up to 5x higher digital signal processing performance compared with earlier Cortex-M processors such as the Cortex-M33.
Key Features
Some of the key features and capabilities of the ARM Cortex-M55 processor include:
- Helium Technology – ARM's new M-Profile Vector Extension (MVE), a 128-bit SIMD instruction set extension designed for highly parallel workloads such as ML and DSP. It delivers significant gains on vectorized math operations.
- DSP Extension – Enhancements to the digital signal processing (DSP) instruction set for improved scalar math performance.
- M55 Memory System – Optimized system architecture with tightly coupled memory (TCM) to maximize data throughput for ML workloads.
- Enhanced MPU – Added memory protection unit (MPU) capabilities for improved software isolation and security.
- TrustZone – ARM's hardware-based security technology for Cortex-M devices, providing isolation between secure and non-secure software partitions.
- Floating Point Unit – Supports half, single and double precision floating point calculations.
- DSP+FP Architectural Pairing – Allows floating point and DSP instructions to be issued simultaneously for improved scalar math performance.
- Wake-up Interrupt Controller (WIC) – Allows the core to be clock-gated or powered down in deep sleep states while still waking quickly when an interrupt arrives.
- System Error Correction – Error correcting codes (ECC) detect and correct single-bit errors in memories and bus transfers, improving reliability.
- Enhanced Debug – Updates to the Embedded Trace Macrocell (ETM) and Micro Trace Buffer (MTB) for more effective debugging.
Microarchitecture
The Cortex-M55 implements a dual-issue superscalar pipeline alongside its vector processing capabilities. This enables simultaneous issuing of certain instruction types, including:
- Issuing a Helium (MVE) vector instruction with a scalar ALU instruction
- Issuing a DSP multiply with a scalar ALU operation
- Issuing a scalar ALU op with a scalar ALU op
- Issuing a scalar ALU with a load/store
- Issuing a DSP multiply with a load/store
The microarchitecture incorporates branch prediction and prefetching techniques to optimize instruction throughput. A 2-way instruction cache helps keep code execution fed, while a 2-way data cache enables fast data access.
The M55 can dynamically adapt between high-performance modes and lightweight, power-optimized modes depending on the workload. Multiple low-power states are available to gate clocks and remove power from unused sections of the chip.
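In firmware this typically comes down to a couple of CMSIS calls. The sketch below (assuming the generic CMSIS device header ARMCM55.h; the actual header name depends on the silicon vendor) puts the core into a WIC-backed deep sleep until the next interrupt:

```c
#include "ARMCM55.h"   // generic CMSIS device header; vendor packages use their own name

// Enter deep sleep until the next enabled interrupt. With the WIC configured,
// the core's clocks can be gated (or its power removed) while asleep, and the
// pending interrupt brings it back to active mode.
static void enter_deep_sleep(void)
{
    SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;   // request deep sleep rather than normal sleep
    __DSB();                             // ensure outstanding memory accesses have completed
    __WFI();                             // wait for interrupt; execution resumes here on wake-up
}
```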
Helium Technology
The headline feature of the Cortex-M55 is the new Helium vector processing technology. The key components of Helium include:
- Vector ALUs – SIMD execution units that can perform mathematical vector operations on up to 128 bits per cycle.
- Vector Register File – Holds vector operands and results during processing.
- Vector Memory Load/Store Units – Transfers vector data between main memory and the registers.
- Permutation Unit – Allows re-ordering of vector data elements for flexibility.
- Reduction Unit – Accumulates partial vector results.
This vector architecture is designed to accelerate ML workloads by enabling more parallel execution on the types of math found in neural networks and signal processing algorithms.
Helium supports 8-, 16- and 32-bit integer formats as well as 16-bit (half precision) and 32-bit (single precision) floating point formats for vectors. Special widening and narrowing instructions allow smaller integer types to be efficiently promoted to and demoted from larger element widths.
The Helium extension provides a comprehensive set of instructions for ML acceleration, including:
- Vector arithmetic (add, subtract, multiply, shift, compare, etc.)
- Vector load and store (aligned/unaligned, with optional post-increment)
- Vector reduction (sum, minimum, maximum, etc.)
- Vector shuffling/permutation
- Vector comparison and thresholding
- Vector multiplication with scalar
- Vector widening and narrowing
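To give a flavor of how these instructions are used from C, here is a minimal sketch of an int8 dot product, a core neural network primitive, written with Helium intrinsics from arm_mve.h. It assumes a compiler targeting the Cortex-M55 with MVE enabled (for example -mcpu=cortex-m55); the function name and structure are illustrative rather than taken from any ARM library.

```c
#include <arm_mve.h>
#include <stdint.h>

// Illustrative int8 dot product using Helium (MVE) intrinsics.
// Each iteration processes up to 16 lanes; tail predication (vctp8q)
// masks off lanes beyond the end of the arrays on the final pass.
int32_t dot_product_s8(const int8_t *a, const int8_t *b, int32_t n)
{
    int32_t acc = 0;
    while (n > 0) {
        mve_pred16_t p = vctp8q((uint32_t)n);   // predicate covering the remaining elements
        int8x16_t va = vldrbq_z_s8(a, p);       // predicated loads: inactive lanes read as zero
        int8x16_t vb = vldrbq_z_s8(b, p);
        acc = vmladavaq_p_s8(acc, va, vb, p);   // multiply active lanes and accumulate into acc
        a += 16;
        b += 16;
        n -= 16;
    }
    return acc;
}
```

In practice most developers will rely on the compiler's auto-vectorizer or the CMSIS libraries to generate this kind of code, reserving hand-written intrinsics for performance-critical inner loops.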
DSP and Floating Point
Alongside Helium, the Cortex-M55 maintains and improves ARM’s DSP and floating point capabilities for Cortex-M class processors. This allows non-vector math to also benefit from greater parallelism and throughput.
The DSP extension provides single-cycle 16×16 and 32×32 bit multiplies with 32-bit and 64-bit accumulation respectively. The proven Thumb DSP instructions introduced in ARMv7E-M are carried forward, along with the enhancements added in the Cortex-M33 generation.
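As a rough illustration of what this buys, the sketch below (assuming the CMSIS core headers, which expose the __SMLAD intrinsic on DSP-capable cores; the surrounding function is hypothetical) performs a Q15 multiply-accumulate where each instruction does two 16×16 multiplies plus the accumulation:

```c
#include <stdint.h>
#include "cmsis_compiler.h"   // CMSIS core header exposing __SMLAD on DSP-capable cores

// Illustrative dual 16x16 multiply-accumulate. Each 32-bit word packs two
// consecutive int16 samples, so one __SMLAD performs two multiplies plus the
// 32-bit accumulation. Assumes n is even and the buffers are 4-byte aligned.
int32_t mac_q15(const int16_t *x, const int16_t *y, int32_t n, int32_t acc)
{
    const uint32_t *px = (const uint32_t *)x;
    const uint32_t *py = (const uint32_t *)y;
    for (int32_t i = 0; i < n / 2; i++) {
        acc = (int32_t)__SMLAD(px[i], py[i], (uint32_t)acc);  // acc += x[2i]*y[2i] + x[2i+1]*y[2i+1]
    }
    return acc;
}
```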
The floating point unit (FPU) has been upgraded to allow simultaneous issue and execution of scalar DSP and floating point instructions – a unique feature called DSP+FP architectural pairing. This boosts performance for algorithms using both types of math.
The FPU supports half precision (16-bit), single precision (32-bit) and double precision (64-bit) scalar operations. Vector floating point is provided by the Helium extension rather than by the Advanced SIMD (NEON) unit found in Cortex-A processors, which Cortex-M cores do not include.
Performance
ARM claims the Cortex-M55 delivers up to 15x better AI performance than previous Cortex-M class processors like the Cortex-M33 and M4. Exact gains will depend on workload, but on key ML benchmark tests it has shown:
- 5-15x higher recurrent neural network performance
- 10-15x faster large convolutional neural networks
- 6-8x faster small convolutional neural networks
- 5-20x better deep neural network performance
The dual-issue pipeline enables up to 30% better scalar processing performance compared to the Cortex-M33. ARM also describes the M55 as its most energy-efficient Cortex-M design to date, targeting the best performance within a given power budget.
Overall, the advances in the Cortex-M55 promise to enable more localized ML inferencing directly on low power embedded devices rather than relying on the cloud.
Development Tools and Software
To support developers working with the Cortex-M55, ARM offers an enhanced CMSIS-NN software library for neural network workloads. This provides over 100 kernel functions to maximize Helium utilization.
The Helium-optimized CMSIS-DSP library complements this with signal processing building blocks such as filters, transforms and matrix math, so complete ML and DSP pipelines can run efficiently on Cortex-M processors.
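Application code calls these libraries the same way on the M55 as on earlier Cortex-M parts; a Helium-optimized build simply executes faster vector code paths underneath. A small usage sketch (the wrapper function is hypothetical, but arm_dot_prod_f32 is a standard CMSIS-DSP routine):

```c
#include "arm_math.h"   // CMSIS-DSP; build the library with MVE support for Helium acceleration

// Compute the dot product of two float32 vectors using CMSIS-DSP.
// On a Cortex-M55, the Helium-optimized build of arm_dot_prod_f32
// processes multiple vector lanes per loop iteration.
float32_t example_dot(const float32_t *a, const float32_t *b, uint32_t len)
{
    float32_t result;
    arm_dot_prod_f32(a, b, len, &result);
    return result;
}
```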
Development tools include compiler support in ARM Compiler 6, Keil MDK toolkit and IAR Embedded Workbench. Debug and trace capabilities are enabled through ARM CoreSight debug and trace IP.
To simplify software development across the Cortex-M series, the M55 remains software-compatible with previous generations: code written for the Cortex-M33 or Cortex-M4 runs without modification, which helps accelerate migration to the new architecture.
Licensing and Availability
The Cortex-M55 processor is available for licensing now directly from ARM. Lead partners and early access customers include NXP Semiconductors, STMicroelectronics and Silicon Labs.
NXP plans to use the M55 in a range of automotive, industrial and IoT applications. STMicroelectronics will combine Helium with their AI accelerator hardware for smart embedded systems. Silicon Labs is developing solutions for battery-powered IoT endpoints.
Expect ARM Cortex-M55 processor IP to start appearing in commercial chips and products over the next year or so as new edge AI capabilities get deployed across a diverse range of markets.