What is the difference between ARM MVE and neon?

Holly Lindsey
Last updated: September 12, 2023 1:14 pm

ARM-based processors have long included SIMD instructions to improve performance for multimedia and signal processing workloads. Two key SIMD instruction sets in ARM processors are NEON (also called Advanced SIMD) and MVE, the M-Profile Vector Extension, which Arm markets as Helium. While both provide SIMD capabilities, they differ in target processors, register files, and instruction features.

Contents

  • Overview of NEON
  • Overview of MVE
  • Key Differences Between NEON and MVE
      • Target Workloads
      • Supported Data Types
      • Vector Length
      • Hardware Implementation
      • Target Processors
      • Instruction Set
      • Matrix Operations
  • MVE 2
  • Use Cases
      • NEON Use Cases
      • MVE Use Cases
  • Programming and Compiler Support
  • Performance Comparison
  • Conclusion

Overview of NEON

NEON is a SIMD instruction set that has been included in ARM Cortex-A series processors since the Cortex-A8 in 2006. It operates on 64-bit and 128-bit vectors, allowing operations on multiple data elements concurrently. NEON supports common data types like integers, floating point numbers, and polynomials.

Some key capabilities and features of NEON include:

  • 64-bit and 128-bit wide vectors, with one instruction operating on every element of a vector
  • Supports 8, 16, 32 and 64 bit integer and single-precision (32-bit) floating-point data types; the AArch64 version also supports double precision
  • Specialized instructions for audio and video processing, 3D graphics, speech recognition, and image processing
  • Fused multiply-add instructions for better performance and precision (on cores that implement Advanced SIMDv2 or later)
  • Saturating arithmetic that clamps overflowing results to the maximum or minimum representable value

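In Armv7-A, NEON instructions are encoded in the coprocessor instruction space, and the register file holds 32 64-bit registers (D0-D31) that can also be viewed as 16 128-bit quadword registers (Q0-Q15). In AArch64 the register file grows to 32 128-bit registers (V0-V31). NEON instructions operate on vectors held in these registers.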
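As a rough illustration of the saturating arithmetic and 128-bit registers described above, the sketch below adds two byte buffers with clamping at 255. The function name add_sat_u8 is hypothetical. The NEON path uses the standard arm_neon.h intrinsics vld1q_u8, vqaddq_u8 and vst1q_u8, and is only compiled when the compiler defines __ARM_NEON; on any other target the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Saturating add of two uint8_t buffers: dst[i] = min(a[i] + b[i], 255).
 * add_sat_u8 is a hypothetical helper name, not a library function. */
void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    /* 16 bytes per iteration: one 128-bit Q register per operand. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb)); /* saturating add */
    }
#endif
    /* Scalar tail; on targets without NEON this handles every element. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Keeping a scalar tail loop is the usual NEON pattern, because NEON has no per-lane predication for handling a partial final vector.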

Overview of MVE

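MVE (M-Profile Vector Extension, marketed as Helium) is a more recent SIMD instruction set, introduced as an optional extension in the Armv8.1-M architecture for microcontrollers. It is implemented in the Cortex-M52, Cortex-M55 and Cortex-M85; earlier cores such as the Cortex-M33 implement Armv8-M and do not support it.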

MVE focuses on enhancing performance for signal processing and machine learning workloads on microcontrollers, providing instructions for the vector operations these workloads depend on. Some key features of MVE include:

  • 8, 16 and 32-bit integer operations, with optional half-precision and single-precision floating point
  • Multiply-accumulate across vector lanes, which implements linear algebra primitives like dot product
  • Vector by scalar operations to multiply vectors by a scalar value
  • Saturating arithmetic like NEON
  • Element reversal instructions to rearrange vector elements
  • Interleaving and gather/scatter load/store instructions to optimize memory access patterns
  • Per-lane predication and tail-predicated low-overhead loops, so loops need no scalar tail

MVE does not add a separate vector register file. Its eight 128-bit vector registers (Q0-Q7) are mapped onto the existing floating-point extension register file, so each Q register overlays four of the single-precision registers S0-S31.
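A dot product over 8-bit integers is a typical MVE kernel. The following is a minimal sketch rather than a tuned implementation: dot_s8 is a hypothetical name, and the MVE path is compiled only when the ACLE macro __ARM_FEATURE_MVE reports integer MVE support. It uses the arm_mve.h intrinsics vctp8q (build a tail predicate), vldrbq_z_s8 (predicated byte load) and vmladavaq_p_s8 (predicated multiply-accumulate across lanes). Other targets fall back to a plain C loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
#include <arm_mve.h>
#endif

/* Dot product of two int8_t arrays with a 32-bit accumulator.
 * dot_s8 is a hypothetical helper name. */
int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int32_t acc = 0;
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
    /* 16 lanes of 8 bits per 128-bit vector. The predicate disables
     * lanes past the end of the data, so no scalar tail is needed. */
    while (n > 0) {
        mve_pred16_t p = vctp8q((uint32_t)n);
        int8x16_t va = vldrbq_z_s8(a, p);   /* inactive lanes read as 0 */
        int8x16_t vb = vldrbq_z_s8(b, p);
        acc = vmladavaq_p_s8(acc, va, vb, p);
        size_t step = n < 16 ? n : 16;
        a += step;
        b += step;
        n -= step;
    }
#else
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
#endif
    return acc;
}
```

With GCC or Clang the MVE path would typically be enabled by targeting Armv8.1-M with the MVE extension, for example an -march value such as armv8.1-m.main+mve; the exact flags depend on the toolchain and are an assumption here.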

Key Differences Between NEON and MVE

While both NEON and MVE provide SIMD capabilities to ARM processors, there are some notable differences between the two architectures:

Target Workloads

NEON is designed as a general purpose SIMD engine for accelerating a wide range of media, signal processing, and computational workloads on application processors. MVE targets signal processing and machine learning workloads on microcontrollers, where area and power budgets are much tighter.

Supported Data Types

NEON supports 8, 16, 32, and 64-bit integers and 32-bit single precision floats, with double precision added in AArch64. MVE supports 8, 16, and 32-bit integers, plus half-precision and single-precision floats when the floating-point variant is implemented. The 8-bit and 16-bit integer types are the ones most commonly used in quantized machine learning models.

Vector Length

NEON instructions operate on either 64-bit (D register) or 128-bit (Q register) vectors, and the element size determines how many lanes each vector holds. MVE vectors are always 128 bits wide; shorter effective lengths are handled with lane predication.
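The relationship between register width, element size and lane count is simple division. The helper below is a hypothetical illustration, not part of any Arm library.

```c
#include <assert.h>

/* Number of elements (lanes) that fit in one vector register.
 * lanes_per_vector is a hypothetical helper for illustration only. */
unsigned lanes_per_vector(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

For example, a 128-bit register holds 16 lanes of 8-bit data or 4 lanes of 32-bit data, while a 64-bit NEON D register holds 8 lanes of 8-bit data.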

Hardware Implementation

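NEON relies on a dedicated vector register file and execution units, typically with full 64-bit or 128-bit datapaths. MVE reuses the floating-point register file and is designed to work with narrower datapaths: the architecture splits each 128-bit instruction into four 32-bit "beats", and an implementation may execute one, two, or four beats per cycle. The Cortex-M55, for example, processes two beats per cycle.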
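In the Armv8.1-M architecture each 128-bit MVE instruction is made up of four 32-bit beats, and an implementation chooses how many beats it executes per cycle. The arithmetic can be sketched as below. This is an idealized model that ignores instruction overlap and stalls, and the function name is hypothetical.

```c
#include <assert.h>

/* Idealized cycles needed for one 128-bit MVE instruction, given how
 * many 32-bit beats the core executes per cycle (1, 2 or 4).
 * cycles_per_vector_op is a hypothetical helper, not an Arm API. */
unsigned cycles_per_vector_op(unsigned beats_per_cycle)
{
    assert(beats_per_cycle == 1 || beats_per_cycle == 2 ||
           beats_per_cycle == 4);
    return 4u / beats_per_cycle; /* four beats per instruction */
}
```

A single-beat core therefore needs four cycles per vector instruction, while a dual-beat core needs two. Real cores can overlap beats of adjacent instructions, so sustained throughput is often better than this simple figure suggests.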

Target Processors

NEON is designed for high performance application processors like the Cortex-A series. MVE targets lower power microcontrollers like Cortex-M series chips.

Instruction Set

While both support common SIMD instructions, NEON has a larger instruction set with many variants optimized for media processing. MVE has a smaller set but adds features that NEON lacks, such as lane predication, low-overhead loops, and gather/scatter memory accesses.

Matrix Operations

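MVE has no dedicated matrix load/store or matrix multiply instructions. Matrix operations are built from vector instructions, usually as a series of dot products, using MVE's multiply-accumulate-across-vector instructions together with its interleaving and gather/scatter loads and stores. NEON code likewise builds matrix operations from general SIMD instructions, such as widening multiplies followed by pairwise accumulation.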
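To make the "matrix operations from general SIMD instructions" idea concrete, here is a sketch of an int8 matrix-vector product written as one dot product per row. The names row_dot_s8 and matvec_s8 are hypothetical. The optional NEON path uses the arm_neon.h intrinsics vmull_s8 (widening multiply) and vpadalq_s16 (pairwise add and accumulate); without NEON the scalar loop computes the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Dot product of one matrix row with x, accumulated in 32 bits. */
static int32_t row_dot_s8(const int8_t *row, const int8_t *x, size_t cols)
{
    int32_t acc = 0;
    size_t j = 0;
#if defined(__ARM_NEON)
    int32x4_t vacc = vdupq_n_s32(0);
    for (; j + 8 <= cols; j += 8) {
        /* 8 x int8 times 8 x int8 gives 8 x int16 products, which are
         * then added in pairs into 4 x int32 accumulators. */
        int16x8_t prod = vmull_s8(vld1_s8(row + j), vld1_s8(x + j));
        vacc = vpadalq_s16(vacc, prod);
    }
    /* Horizontal sum of the four 32-bit lanes. */
    acc = vgetq_lane_s32(vacc, 0) + vgetq_lane_s32(vacc, 1)
        + vgetq_lane_s32(vacc, 2) + vgetq_lane_s32(vacc, 3);
#endif
    for (; j < cols; j++)
        acc += (int32_t)row[j] * (int32_t)x[j];
    return acc;
}

/* y = A * x for a row-major int8 matrix A with rows x cols elements.
 * matvec_s8 is a hypothetical helper name. */
void matvec_s8(int32_t *y, const int8_t *A, const int8_t *x,
               size_t rows, size_t cols)
{
    assert(y != NULL && A != NULL && x != NULL);
    for (size_t i = 0; i < rows; i++)
        y[i] = row_dot_s8(A + i * cols, x, cols);
}
```

An MVE version would have the same structure, with the inner loop replaced by predicated loads and a multiply-accumulate across lanes.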

MVE 2

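Some sources describe an enhanced version called MVE2 with 256-bit vectors and a larger register file. The published Armv8.1-M architecture does not define such an extension: MVE vectors are fixed at 128 bits and there are eight vector registers. The variants the architecture does define are:

  • Integer-only MVE, which supports 8, 16 and 32-bit integer vectors
  • MVE with floating point, which adds half-precision and single-precision vector operations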

Use Cases

Given their different strengths, NEON and MVE tend to be used in different domains:

NEON Use Cases

  • Image, audio and video processing
  • Speech recognition
  • Computer vision
  • Scientific computing
  • 3D graphics
  • Gaming
  • High performance computing

MVE Use Cases

  • TinyML applications like keyword spotting
  • Anomaly detection
  • Predictive maintenance
  • Industrial IoT
  • Autonomous robots
  • Smart home devices

Programming and Compiler Support

Both NEON and MVE are supported by Arm Compiler 6 (armclang) and by the open-source GCC and LLVM/Clang toolchains; the older Arm Compiler 5 (armcc) supports NEON but predates MVE. These compilers can auto-vectorize code for NEON or MVE and also provide intrinsics for explicit SIMD programming.

For NEON, additional support is provided by:

  • GCC’s ARM NEON intrinsics
  • Clang’s NEON vector types and intrinsics
  • C++ SIMD libraries like SIMD++

The Arm C Language Extensions (ACLE) define the C intrinsics for both instruction sets: the header arm_neon.h for NEON and arm_mve.h for MVE. The MVE intrinsics are supported by Arm Compiler 6, LLVM/Clang, and the GNU Arm Embedded Toolchain.
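Code that has to build for both Cortex-A and Cortex-M targets can select a path at compile time with the ACLE feature-test macros. The sketch below assumes the ACLE definitions of __ARM_NEON and __ARM_FEATURE_MVE (bit 0 set for integer MVE, bit 1 set for floating-point MVE). The function name simd_backend is hypothetical.

```c
/* Report which SIMD extension the compiler is targeting, based on
 * ACLE feature-test macros. simd_backend is a hypothetical helper. */
const char *simd_backend(void)
{
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 2)
    return "MVE (integer and floating point)";
#elif defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
    return "MVE (integer only)";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "scalar";
#endif
}
```

The same conditions can guard the inclusion of arm_mve.h or arm_neon.h, so that a single source file carries a vector path for each extension plus a portable fallback.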

Performance Comparison

Some key performance differences between NEON and MVE include:

  • NEON delivers higher peak compute performance because Cortex-A cores run at higher clock speeds and have wider pipelines designed for high throughput; both extensions use 128-bit vectors.
  • However, MVE provides better energy efficiency, which suits power constrained devices.
  • MVE's interleaving and gather/scatter load/store instructions reduce the data rearrangement work that would otherwise need separate instructions.
  • For machine learning workloads, MVE can provide a severalfold performance increase over Cortex-M cores without a vector extension; the actual gain depends on the kernel and data type.

So while NEON has higher absolute performance, MVE is optimized to accelerate machine learning workloads on microcontrollers efficiently.

Conclusion

In summary, the key differences between NEON and MVE are:

  • NEON is a general purpose SIMD engine while MVE is tuned for signal processing and ML workloads.
  • NEON supports a wider range of data types, including 64-bit integers, while MVE covers 8, 16 and 32-bit integers and optional half-precision and single-precision floats.
  • NEON has a larger instruction set while MVE adds predication and low-overhead loops suited to small cores.
  • NEON is designed for application processors while MVE targets microcontrollers.

So NEON and MVE complement each other, with NEON handling high performance media workloads on application processors and MVE accelerating signal processing and machine learning on embedded devices. Together they provide SIMD performance and efficiency across a diverse range of ARM-based systems.
