ARM-based processors have long included SIMD instructions to improve performance for multimedia and signal processing workloads. Two key SIMD instruction sets used in ARM processors are NEON and MVE (the M-Profile Vector Extension, marketed as Helium). While both provide SIMD capabilities, there are some key differences between the two.
Overview of NEON
NEON is a SIMD instruction set that has been included in ARM Cortex-A series processors since the Cortex-A8, the first Armv7-A core, announced in 2005. It operates on 64-bit and 128-bit vectors, allowing operations on multiple data elements concurrently. NEON supports common data types including integers, floating-point numbers, and polynomials.
Some key capabilities and features of NEON include:
- 64-bit and 128-bit wide SIMD processing
- Support for 8-, 16-, 32-, and 64-bit integer and single-precision (32-bit) floating-point data types, with half- and double-precision floating point added in later architecture versions
- Specialized instructions for audio and video processing, 3D graphics, speech recognition, and image processing
- Fused multiply-add instructions for better performance and precision
- Saturating arithmetic allowing overflow values to be clamped to max/min values
NEON is implemented as a coprocessor-style extension with its own register file. In 32-bit state this consists of 32 64-bit doubleword (D) registers that alias 16 128-bit quadword (Q) registers; in AArch64 the file widens to 32 128-bit registers. NEON instructions operate on the vectors held in these registers.
Overview of MVE
MVE (the M-Profile Vector Extension, branded Helium) is a more recent SIMD instruction set introduced in the Armv8.1-M architecture for microcontrollers. It was first implemented in the Cortex-M55, with later cores such as the Cortex-M85 also supporting it.
MVE focuses on enhancing performance for machine learning workloads on microcontrollers, providing specialized instructions for common vector and matrix operations used in ML. Some key features of MVE include:
- Vector instructions for 8-, 16-, and 32-bit integer operations
- Multiply-accumulate-across-vector instructions (e.g. VMLADAV) for dot products and other linear algebra primitives
- Vector by scalar operations to multiply vectors by a scalar value
- Saturating arithmetic like NEON
- Permute instructions to rearrange vector elements
- Interleaving and gather/scatter load/store instructions to optimize memory access patterns
Rather than introducing a new register file, MVE reuses the existing floating-point extension register file, viewing it as eight 128-bit vector registers (Q0-Q7) that MVE instructions operate on.
Key Differences Between NEON and MVE
While both NEON and MVE provide SIMD capabilities to ARM processors, there are some notable differences between the two architectures:
Target Workloads
NEON is designed as a general purpose SIMD engine for accelerating a wide range of media, signal processing, and computational workloads. MVE is more specialized, focused on accelerating machine learning workloads on microcontrollers.
Supported Data Types
NEON supports a wider range of integer and floating-point data types, including 8-, 16-, 32-, and 64-bit integers and 32-bit single-precision floats. MVE focuses on lower-precision types — 8-, 16-, and 32-bit integers plus half- and single-precision floats — which suit the quantized models commonly used in embedded machine learning.
Vector Length
Both extensions use 128-bit vector registers. NEON instructions can additionally operate on 64-bit doubleword vectors, while MVE vectors are fixed at 128 bits, processed internally as 32-bit "beats".
Hardware Implementation
NEON implementations rely on specialized 64-bit and 128-bit vector registers and execution units. MVE is designed so that a 128-bit operation can be split into four 32-bit beats and executed over as many as four cycles, allowing the narrow datapaths of small microcontrollers to support it without a wide SIMD unit.
Target Processors
NEON is designed for high performance application processors like the Cortex-A series. MVE targets lower power microcontrollers like Cortex-M series chips.
Instruction Set
While both support common SIMD instructions, NEON has a much larger and richer set of instructions optimized for media processing. MVE instructions are more focused on machine learning primitives.
Matrix Operations
MVE accelerates matrix kernels through its dot-product style multiply-accumulate instructions combined with interleaving and gather/scatter loads and stores. NEON likewise builds matrix operations from general SIMD instructions, with dedicated matrix-multiply instructions only arriving in later A-profile architecture extensions.
MVE Variants and Evolution
MVE is specified in two configurations rather than as a single monolithic extension:
- MVE-I, covering integer and fixed-point vector operations
- MVE-F, which adds half- and single-precision floating-point vector operations
On the applications-processor side, NEON's successor for scalable vector workloads is SVE/SVE2, which supports implementation-defined vector lengths from 128 up to 2048 bits.
Use Cases
Given their different strengths, NEON and MVE tend to be used in different domains:
NEON Use Cases
- Image, audio and video processing
- Speech recognition
- Computer vision
- Scientific computing
- 3D graphics
- Gaming
- High performance computing
MVE Use Cases
- TinyML applications like keyword spotting
- Anomaly detection
- Predictive maintenance
- Industrial IoT
- Autonomous robots
- Smart home devices
Programming and Compiler Support
Both NEON and MVE are supported by Arm's compilers, such as Arm Compiler 6 (armclang), as well as GCC and Clang. These toolchains provide auto-vectorization capabilities to automatically vectorize code using NEON/MVE, as well as intrinsics to allow explicit SIMD programming.
For NEON, additional support is provided by:
- GCC’s ARM NEON intrinsics
- Clang’s NEON vector types and intrinsics
- C++ SIMD libraries such as libsimdpp
For MVE, the Arm C Language Extension (ACLE) provides C intrinsics that map to MVE instructions. ACLE is supported by Arm Compiler 6 and the GNU Arm Embedded Toolchain.
Performance Comparison
Some key performance differences between NEON and MVE include:
- NEON delivers higher peak compute performance: application-class cores pair its 128-bit vectors with wider execution units, deeper pipelines, and far higher clock speeds than microcontrollers.
- However, MVE provides better energy efficiency and performance per watt suited for power constrained devices.
- MVE's interleaving and gather/scatter load/store instructions help make the most of the limited memory bandwidth available in microcontroller systems.
- For machine learning workloads, Arm cites large uplifts for MVE-enabled cores; for example, up to 15x ML and 5x DSP performance for the Cortex-M55 compared with earlier Cortex-M cores.
So while NEON has higher absolute performance, MVE is optimized to accelerate machine learning workloads on microcontrollers efficiently.
Conclusion
In summary, the key differences between NEON and MVE are:
- NEON is a general purpose SIMD engine while MVE is optimized for ML workloads.
- NEON supports a wider range of data types while MVE focuses on low precision integers.
- NEON has a much larger instruction set while MVE instructions target ML primitives.
- NEON is designed for application processors while MVE targets microcontrollers.
So NEON and MVE complement each other, with NEON handling high-performance media workloads and MVE accelerating machine learning and DSP on embedded devices. Both continue to evolve alongside newer extensions such as SVE2, driving improved performance and efficiency across a diverse range of ARM-based systems.