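The short answer is no: ARM’s Neon SIMD instruction set extension is not available on Cortex-M series processors. Neon is an A-profile feature found on Cortex-A series application processors aimed at higher performance. Cortex-M cores have their own, lighter-weight SIMD options instead: the packed SIMD instructions of the DSP extension and, on newer cores, the Helium vector extension.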
Introduction to ARM’s Neon Technology
Neon is ARM’s single instruction, multiple data (SIMD) architecture extension, called Advanced SIMD in the architecture specifications. It was introduced with ARMv7-A and continues in ARMv8-A and later A-profile architectures, bringing SIMD processing to Cortex-A series application processors and improving performance for multimedia, signal processing, and other computationally intensive workloads.
Neon operates on 64-bit and 128-bit vector registers, so a single instruction can process several data elements at once; a 128-bit register holds, for example, sixteen 8-bit values or four 32-bit values. This can significantly boost performance for workloads that exhibit data parallelism.
Some of the key features of Neon include:
- 128-bit SIMD vector processing
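- Support for 8-, 16-, 32- and 64-bit integer and single-precision floating-point data types (AArch64 adds double precision, and later architecture versions add optional half-precision arithmetic)
- Saturating arithmetic and rounding operations
- Load/store instructions for aligned and unaligned access, including interleaving and de-interleaving loads and stores for structured data such as RGB pixels
- Multiply-accumulate operations, with dot-product and matrix-multiply instructions added in later ARMv8-A versions
- Building blocks for filtering and 2D convolution, such as multiply-accumulate by a broadcast scalar (there is no single convolution instruction)
- Cryptographic acceleration (AES and SHA instructions in the optional ARMv8-A Cryptographic Extension, which operates on the Neon registers)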
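To make the 128-bit lanes and saturating arithmetic from the list above concrete, here is a minimal sketch that adds two byte buffers with unsigned saturation. It assumes a GCC or Clang compiler following the Arm C Language Extensions (ACLE): `__ARM_NEON` is the predefined feature macro, and `vld1q_u8`, `vqaddq_u8` and `vst1q_u8` come from `<arm_neon.h>`. The function name `add_u8_saturating` is hypothetical. On builds without Neon, the portable scalar loop does all the work with the same per-lane semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Saturating per-byte add: out[i] = min(a[i] + b[i], 255). */
void add_u8_saturating(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    /* One 128-bit register holds 16 uint8 lanes; vqaddq_u8 adds all 16
       with unsigned saturation in a single instruction. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);   /* vld1 does not require alignment */
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vqaddq_u8(va, vb));
    }
#endif
    /* Scalar tail, or the whole buffer on non-Neon builds. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Saturation is what makes this useful for pixel data: brightening an image clamps at white instead of wrapping around to dark values.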
Neon is implemented in Cortex-A series processors such as Cortex-A8, Cortex-A9, Cortex-A15 and Cortex-A53. On many of these cores it is a configuration option, so whether a given chip includes it is decided by the chip designer based on the intended application domain and performance requirements.
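Because Neon may be absent on a given chip, portable software usually checks for it before taking a Neon code path. The sketch below shows both checks, assuming GCC or Clang (which predefine the ACLE macro `__ARM_NEON` when the build targets a Neon-capable configuration) and, for the run-time side, Linux's auxiliary vector: `getauxval(AT_HWCAP)` with the `HWCAP_ASIMD` bit on AArch64 or `HWCAP_NEON` on 32-bit ARM. The function names are hypothetical, and on non-Linux systems this sketch's run-time check simply reports false.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>    /* getauxval, AT_HWCAP, HWCAP_ASIMD */
#elif defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>    /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>   /* HWCAP_NEON */
#endif

/* Compile time: was this file built for a Neon-capable target? */
bool neon_enabled_at_build(void) {
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}

/* Run time: does the OS report Neon on the CPU we are running on?
   Linux only in this sketch; every other platform reports false. */
bool neon_reported_by_os(void) {
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#elif defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return false;
#endif
}
```

A common pattern is to compile the Neon kernels in a separate file with Neon enabled and call them only when the run-time check passes, falling back to scalar code otherwise.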
Cortex-M Series and Neon
ARM’s Cortex-M series processors are efficient, low-power microcontroller cores designed for embedded and IoT applications. They prioritize power efficiency, deterministic real-time performance, and small silicon area over raw processing performance.
Unlike high-end application processors, Cortex-M cores use short in-order pipelines with no out-of-order execution, and most omit superscalar issue and dynamic branch prediction (the high-end Cortex-M7, which is dual-issue with branch prediction, is an exception). Their memory subsystems are also much simpler: an optional memory protection unit (MPU) rather than an MMU, and tightly coupled memories or small optional caches rather than a deep cache hierarchy.
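As a result, Cortex-M processors do not implement Neon. They are not entirely without SIMD, however: cores with the DSP extension, such as Cortex-M4 and Cortex-M7, provide packed SIMD instructions that operate on two 16-bit or four 8-bit values held in a 32-bit register, and ARMv8.1-M cores such as Cortex-M55 and Cortex-M85 can include Helium (the M-Profile Vector Extension, MVE), a 128-bit vector extension designed for microcontroller constraints. Neon itself is left out for these reasons:
- A full Neon unit, with its 128-bit datapaths and large register file, increases core complexity, silicon area and power consumption in a class of core where all three are tightly budgeted
- The extra register state must be preserved across exceptions and context switches, which works against Cortex-M’s low, predictable interrupt latency
- Many embedded microcontroller applications do not need high-throughput math
- Neon is defined for the A-profile architecture; bringing it to Cortex-M would mean a new M-profile instruction set extension and the toolchain and software support to go with it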
Adding Neon would require significant microarchitectural changes that go against the Cortex-M design goals of simplicity, efficiency and real-time determinism. Its power and area overhead is difficult to justify when most microcontroller applications do not need Neon-class SIMD throughput.
When an application needs more math performance than the core itself provides, a Cortex-M based design can offload the work to dedicated hardware accelerators, DSP cores, or neural processing units optimized for signal processing and machine learning workloads.
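Although Cortex-M has no Neon, cores with the DSP extension (for example Cortex-M4 and Cortex-M7) can do small-scale SIMD inside ordinary 32-bit registers. The sketch below is a portable C model, runnable on any host, of one such instruction, SMLAD (dual 16-bit multiply with 32-bit accumulate), used to compute a q15 dot product two samples at a time. On real hardware this instruction is exposed as `__SMLAD` in CMSIS-Core and `__smlad` in ACLE’s `<arm_acle.h>`; the helper names here (`pack_q15x2`, `smlad_model`, `dot_q15`) are hypothetical and only model the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v without relying on narrowing casts. */
static int32_t low_s16(uint32_t v) {
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Pack two 16-bit samples into one word: lo in bits 0-15, hi in bits 16-31,
   the halfword layout that packed (SIMD32) instructions operate on. */
static uint32_t pack_q15x2(int16_t lo, int16_t hi) {
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Model of SMLAD: acc + lo(x)*lo(y) + hi(x)*hi(y). The sum wraps modulo 2^32
   (GCC/Clang conversion semantics); the hardware also sets the Q flag on overflow. */
static int32_t smlad_model(uint32_t x, uint32_t y, int32_t acc) {
    uint32_t lo = (uint32_t)(low_s16(x) * low_s16(y));
    uint32_t hi = (uint32_t)(low_s16(x >> 16) * low_s16(y >> 16));
    return (int32_t)((uint32_t)acc + lo + hi);
}

/* q15 dot product consuming two samples per step, as SMLAD does. */
int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    int32_t acc = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        acc = smlad_model(pack_q15x2(a[i], a[i + 1]), pack_q15x2(b[i], b[i + 1]), acc);
    if (i < n)  /* odd length: zero the upper halfword for the last sample */
        acc = smlad_model(pack_q15x2(a[i], 0), pack_q15x2(b[i], 0), acc);
    return acc;
}
```

Two lanes per instruction is modest next to Neon’s sixteen byte lanes, but it roughly halves the multiply-accumulate count in filter loops at almost no extra silicon cost.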
Role of Cortex-M and Cortex-A Processors
The Cortex-M and Cortex-A series have very different design goals and target applications. This leads to different architectural trade-offs regarding performance, power and cost:
- Cortex-M – Microcontrollers for real-time applications like motor control, industrial automation, IoT sensors etc. Focused on power efficiency, determinism, minimal area.
- Cortex-A – Application processors for devices like smartphones, tablets, computers. Optimized for high performance and advanced capabilities like computer vision, multimedia, gaming etc.
While Cortex-M forgoes power-hungry capabilities like Neon that are not needed for embedded use cases, Cortex-A application processors include these to address performance-critical application domains.
Neon provides a major performance boost for workloads like image processing, video encoding/decoding, speech recognition, physics simulations, machine learning inference, etc. These workloads involve large amounts of vector and matrix data parallelism that Neon can efficiently accelerate.
For example, Neon can speed up convolution layers in neural networks by processing multiple input and filter values concurrently. This results in significantly faster deep learning inference than scalar execution.
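The convolution example can be sketched in one dimension. This minimal version vectorizes across outputs: each of the four float lanes in a 128-bit register accumulates a different output element, and each filter tap is broadcast to all lanes with `vmlaq_n_f32`. It assumes ACLE’s `<arm_neon.h>` and the `__ARM_NEON` macro; the function name `conv1d_valid_f32` is hypothetical, and production neural-network libraries typically use more elaborate tiling schemes.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* "Valid" 1-D convolution in correlation form, as used in CNN layers:
   y[i] = sum_k x[i + k] * h[k], for i = 0 .. n - taps. */
void conv1d_valid_f32(const float *x, size_t n, const float *h, size_t taps, float *y) {
    assert(taps > 0 && n >= taps);
    size_t out_len = n - taps + 1;
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Four neighbouring outputs at once: lane j of acc holds y[i + j]. */
    for (; i + 4 <= out_len; i += 4) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (size_t k = 0; k < taps; k++)
            acc = vmlaq_n_f32(acc, vld1q_f32(x + i + k), h[k]);  /* acc += x * h[k] */
        vst1q_f32(y + i, acc);
    }
#endif
    /* Remaining outputs, or all of them on non-Neon builds. */
    for (; i < out_len; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += x[i + k] * h[k];
        y[i] = acc;
    }
}
```

Vectorizing across outputs keeps every lane busy regardless of the filter length, which is why this layout suits small kernels such as 3-tap filters.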
By including Neon, Cortex-A series processors like Cortex-A73, Cortex-A76 and Cortex-A77 provide the computational horsepower needed for complex workloads in smartphones, tablets and laptops. The power and area trade-offs are acceptable given application performance requirements.
Final Thoughts
In summary, Neon is not suited to the design goals and embedded target applications of Cortex-M class microcontroller cores: its power, area and complexity overheads are hard to justify there. Cortex-M cores that need SIMD use lighter-weight alternatives instead: the DSP extension’s packed SIMD instructions or, on newer cores, the Helium vector extension.
Neon provides major performance benefits for high performance application processors like Cortex-A series that need to handle advanced workloads like AI inference, 3D graphics, image processing etc. The overhead is acceptable given their higher performance requirements.
The division of ARM CPU cores into the efficiency-oriented Cortex-M series and higher performance Cortex-A series allows optimal architectural trade-offs for vastly different use cases from microcontrollers to servers.
In systems that need both real-time control and heavy number crunching, a common approach is to pair a Cortex-M microcontroller for the real-time control tasks with a Cortex-A application processor for the compute-intensive workloads, often on the same SoC. The two cores then coordinate through shared memory and inter-processor messaging, so each can play to its strengths.
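One simple form of that coordination is a single-producer, single-consumer ring buffer in shared memory. The sketch below is hypothetical (`mailbox_t`, `mbox_send`, `mbox_recv` and `MBOX_SLOTS` are illustrative names) and assumes both cores see the shared region coherently through C11 atomics; on real heterogeneous SoCs the region may need to be mapped uncached or paired with cache maintenance, and production systems typically add a hardware mailbox interrupt and a framework such as OpenAMP’s RPMsg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u
static_assert((MBOX_SLOTS & (MBOX_SLOTS - 1u)) == 0, "MBOX_SLOTS must be a power of two");

/* Producer only writes head, consumer only writes tail. The counters run
   freely and wrap modulo 2^32; slot index = counter & (MBOX_SLOTS - 1). */
typedef struct {
    _Atomic uint32_t head;   /* next slot to write (producer-owned) */
    _Atomic uint32_t tail;   /* next slot to read (consumer-owned)  */
    uint32_t msg[MBOX_SLOTS];
} mailbox_t;

void mbox_init(mailbox_t *mb) {
    atomic_init(&mb->head, 0u);
    atomic_init(&mb->tail, 0u);
}

/* Producer side, e.g. the Cortex-M posting a sensor sample. */
bool mbox_send(mailbox_t *mb, uint32_t value) {
    uint32_t head = atomic_load_explicit(&mb->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&mb->tail, memory_order_acquire);
    if (head - tail == MBOX_SLOTS)
        return false;                                   /* full */
    mb->msg[head & (MBOX_SLOTS - 1u)] = value;
    /* Release: the payload is visible before the new head is. */
    atomic_store_explicit(&mb->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side, e.g. the Cortex-A draining samples for heavy processing. */
bool mbox_recv(mailbox_t *mb, uint32_t *out) {
    uint32_t tail = atomic_load_explicit(&mb->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&mb->head, memory_order_acquire);
    if (head == tail)
        return false;                                   /* empty */
    *out = mb->msg[tail & (MBOX_SLOTS - 1u)];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&mb->tail, tail + 1u, memory_order_release);
    return true;
}
```

Giving each index a single writer is what lets this design avoid locks, which matters when one side is a real-time core that must never block.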