The ARM Cortex-M is a group of 32-bit RISC ARM processor cores licensed by Arm Holdings. The Cortex-M cores are designed for low cost, low power, and high performance microcontrollers used in embedded applications. They are simpler than the Cortex-A series, which is designed for application processors in smartphones, tablets and other mobile devices.
History of ARM Cortex-M
The first Cortex-M core, the Cortex-M3, was announced by ARM in 2004. It was designed specifically for microcontroller applications requiring low cost, low power, and high performance, and it offered an optional memory protection unit (MPU) and improved debug features. The M3 was widely adopted in automotive, industrial, and consumer applications.
In 2009, ARM announced the Cortex-M0, a much smaller core aimed at the lowest-cost devices. In its minimum configuration the M0 has only about 12,000 gates, far fewer than previous ARM cores.
The next major release was the Cortex-M4 in 2010, which added digital signal processing (DSP) instructions and an optional single precision floating point unit (FPU) for improved digital signal processing capabilities. The M4 keeps the low-latency interrupts and optional memory protection of the M3.
In 2012, ARM announced the Cortex-M0+, an upgrade over the M0 with improved energy efficiency through a shorter 2-stage pipeline and other techniques. At launch, ARM described the M0+ as its most energy efficient processor.
The Cortex-M7, announced in 2014, targets more advanced signal processing. It includes an optional single or double precision floating point unit, optional instruction and data caches, and optional tightly coupled memory. Later cores, such as the ARMv8-M based Cortex-M23 and Cortex-M33, have followed.
Cortex-M Architecture Overview
The Cortex-M processor architecture is designed specifically for deeply embedded, low power applications. The key aspects of the architecture include:
- 32-bit RISC instruction set optimized for embedded applications
- Short pipelines (2 stages on the M0+, 3 stages on the M0, M3 and M4, 6 stages on the M7) to achieve high performance at low clock speeds
- Optional memory protection unit (MPU) for safety critical applications
- Optional wakeup interrupt controller (WIC) for ultra low power sleep modes
- Digital signal processing instructions on Cortex-M4 and M7 cores
- Optional single precision floating point unit (FPU) on Cortex-M4 and M7
- Double precision FPU optional on Cortex-M7
- Tightly Coupled Memory (TCM) interface on Cortex-M7 for low latency access
- Nested Vectored Interrupt Controller (NVIC) with low latency interrupts
This combination of features makes the Cortex-M well suited for a wide range of deeply embedded, real-time applications including industrial, automotive, consumer, medical devices, wearables, and IoT edge nodes.
Instruction Set Architecture
The Cortex-M processors execute only Thumb instructions. The Cortex-M3, M4, and M7 implement the full Thumb-2 instruction set, while the M0 and M0+ implement a smaller subset made up mostly of 16-bit instructions. Thumb-2 provides both 16-bit and 32-bit instructions to improve code density without compromising performance.
Key features of the Thumb-2 instruction set architecture include:
- Highly efficient RISC design
- 16-bit and 32-bit instruction support
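- 16-bit and 32-bit instructions freely mixed without mode switching
- Load/store architecture with support for C programming
- 16 core registers (R0-R15), of which R0-R12 are general purpose
- Conditional execution support through If-Then (IT) blocks on Cortex-M3 and above
- SIMD operations through the DSP extension on Cortex-M4 and M7
- Low power capabilities through wait for interrupt (WFI) and wait for event (WFE) instructions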
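How a decoder tells 16-bit and 32-bit Thumb-2 instructions apart can be sketched in a few lines of C. The rule used here is the one from the ARM architecture manuals: a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. The function names are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when first_halfword starts a 32-bit Thumb-2 instruction,
 * i.e. when bits [15:11] are 0b11101, 0b11110 or 0b11111.
 * Every other value is a complete 16-bit instruction. */
bool thumb_is_32bit(uint16_t first_halfword)
{
    uint16_t top5 = (uint16_t)(first_halfword >> 11);
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Instruction size in bytes: 2 or 4. */
unsigned thumb_insn_size(uint16_t first_halfword)
{
    return thumb_is_32bit(first_halfword) ? 4u : 2u;
}
```

For example, 0xBF00 (NOP) and 0x4770 (BX LR) are 16-bit instructions, while 0xF000 is the first halfword of a 32-bit BL.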
For Cortex-M4 and M7 cores, additional digital signal processing (DSP) instructions are added. This includes saturation arithmetic, SIMD arithmetic, and multiply with accumulate (MAC) type operations commonly used in digital signal processing algorithms.
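The two operation types named above can be illustrated in portable C. This is only a sketch of the arithmetic, not of the instructions themselves: on a Cortex-M4 or M7 a compiler or intrinsic would map such code onto instructions like QADD or the multiply-accumulate family, while here everything is plain C that runs on any host. The function names are hypothetical.

```c
#include <stdint.h>

/* Saturating 32-bit add: the result is clamped to the int32_t range
 * instead of wrapping around on overflow. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Multiply-accumulate over two 16-bit sample vectors with a 64-bit
 * accumulator, the inner loop of a typical FIR filter or dot product. */
int64_t dot_i16(const int16_t *x, const int16_t *y, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)y[i];
    }
    return acc;
}
```

With wrapping arithmetic, adding 1 to INT32_MAX would flip the sign of a signal sample; the saturating version keeps it at full scale, which is usually the less harmful error in audio and control loops.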
Memory Architecture
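The Cortex-M3, M4, and M7 use a Harvard bus architecture with separate instruction and data buses for higher performance. This allows instruction fetches to occur in parallel with data accesses. The Cortex-M0 and M0+ use a single shared bus instead. In all cases, instructions and data share one unified 4 GB address space.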
The memory architecture consists of:
- Code region (typically flash) in the lowest 512 MB of the 4 GB address space
- SRAM region of 512 MB for data
- Optional Tightly Coupled Memory (TCM) on Cortex-M7 for low latency access
- Optional bit banding regions on Cortex-M3 and M4 for bit level access
- Optional Memory Protection Unit (MPU) on Cortex-M0+, M3, M4, M7
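The fixed regions of the ARMv7-M default memory map can be sketched as a simple address classifier. The region boundaries below follow the architecture's default map; the enum and function names are hypothetical, and a real chip populates only a small part of each region.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,        /* 0x00000000-0x1FFFFFFF, typically flash */
    REGION_SRAM,        /* 0x20000000-0x3FFFFFFF */
    REGION_PERIPHERAL,  /* 0x40000000-0x5FFFFFFF */
    REGION_EXT_RAM,     /* 0x60000000-0x9FFFFFFF */
    REGION_EXT_DEVICE,  /* 0xA0000000-0xDFFFFFFF */
    REGION_SYSTEM       /* 0xE0000000-0xFFFFFFFF, includes NVIC and debug */
} mem_region_t;

/* Classifies an address by the default memory map region it falls into. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXT_RAM;
    if (addr < 0xE0000000u) return REGION_EXT_DEVICE;
    return REGION_SYSTEM;
}
```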
The MPU provides support for creating isolated memory regions for safety critical applications. This allows protecting privileged code and data from unprivileged access.
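As a rough illustration of what configuring an MPU region involves, the sketch below checks the two constraints the ARMv7-M MPU places on a region: its size must be a power of two of at least 32 bytes, encoded in a SIZE field as log2(size) - 1, and its base address must be aligned to that size. The helper names are hypothetical, and no hardware registers are touched.

```c
#include <stdint.h>

/* Returns the ARMv7-M MPU SIZE field for a region of the given size,
 * where region size = 2^(SIZE + 1) bytes. Returns -1 if the size is
 * not a power of two between 32 bytes and 4 GB. */
int mpu_size_field(uint64_t bytes)
{
    if (bytes < 32u || bytes > 0x100000000ull) return -1;
    if ((bytes & (bytes - 1u)) != 0u) return -1;

    int shift = 0;
    while ((bytes >> shift) != 1u) {
        shift++;
    }
    return shift - 1;
}

/* A region base address must be a multiple of the region size. */
int mpu_base_is_aligned(uint32_t base, uint64_t bytes)
{
    return (base & (bytes - 1u)) == 0u;
}
```

A 1 KB region therefore uses SIZE = 9 and must start on a 1 KB boundary, which is why linker scripts for MPU-protected tasks often pad stacks and data sections to power-of-two sizes.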
Bit banding maps each bit of a word in the bit-band region to its own word-sized address in a separate alias region. A single store to the alias address sets or clears that bit atomically, so no software read-modify-write sequence is needed.
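The alias address calculation can be written out as a short C sketch. It assumes the SRAM bit-band region of the Cortex-M3 and M4 (1 MB starting at 0x20000000, aliased at 0x22000000); the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SRAM_BITBAND_BASE   0x20000000u  /* start of the 1 MB bit-band region */
#define SRAM_BITBAND_ALIAS  0x22000000u  /* start of the 32 MB alias region */

/* Each bit in the bit-band region gets one 32-bit word in the alias
 * region: 32 alias bytes per target byte, 4 alias bytes per bit. */
uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_BITBAND_ALIAS + byte_offset * 32u + bit * 4u;
}

/* On a target that implements bit banding, a single store sets the bit:
 *   *(volatile uint32_t *)bitband_alias_addr(addr, bit) = 1u;
 */
```

Bit 2 of the byte at 0x20000300, for example, is aliased at 0x22006008.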
Processor Modes
The Cortex-M processors support two modes of operation:
- Thread Mode – Used for application code; can run privileged or unprivileged
- Handler Mode – Always privileged; used for exception handlers
Thread mode is used to execute normal application code. Handler mode is entered on exceptions and interrupts to execute handler code. Handler mode has access to privileged operations that are not available to unprivileged thread mode code.
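On reset, the processor starts in thread mode with privileged access to allow privileged configuration of the system. Software can then switch to unprivileged thread mode before starting the application by setting the nPRIV bit in the CONTROL register. The Memory Protection Unit, if present, enforces the access permissions that apply at each privilege level.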
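How mode, privilege and stack selection follow from the processor's special registers can be sketched as a small decoder. It takes the raw IPSR and CONTROL values as plain integers so that it runs on any host; on a real target they would be read with the MRS instruction or the CMSIS helper functions. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool handler_mode;  /* true while an exception handler is running */
    bool privileged;    /* true if privileged operations are allowed */
    bool uses_psp;      /* true if the process stack pointer is active */
} cpu_state_t;

/* IPSR[8:0] holds the active exception number (0 means thread mode).
 * CONTROL bit 0 (nPRIV) makes thread mode unprivileged when set.
 * CONTROL bit 1 (SPSEL) selects the process stack in thread mode. */
cpu_state_t decode_cpu_state(uint32_t ipsr, uint32_t control)
{
    cpu_state_t s;
    s.handler_mode = (ipsr & 0x1FFu) != 0u;
    s.privileged   = s.handler_mode || (control & 1u) == 0u;
    s.uses_psp     = !s.handler_mode && (control & 2u) != 0u;
    return s;
}
```

The reset state corresponds to `decode_cpu_state(0, 0)`: thread mode, privileged, main stack. A typical RTOS task runs with CONTROL set to 3, which decodes as unprivileged thread mode on the process stack.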
Exceptions and Interrupts
The Cortex-M processors support advanced exception and interrupt handling capabilities. This includes:
- Configurable priority levels for IRQ/exception handling
- Low latency exception/IRQ handling
- Wakeup Interrupt Controller (WIC) for waking up the core
- Nested Vectored Interrupt Controller (NVIC)
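The WIC detects interrupts and other wakeup sources while the processor is in a deep sleep mode. Because the WIC takes over this task, most of the processor, including the NVIC, can be clock gated or powered down and still wake up with low latency.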
The NVIC manages exception and interrupt handling. It supports configurable priority levels and nested interrupts for real-time applications. Low latency interrupt handling is supported through tail-chaining, which skips the register restore and re-save when one handler runs directly after another.
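The priority rule behind nesting can be sketched as follows. A lower numeric value means a more urgent exception, and only the upper bits of each 8-bit priority field are implemented. The sketch assumes 4 implemented priority bits (the real number is chip specific and is exposed by CMSIS device headers as `__NVIC_PRIO_BITS`) and ignores priority grouping, so every bit counts toward preemption. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the chip implements 4 priority bits. */
#define PRIO_BITS 4u

/* Converts a logical level (0 = most urgent) into the 8-bit register
 * value, which keeps the implemented bits in the upper part of the byte. */
uint8_t nvic_encode_priority(uint8_t level)
{
    uint8_t masked = (uint8_t)(level & ((1u << PRIO_BITS) - 1u));
    return (uint8_t)(masked << (8u - PRIO_BITS));
}

/* A pending exception preempts the running handler only if its
 * priority value is numerically lower. Equal priorities do not nest;
 * the pending handler then runs afterwards, tail-chained. */
bool nvic_preempts(uint8_t pending_prio, uint8_t active_prio)
{
    return pending_prio < active_prio;
}
```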
Power Management
The Cortex-M processors, together with the power control logic that chip vendors build around them, provide extensive power management capabilities including:
- Multiple low power sleep modes
- Automatic power gating of unused modules
- Wakeup Interrupt Controller (WIC)
- Wait for interrupt (WFI) and wait for event (WFE) instructions
- Wakeup timer for periodic events
- Clock gating of unused modules
Depending on the chip, sleep modes can power gate unused modules and peripherals. The WIC monitors for wakeup events to transition from sleep to active mode with low latency.
In active mode, clock gating shuts down the clocks to unused modules and peripherals. The wait for interrupt (WFI) instruction puts the processor to sleep until the next interrupt occurs.
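Which kind of sleep the WFI instruction enters is selected through the System Control Register (SCR). The sketch below only computes the register value, so it runs on any host. It assumes the ARMv7-M bit positions (SLEEPONEXIT in bit 1, SLEEPDEEP in bit 2), and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCR_SLEEPONEXIT (1u << 1)  /* sleep again when the last handler returns */
#define SCR_SLEEPDEEP   (1u << 2)  /* WFI/WFE enter deep sleep instead of sleep */

/* Returns the new SCR value for the requested sleep behaviour,
 * leaving all other bits of the current value unchanged. */
uint32_t scr_configure(uint32_t scr, bool deep_sleep, bool sleep_on_exit)
{
    scr &= ~(SCR_SLEEPDEEP | SCR_SLEEPONEXIT);
    if (deep_sleep) {
        scr |= SCR_SLEEPDEEP;
    }
    if (sleep_on_exit) {
        scr |= SCR_SLEEPONEXIT;
    }
    return scr;
}
```

On a target using CMSIS, the result would be written back with something like `SCB->SCR = scr_configure(SCB->SCR, true, false);` followed by `__WFI();`. What deep sleep actually powers down is decided by the chip vendor.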
Debug Architecture
The Cortex-M processors contain an advanced debug architecture designed specifically for embedded applications. This includes:
- ARM CoreSight debug infrastructure
- External debug interface
- Embedded Trace Macrocell (ETM) for instruction trace
- Instrumentation Trace Macrocell (ITM) for printf debugging
- Data watchpoint and trace unit (DWT)
- Flash patch and breakpoint unit
- Debug access port (DAP)
The debug architecture provides interfaces for on-chip debugging and tracing. The ETM and ITM modules allow instruction and data tracing for advanced debugging. The DWT module supports complex data watchpoints.
The DAP provides the interface and protocols for on-chip debug probes to access the debug components. This enables advanced debugging using JTAG/SWD interfaces.
Coprocessing Interface
Newer ARMv8-M cores such as the Cortex-M33 provide an optional coprocessing interface to connect specialized coprocessors; the Cortex-M4 and M7 do not have one. This allows offloading certain tasks from the main CPU to dedicated coprocessors.
Typical applications include:
- Cryptographic accelerators
- Hardware security modules
- DSP accelerators
- Image processing units
- Machine learning accelerators
The coprocessing interface provides a simple handshake mechanism for the CPU to offload work to the coprocessor. This improves performance and efficiency for specialized tasks.
Cortex-M Implementations
The Cortex-M cores are licensed to various semiconductor companies that design and manufacture the full processor chip. Common implementations include:
- Microcontrollers – Self contained MCU chips such as STMicroelectronics STM32, NXP Kinetis, and Microchip (formerly Atmel) SAM
- SoCs – Cortex-M as a subsystem in System-on-Chip devices
- ASICs – Custom silicon chips integrating Cortex-M cores
- FPGAs – Soft cores implemented in programmable logic
There are thousands of silicon chip options implementing the various Cortex-M cores. Among the most widely used are the STMicroelectronics STM32 microcontrollers, which are known for their low cost, good performance and large ecosystem.
Conclusion
In summary, the ARM Cortex-M architecture is designed for deeply embedded, low power microcontroller applications. The combination of a high performance 32-bit RISC core, low power operation, a wide choice of vendor peripherals, and excellent development tools has made Cortex-M one of the most popular processor architectures for the embedded market.