The Cortex-M3 is a 32-bit processor core designed by ARM for microcontroller applications. It features a 3-stage pipeline, an optional memory protection unit, and support for the Thumb-2 instruction set, which combines 16-bit and 32-bit instructions for improved code density. The Cortex-M3 implements the ARMv7-M architecture and includes features such as bit-banding, which allows atomic bit manipulation using word accesses. This article provides an overview of how memory is addressed and accessed on the Cortex-M3.
Memory Map
The Cortex-M3 has a linear address space of 4 gigabytes, ranging from 0x00000000 to 0xFFFFFFFF. This address space is divided into the following regions:
- Code region: Stores program code and constants. Located at lower addresses starting from 0x00000000.
- SRAM region: Stores data variables. Located at higher addresses.
- Peripherals: Memory mapped peripherals reside at specific addresses in the memory map.
- External memory: Additional external memories can be added in unused address regions.
The top-level division of this address space is fixed by the ARMv7-M architecture, but the memories actually populated in each region depend on the specific Cortex-M3 device. A typical layout places on-chip flash in the code region and on-chip SRAM in the SRAM region.

The code region stores the executable program code and constant data. It is typically mapped to on-chip flash memory. The SRAM region stores writable data variables and stack space, and is mapped to on-chip SRAM blocks.
Memory-mapped peripherals such as timers and GPIO occupy fixed address locations. The external memory region can be used to add off-chip memories.
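The top-level base addresses of these regions are fixed by the ARMv7-M architecture. As a reference, a small C sketch of the main region bases (vendor headers and linker scripts define their own equivalents):

```c
/* ARMv7-M top-level memory map base addresses (fixed by the architecture). */
#define CODE_BASE        0x00000000UL  /* Code region: flash, vector table          */
#define SRAM_BASE        0x20000000UL  /* SRAM region: data, stack, heap            */
#define PERIPH_BASE      0x40000000UL  /* Peripheral region: timers, GPIO, UARTs    */
#define EXT_RAM_BASE     0x60000000UL  /* External RAM region                       */
#define EXT_DEVICE_BASE  0xA0000000UL  /* External device region                    */
#define PPB_BASE         0xE0000000UL  /* Private peripheral bus: NVIC, SCB, debug  */
```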
Code and SRAM Regions
The code and SRAM regions refer to contiguous memory blocks used to store code and data respectively. These regions can use on-chip or off-chip memory blocks.
On-Chip Memories
Cortex-M3 MCUs integrate on-chip flash and SRAM blocks which are mapped to the code and SRAM regions respectively. For example:
- A 512 KB flash block can be used for storing code.
- A 96 KB SRAM block for storing data.
The on-chip memories provide zero wait state access, improving performance compared to external memories. The size of on-chip memories is limited by silicon costs.
Off-Chip Memories
External memories can be added by mapping them to unused address regions in the processor’s memory map. Common examples are:
- Code region mapped to external parallel flash.
- SRAM region mapped to external SRAM chip.
- External RAM region mapped to external SDRAM or DDR memory.
Off-chip memories have larger capacities but add latency due to external bus cycles. Cortex-M3 based MCUs typically provide dedicated bus interfaces, such as an external memory controller, to connect off-chip memories.
Bit Banding
Bit banding is a feature that maps each bit in a defined portion of the memory map to its own word-addressable alias location. This allows atomic read-modify-write access to individual bits using ordinary word accesses.
For SRAM, the lowest 1 MB of the SRAM region (addresses 0x20000000 to 0x200FFFFF) is bit-band accessible, with a 32 MB alias region starting at 0x22000000. (A similar 1 MB bit-band region exists for peripherals at 0x40000000, aliased at 0x42000000.) Each bit of the word at 0x20000000 maps to a 32-bit alias word as follows:
- Bit 0 is mapped to 0x22000000
- Bit 1 to 0x22000004
- Bit 2 to 0x22000008
- …
- Bit 31 to 0x220000FC
In general, the alias address is 0x22000000 + (byte offset within the bit-band region × 32) + (bit number × 4). Using this mapping, software can read or modify individual bits with single word accesses, avoiding explicit read-modify-write sequences. For example, to set bit 5 of the word at address 0x20000C00, write 1 to alias address 0x22018014.
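A minimal C sketch of this computation, with illustrative macro and function names (not taken from any particular vendor header):

```c
#include <stdint.h>

/* SRAM bit-band region and its alias region, as defined by the Cortex-M3 memory map. */
#define SRAM_BITBAND_BASE  0x20000000UL
#define SRAM_ALIAS_BASE    0x22000000UL

/* Alias address of bit 'bit' within the word at address 'addr'. */
#define BITBAND_ALIAS(addr, bit) \
    (SRAM_ALIAS_BASE + (((addr) - SRAM_BITBAND_BASE) * 32U) + ((bit) * 4U))

/* Set bit 5 of the word at 0x20000C00 with a single (atomic) word write. */
void set_bit5_example(void)
{
    volatile uint32_t *alias = (volatile uint32_t *)BITBAND_ALIAS(0x20000C00UL, 5U);
    *alias = 1U;   /* only bit 0 of the written value is significant */
}
```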
Memory Attributes and Access
The Cortex-M3 includes an optional memory protection unit (MPU) which allows configuring memory attributes and access permissions for different memory regions.
Each memory region is configured using MPU registers to set attributes like:
- Cacheable/Non-cacheable: Allow caching for faster access.
- Shareable: For memory shared between multiple bus masters or processors.
- Access permissions: Read/write/execute permissions.
- Type: Normal, device, strongly-ordered memory types.
Setting appropriate attributes is important for correct operation. For example, marking memory-mapped peripheral addresses as cacheable normal memory instead of device memory can cause incorrect behavior.
The MPU also allows setting access permissions for memory regions to implement privilege levels and memory protection. For example, configuring flash region as read-only prevents accidental overwrite.
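As a sketch of such a configuration, assuming the CMSIS core header for the Cortex-M3 is available (the register names MPU->RNR, MPU->RBAR, MPU->RASR and the bit positions follow the ARMv7-M MPU layout; the region size and attribute values are illustrative):

```c
#include "core_cm3.h"   /* CMSIS Cortex-M3 core definitions (assumed available) */

/* Configure MPU region 0 to cover 512 KB of flash at 0x00000000 as
 * read-only, executable, cacheable normal memory, then enable the MPU. */
void mpu_protect_flash(void)
{
    MPU->RNR  = 0;                          /* select region 0                         */
    MPU->RBAR = 0x00000000;                 /* region base address                     */
    MPU->RASR = (0x6UL << 24) |             /* AP = read-only at all privilege levels  */
                (1UL  << 17) |              /* C = 1: cacheable normal memory          */
                (18UL << 1)  |              /* SIZE = 18 -> 2^(18+1) = 512 KB          */
                (1UL  << 0);                /* region enable                           */

    MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk |   /* keep default map for privileged code    */
                MPU_CTRL_ENABLE_Msk;        /* enable the MPU                          */
    __DSB();                                /* ensure the new settings take effect     */
    __ISB();                                /* before the next instruction executes    */
}
```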
Memory Access Methods
The Cortex-M3 core supports different methods to access memory:
Single-Cycle CPU Access
On-chip memories like flash and SRAM can be accessed in a single cycle by the processor for fastest performance. The bus matrix connects the CPU directly to these memories.
AMBA Bus Access
For off-chip memories and other bus slaves, the processor uses the AMBA bus protocols, which provide the interfacing capability and bandwidth needed for external memories.
AMBA defines the AHB bus for high-performance access and the simpler, lower-power APB bus for peripherals. The Cortex-M3 exposes AMBA 3 AHB-Lite interfaces (the ICode, DCode, and System buses), while its private peripheral bus uses APB.
Bit Band Access
As described earlier, bit band regions provide atomic bit access using word transfers. Read or write to the bit-band alias address maps to the corresponding bit, avoiding read-modify-write sequences.
DMA Access
Direct Memory Access (DMA) controllers allow peripheral devices to directly access memory without CPU involvement. This offloads data transfers from the CPU to improve performance.
DMA controllers transfer data between peripherals and memories via DMA requests. This enables high bandwidth data transfers transparent to the CPU.
Memory Endianness
The Cortex-M3 implements little endian format for memory accesses. In little endian format, the least significant byte is stored at lowest address and most significant byte at highest address.
For example, for a 32-bit variable stored at address 0x10000000:
- Byte 0 (LSB) is at 0x10000000
- Byte 1 at 0x10000001
- Byte 2 at 0x10000002
- Byte 3 (MSB) at 0x10000003
This matches the convention used in most modern desktop computers. Some embedded systems use big-endian format; the Cortex-M3 architecture allows a big-endian configuration (selected by the chip designer), but the vast majority of devices use little-endian memory access.
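A small C illustration of this byte ordering (the output shown assumes a little-endian target and a retargeted printf):

```c
#include <stdint.h>
#include <stdio.h>

void show_byte_order(void)
{
    uint32_t value = 0x11223344;
    uint8_t *bytes = (uint8_t *)&value;   /* view the word as individual bytes */

    /* On a little-endian Cortex-M3 this prints: 44 33 22 11 */
    printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);
}
```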
Caching
The Cortex-M3 does not implement instruction or data caches. Caches add hardware complexity, cost, and non-deterministic timing, which are usually undesirable in microcontroller applications.
However, the MPU allows configuring cacheable regions to use external hardware caches when available. This enables caching for external memories to improve average access time.
Branch Prediction
Branch prediction reduces pipeline stalls by guessing branch directions. The Cortex-M3 does not implement branch prediction because:
- Branch penalties are small (a few cycles of pipeline refill) due to the short pipeline.
- Branch behavior is less predictable in embedded programs.
- Hardware complexity increases.
Instead it relies on the short pipeline and compact Thumb-2 branch instructions; there are no branch delay slots. Software techniques such as replacing short forward branches with IT (If-Then) conditional blocks can further reduce branch penalties.
Instruction Fetch
The Cortex-M3 executes Thumb-2 instructions that are either one half-word (16 bits) or two half-words (32 bits) long. Conceptually, instruction fetch works as follows:
- Program Counter (PC) holds address of next instruction.
- The instruction is fetched from memory at the PC address.
- PC advances by 2 (16-bit instruction) or 4 (32-bit instruction) to point to the next instruction.
- Fetched instruction decoded in Decode stage.
In hardware, the fetch unit reads 32 bits at a time through the bus matrix, so a 32-bit Thumb-2 instruction, or a pair of 16-bit instructions, can be fetched in a single access.
Boot Sequence
On reset, the Cortex-M3 loads the initial main stack pointer from address 0x00000000 and then starts execution from the reset vector at address 0x00000004. The reset vector holds the address of the startup routine.
The typical boot sequence is:
- Hardware loads the SP from address 0x00000000 and the PC from the reset vector.
- Startup code initializes hardware such as clocks and memory interfaces.
- The .data section is copied from flash to SRAM and the .bss section is zeroed.
- Any remaining runtime setup (additional stacks, C library initialization) is performed.
- The main() function is called.
Linker scripts ensure that code and constants are placed at the correct memory locations. The vector table, containing the initial stack pointer and exception vectors, is placed at the start of the code region.
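A minimal startup sketch is shown below; it assumes GCC-style section attributes and linker-provided symbols (_estack, _sidata, _sdata, _edata, _sbss, _ebss), whose names vary between toolchains:

```c
#include <stdint.h>

extern uint32_t _estack;                  /* top of stack, from the linker script   */
extern uint32_t _sidata, _sdata, _edata;  /* .data load address and run-time bounds */
extern uint32_t _sbss, _ebss;             /* .bss bounds                            */

int main(void);

void Reset_Handler(void)
{
    /* Copy initialized data from flash to SRAM. */
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;
    while (dst < &_edata)
        *dst++ = *src++;

    /* Zero the .bss section. */
    for (dst = &_sbss; dst < &_ebss; dst++)
        *dst = 0;

    main();
    for (;;) { }                           /* trap here if main() ever returns      */
}

/* Vector table placed at the start of the code region by the linker script.
 * Word 0 holds the initial stack pointer, word 1 the reset vector. */
__attribute__((section(".isr_vector")))
const void *vector_table[] = {
    &_estack,
    (void *)Reset_Handler,
    /* exception and interrupt handlers follow ... */
};
```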
Code Density
Code density is an important factor in embedded systems with limited memory. The Cortex-M3 architecture and Thumb-2 instruction set provide high code density through features like:
- 16-bit and 32-bit instruction lengths.
- Branch instructions with small displacements.
- Immediate fields in data processing instructions.
- Load/store multiple instructions to reduce overhead.
Thumb-2 code is typically on the order of 25% to 35% smaller than equivalent ARM code, resulting in significant cost savings from reduced flash usage.
Compiler options such as size-oriented optimization levels, selective inlining, and link-time removal of unused code also help reduce code size. Hand-written assembly can be used to optimize hotspot functions.
Performance
The Cortex-M3 delivers very good performance for a microcontroller-class processor thanks to:
- Clock speeds of 100 MHz or more in many implementations.
- 3-stage pipeline reduces CPI close to 1.
- Zero wait state access for on-chip memories.
- Low branch penalties thanks to the short pipeline.
- Fast interrupt handling using low-latency preemption.
The Dhrystone benchmark score is typically quoted as 1.25 DMIPS/MHz. Actual performance depends on optimization effort, such as using the pipeline efficiently, minimizing stalls, and avoiding memory wait states.
Debugging
The Cortex-M3 integrates several debug features to facilitate software development:
- Breakpoints and watchpoints for program and data.
- Instruction trace and code profiling using the optional Embedded Trace Macrocell (ETM) and the Data Watchpoint and Trace (DWT) unit.
- Single-stepping instructions.
- System state inspection for registers, memories, peripherals.
These debug capabilities are accessed via the industry-standard JTAG or Serial Wire Debug (SWD) interfaces using external debug probes. On-chip debug components reduce external hardware requirements.
Power Management
The Cortex-M3 core itself defines sleep and deep sleep states, and device vendors typically add further modes. A typical set of power states on a Cortex-M3 MCU is:
- Active: CPU active executing instructions.
- Sleep: CPU stopped, peripherals and SRAM active.
- Deep Sleep: All clocks gated off except wakeup logic.
- Standby: SRAM and register contents retained.
- Shutdown: All power removed, memory state lost.
Lower power modes are used to save power during idle periods. The WFI (Wait For Interrupt) and WFE (Wait For Event) instructions put the CPU into sleep mode, and wakeup events such as interrupts return the processor to the active state.
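A minimal sketch using the CMSIS intrinsics and the System Control Block SLEEPDEEP bit (assuming a CMSIS core header is available):

```c
#include "core_cm3.h"   /* CMSIS Cortex-M3 core definitions (assumed available) */

/* Enter ordinary sleep until the next interrupt. */
void enter_sleep(void)
{
    SCB->SCR &= ~SCB_SCR_SLEEPDEEP_Msk;  /* select sleep rather than deep sleep */
    __WFI();                             /* wait for interrupt                  */
}

/* Enter deep sleep; which clocks stay running is device-specific. */
void enter_deep_sleep(void)
{
    SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;   /* select deep sleep                   */
    __WFI();
}
```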
Peripheral clocks can also be gated off independently when peripherals are inactive. Combining these techniques maximizes battery life.
Design Considerations
Some key design considerations when using Cortex-M3 processor include:
- Using size-oriented compiler optimizations and selective inlining to improve code density.
- Minimizing wait states for external memories by using SRAM/burst access.
- Setting appropriate MPU regions and attributes for memories and peripherals.
- Using bit banding to implement atomic bit manipulation.
- Optimizing interrupt latencies for fast real-time response.
- Utilizing low power modes and clock gating for power savings.
Proper design ensures that application requirements for performance, power consumption, and cost are balanced using capabilities of the Cortex-M3 processor.
Conclusion
The Cortex-M3 processor provides an excellent blend of high performance, low cost, and low power consumption for microcontroller applications. The flexible memory architecture along with bit banding enables efficient memory access. The short pipeline, Thumb-2 instruction set, and debug features make software development easier. Overall, these capabilities make the Cortex-M3 a versatile choice for a wide range of embedded applications.