ARM Cortex-M microcontrollers offer a variety of memory options to choose from. Selecting the right memory configuration requires balancing factors like cost, performance, power consumption, and flexibility. This article provides an overview of the key memory technologies used in Cortex-M devices and discusses the tradeoffs involved in selecting between them.
SRAM
Static RAM (SRAM) is the fastest and simplest type of on-chip memory available for Cortex-M cores. It needs no refresh or special memory management, and the CPU can usually access it directly with single-cycle latency. SRAM offers high performance but costs more die area per bit than other memory types.
The key characteristics of SRAM include:
- Very fast access times – single clock cycle reads and writes
- Expensive per bit compared to other memory options
- Lower densities than other memory technologies
- Continuous power consumption even when not being accessed
- Volatile – data is lost when power is removed
SRAM is best suited for time-critical data that needs fast random access from the CPU, such as stack, heap, and global variables. Most Cortex-M microcontrollers contain at least a few kilobytes of embedded SRAM.
Flash Memory
Flash memory is the most common type of embedded memory used with Cortex-M cores. It provides non-volatile storage at a much lower cost than SRAM. Flash trades reduced performance for lower price and higher density.
Below are the key attributes of Flash memory:
- High density and low cost per bit
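- Slower access than SRAM – reads typically need wait states once the CPU clock outruns the flash array, commonly a few clock cycles depending on the device and clock frequency
- Much slower program and erase operations, with erasure performed a whole sector or page at a time
- Limited write endurance – can only be programmed and erased a finite number of times
- May require wear leveling in software when used for frequently updated data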
Flash memory is well-suited for storing firmware, application data, constants, and other read-mostly data. It typically occupies the majority of the embedded memory space in Cortex-M devices. Both the program code and any non-volatile variables are located in flash.
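The limited write endurance noted above can be turned into a rough lifetime estimate. The sketch below assumes a simple wear-leveling scheme in which fixed-size records are appended round-robin across a set of sectors, so each sector is erased only once it is full. The function name and all parameter values are hypothetical; real endurance figures come from the device datasheet.

```c
#include <stdint.h>

/* Estimated days until the rated erase count is reached when fixed-size
 * records are appended round-robin across a set of flash sectors.
 * Hypothetical helper for illustration only. */
static uint64_t flash_lifetime_days(uint32_t erase_cycles,   /* rated cycles per sector */
                                    uint32_t sectors,        /* sectors in the rotation */
                                    uint32_t sector_bytes,
                                    uint32_t record_bytes,
                                    uint32_t writes_per_day)
{
    if (record_bytes == 0u || writes_per_day == 0u) {
        return 0u;  /* no meaningful estimate */
    }
    /* Each erase of a sector buys this many record writes. */
    uint64_t records_per_sector = sector_bytes / record_bytes;
    uint64_t total_writes = (uint64_t)erase_cycles * sectors * records_per_sector;
    return total_writes / writes_per_day;
}
```

With an assumed rating of 10,000 erase cycles, two 4 KB sectors, 32-byte records, and one write per minute (1,440 per day), the estimate is 1,777 days, or nearly five years.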
Read Acceleration Techniques
To compensate for the relatively slow access speed of flash memory, Cortex-M microcontrollers employ several read acceleration techniques, usually implemented in the flash interface rather than in the core itself:
- Prefetching – the flash interface anticipates future instruction reads and loads them into a prefetch buffer ahead of time.
- Caching – small high-speed memory blocks are used to cache frequently accessed flash contents.
- Pipelining – read operations are split into multiple stages to improve throughput.
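These techniques can boost average flash read performance considerably. However, a miss in the prefetch buffer or cache, for example after a branch to code that has not been fetched recently, still pays the full wait-state penalty, so worst-case latency does not improve.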
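The gap between average and worst-case fetch cost can be estimated with simple arithmetic. The sketch below assumes a simplified model: a hit in the prefetch buffer or cache costs one cycle, and a miss costs one cycle plus the flash wait states. Both function names are hypothetical, and real wait-state settings should be taken from the device datasheet rather than computed.

```c
#include <stdint.h>

/* Wait states needed so one flash access fits in a whole number of CPU
 * cycles. Hypothetical helper: datasheets usually give this as a table. */
static uint32_t flash_wait_states(uint32_t cpu_hz, uint32_t flash_access_ns)
{
    /* cycles = ceil(cpu_hz * access_ns / 1e9), in 64-bit integer math */
    uint64_t cycles = ((uint64_t)cpu_hz * flash_access_ns + 999999999ULL)
                      / 1000000000ULL;
    return cycles > 0u ? (uint32_t)(cycles - 1u) : 0u;
}

/* Average cost of one fetch in thousandths of a cycle.
 * hit_permille is the hit rate in parts per thousand (0..1000). */
static uint32_t avg_fetch_millicycles(uint32_t wait_states, uint32_t hit_permille)
{
    uint32_t hit_cost  = 1000u;                       /* 1 cycle */
    uint32_t miss_cost = 1000u * (1u + wait_states);  /* 1 cycle + wait states */
    return (hit_permille * hit_cost + (1000u - hit_permille) * miss_cost) / 1000u;
}
```

For an assumed 168 MHz clock and 33 ns flash access time, the first function returns 5 wait states. At a 90% hit rate the average fetch then costs 1.5 cycles, while every miss still costs 6.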
Flash Memory Types
There are several different flash memory technologies used with Cortex-M cores:
- NOR Flash – supports random-access reads, which allows code to execute directly from it. Writes are slow, and a whole sector must be erased before it can be reprogrammed.
- NAND Flash – offers higher densities but is read and written a page at a time through buffers. It is designed for sequential access and fast data streaming.
- EEPROM – provides electrical erasing and programming of individual bytes. Writes are slow and capacities are small, but write endurance is typically higher than that of flash.
NOR flash provides the best performance and easiest interface for code execution and data storage. NAND flash is more suitable for mass storage needs such as data logging or multimedia content. EEPROM offers byte-level flexibility for small, frequently updated data such as configuration settings, but at much lower capacity.
ROM
Read-only memory (ROM) provides non-volatile storage like flash but cannot be electrically modified. Data is fixed once it is programmed on the chip during manufacturing. Key characteristics include:
- Very low cost per bit
- Fast read performance – similar access time to SRAM
- Permanent data storage – cannot be modified or erased
- Often used for exception vectors, math routines, constants
ROM is useful for data that needs fast access but will never need to be updated. This includes boot code, interrupt vectors, trigonometric tables, and other fixed constants. ROM is the cheapest memory option per bit but lacks flexibility.
TCM – Tightly Coupled Memory
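Tightly coupled memory (TCM) is a block of SRAM connected to the CPU core through a dedicated interface rather than the main system bus. It is available on higher-end cores such as the Cortex-M7 and provides deterministic, single-cycle access. Because it is built from SRAM, it is far more expensive per bit than flash.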
Key features of TCM include:
- Very low access latency – single cycle
- Limited sizes – typically tens of KB, depending on the device
- Cost per bit comparable to other on-chip SRAM and much higher than flash
- Requires explicit placement of code and data, usually through the linker script
TCM provides fast scratchpad storage optimized for performance critical routines and data. It is useful for time sensitive algorithms, stack data, and interrupt handlers. Because TCM accesses bypass the caches, their timing is predictable in a way that cached flash or external memory accesses are not.
External Memories
In addition to internal memory, Cortex-M MCUs can also be connected to external memories to supplement their storage capabilities. Common options include:
- Asynchronous SRAM
- Synchronous Dynamic RAM (SDRAM)
- Quad SPI NOR flash
- NAND flash with DMA
External memories can provide much higher capacities to store large data sets or multimedia content. However, access latency is slower compared to internal memories due to the required bus transactions.
Tradeoffs of External Memory
The key tradeoffs when using external memory include:
- Higher memory capacities
- Slower access times than internal memories
- Requires placing code and data at external addresses and initializing the memory controller before first use
- Increased system cost due to extra components
- Higher power consumption for external memory and bus interface
External memory is useful for large local data storage needs where performance is non-critical. The latency and power tradeoffs must be considered compared to keeping data in internal memory.
Choosing the Right Memory
Selecting the appropriate memory configuration requires analyzing the target application requirements. Important factors include:
- Performance – Are fast access times needed? What are the speed critical operations?
- Capacity – How much memory capacity is required? Will external memory be needed?
- Power – Is low power operation critical? SRAM must stay powered to retain its contents, while flash retains data with the supply removed.
- Flexibility – How often will contents need to change? Flash or EEPROM allow writes.
- Cost – On-chip SRAM is expensive per bit, so a larger SRAM raises device cost. Finding the right balance is key.
By analyzing the target application requirements and doing performance profiling on critical operations, an appropriate Cortex-M memory system can be designed to meet the needs of the system.
Memory Mapping
The ARMv6-M and ARMv7-M architectures used in Cortex-M microcontrollers provide a flexible memory mapping system to access different physical memory regions.
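The ARMv7-M architecture defines a default memory map that divides the 4 GB address space into the following regions:
- Code (0x00000000 to 0x1FFFFFFF) – Executable region for program code, normally flash or ROM.
- SRAM (0x20000000 to 0x3FFFFFFF) – General purpose on-chip SRAM region.
- Peripheral (0x40000000 to 0x5FFFFFFF) – Memory mapped registers for hardware peripherals.
- External RAM (0x60000000 to 0x9FFFFFFF) – Addresses that map to external memory chips.
- External Device (0xA0000000 to 0xDFFFFFFF) – Addresses that map to external or shared devices.
- System (0xE0000000 to 0xFFFFFFFF) – Private Peripheral Bus holding the NVIC, SysTick, and debug components, plus vendor-specific system space.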
The processor generates bus transactions targeted at these regions based on the address being accessed. This allows transparently using multiple physical memory types through the same logical address space.
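Because the region is determined purely by the address, the decoding can be expressed as a handful of comparisons. The sketch below uses the region boundaries of the ARMv7-M default memory map; the enum and function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Regions of the ARMv7-M default memory map (names are illustrative). */
typedef enum {
    REGION_CODE,        /* 0x00000000 to 0x1FFFFFFF */
    REGION_SRAM,        /* 0x20000000 to 0x3FFFFFFF */
    REGION_PERIPHERAL,  /* 0x40000000 to 0x5FFFFFFF */
    REGION_EXT_RAM,     /* 0x60000000 to 0x9FFFFFFF */
    REGION_EXT_DEVICE,  /* 0xA0000000 to 0xDFFFFFFF */
    REGION_SYSTEM       /* 0xE0000000 to 0xFFFFFFFF */
} mem_region_t;

/* Return the default-map region that a 32-bit address falls into. */
static mem_region_t classify_address(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXT_RAM;
    if (addr < 0xE0000000u) return REGION_EXT_DEVICE;
    return REGION_SYSTEM;
}
```

For example, an address of 0x20000000 classifies as SRAM and 0xE000ED00 as System. Which physical memory actually sits behind each region is decided by the chip vendor.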
Improving Performance with Memory Regions
Performance can be optimized by carefully assigning code and data to different memory regions. Examples include:
- Placing performance critical code and data in SRAM regions
- Moving slower peripheral data access to separate regions from code and SRAM
- Allocating buffers used for external memory access to separate regions
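This allows prioritizing faster memory for latency sensitive operations. On devices with a multi-layer bus matrix, it also lets accesses from different bus masters, such as the CPU and a DMA controller, proceed in parallel when they target different memories.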
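In C, this kind of assignment is commonly done with compiler section attributes. The sketch below assumes a GCC-compatible toolchain; the section names `.ramfunc` and `.fast_data` are hypothetical and only take effect if the linker script maps them to fast memory and the startup code copies the function there before it is called. The filter itself is a placeholder for any time critical routine.

```c
#include <stdint.h>

/* Section names are assumptions: they must match entries in the project's
 * linker script. On other toolchains the macros expand to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define FAST_CODE __attribute__((section(".ramfunc")))
#define FAST_DATA __attribute__((section(".fast_data")))
#else
#define FAST_CODE
#define FAST_DATA
#endif

/* Filter history kept in the fast data section. */
FAST_DATA static int32_t filter_state[4];

/* Four-tap moving average, placed in the fast code section. */
FAST_CODE int32_t moving_average4(int32_t sample)
{
    static uint32_t next;  /* index of the oldest entry to overwrite */
    int32_t sum = 0;

    filter_state[next] = sample;
    next = (next + 1u) % 4u;

    for (uint32_t i = 0u; i < 4u; i++) {
        sum += filter_state[i];
    }
    return sum / 4;
}
```

Keeping the attribute behind a macro keeps the placement decision in one spot, so the same source can be rebuilt with the routine left in flash when profiling shows the fast memory is better spent elsewhere.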
Caching
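Caches are small fast memory arrays that store copies of recently accessed data. Some Cortex-M processors, such as the Cortex-M7, include optional instruction and data caches, and many other devices add a vendor-specific cache in the flash interface to improve average access times. Key points about caching include: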
- Instruction caching caches program code for faster execution
- Data caching caches data variables and constants
- Write-through and write-back policies determine when modified data reaches main memory
- Hit rates indicate how often the cache provides the requested data
- Higher hit rates improve performance by avoiding main memory access
Properly configured caches transparently improve performance for memory regions with slower access times like flash. This comes at the cost of increased silicon area and design complexity.
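The relationship between working-set size and hit rate can be seen in a small simulation. The sketch below models a direct-mapped cache with a hypothetical geometry of 8 lines of 32 bytes. It is not the organization of any particular Cortex-M cache, and all names are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u  /* hypothetical cache line size */
#define NUM_LINES  8u   /* hypothetical line count, giving a 256-byte cache */

typedef struct {
    uint32_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
    uint32_t hits;
    uint32_t misses;
} cache_t;

/* Look up one address and update the hit/miss counters. */
static void cache_access(cache_t *c, uint32_t addr)
{
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;  /* the only slot this line may use */
    uint32_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
    } else {
        c->misses++;                    /* fill the line on a miss */
        c->valid[index] = 1u;
        c->tag[index]   = tag;
    }
}

/* Read `bytes` of memory word by word, `passes` times over, starting from
 * an empty cache, and return the hit rate as a whole percentage. */
static uint32_t sweep_hit_rate_percent(uint32_t bytes, uint32_t passes)
{
    cache_t c;
    memset(&c, 0, sizeof c);

    for (uint32_t p = 0u; p < passes; p++) {
        for (uint32_t addr = 0u; addr < bytes; addr += 4u) {
            cache_access(&c, addr);
        }
    }
    uint32_t total = c.hits + c.misses;
    return total ? (c.hits * 100u) / total : 0u;
}
```

Two passes over a 256-byte loop that fits the cache give a 93% hit rate, because only the first touch of each line misses. Two passes over a 512-byte loop give 87%, since the second half of the loop evicts the first half and every line misses again on the second pass.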
Cache Considerations
Factors to consider when designing with memory caches:
- Balancing cache size, cost, and hit rates
- Impacts of cache misses on worst-case performance
- Effects of caching on real-time determinism in the system
- Cache coherency overhead and maintenance, especially for buffers shared with DMA
- Reserving cache ways for time critical data, where the cache hardware supports it
Like all forms of memory, caches require trading off multiple design factors. When used properly, caches can greatly boost average performance.
Memory Protection Unit
The Memory Protection Unit (MPU), an optional feature on many Cortex-M cores, provides hardware access control to different memory regions. Key capabilities of the MPU include:
- Configurable access permissions for code, RAM, peripherals, and external memory
- Separate access permissions for privileged and unprivileged software
- Preventing accidental or malicious accesses to protected memory
- Enabling user/supervisor memory protection schemes
The MPU improves system reliability and security. It can restrict memory accesses to only allowed regions, catching errors and potential exploits. Proper configuration is necessary to balance protection and performance.
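As an illustration of what that configuration involves, the sketch below builds a value for the ARMv7-M MPU Region Attribute and Size Register (RASR). The bit positions follow the ARMv7-M register layout, but the helper names are hypothetical, the memory attribute bits (TEX, C, B, S) and subregion disable bits are left at zero for brevity, and the ARMv8-M MPU uses a different register format.

```c
#include <stdint.h>

/* Access permission (AP) encodings of the ARMv7-M MPU. */
enum {
    AP_NO_ACCESS       = 0u,  /* no access at any privilege level */
    AP_PRIV_RW         = 1u,  /* privileged read/write only */
    AP_PRIV_RW_USER_RO = 2u,  /* privileged read/write, unprivileged read-only */
    AP_FULL_ACCESS     = 3u,  /* read/write at any privilege level */
    AP_PRIV_RO         = 5u,  /* privileged read-only */
    AP_READ_ONLY       = 6u   /* read-only at any privilege level */
};

/* SIZE field for a region of `bytes`, where region size = 2^(SIZE + 1).
 * Returns -1 unless `bytes` is a power of two of at least 32. */
static int mpu_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int n = 0;
    while ((bytes >> n) != 1u) {
        n++;  /* n ends up as log2(bytes) */
    }
    return n - 1;
}

/* Build a RASR value: XN in bit 28, AP in bits 26:24, SIZE in bits 5:1,
 * ENABLE in bit 0. Returns 0 (region disabled) for an invalid size. */
static uint32_t mpu_rasr(uint32_t bytes, uint32_t ap, int execute_never)
{
    int size = mpu_size_field(bytes);
    if (size < 0) {
        return 0u;
    }
    return ((execute_never ? 1u : 0u) << 28)
         | ((ap & 7u) << 24)
         | ((uint32_t)size << 1)
         | 1u;
}
```

A 1 KB region that is read/write for privileged code, read-only for unprivileged code, and never executable encodes as 0x12000013. On hardware, this value would be written to the RASR register after selecting a region and setting its base address.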
MPU Tradeoffs
The benefits of an MPU come at the cost of additional complexity in the memory system:
- Additional configuration overhead to set up MPU regions and access permissions
- Extra CPU instructions needed to modify MPU settings
- Increased memory fragmentation with smaller protected regions, since regions have size and alignment constraints
- Overhead to handle MPU exceptions and permission violations
Like all hardware-based protection schemes, the MPU improves security but requires careful configuration not to adversely impact performance and flexibility.
Conclusion
ARM Cortex-M processors provide a wide array of memory technologies and configuration options. Selecting the right memory system requires balancing cost, speed, density, power, and flexibility for the target application. Optimizing memory usage is a key architectural design decision impacting performance, cost, and reliability of the final system.