ARM Cortex-M microcontrollers offer a variety of memory options to choose from. Selecting the right memory configuration requires balancing factors like cost, performance, power consumption, and flexibility. This article provides an overview of the key memory technologies used in Cortex-M devices and discusses the tradeoffs involved in selecting between them.
Static RAM (SRAM) is the fastest and simplest type of on-chip memory available for Cortex-M cores. It does not require any complex memory management and can be accessed directly by the CPU with single-cycle latency. SRAM offers high performance but is an expensive option compared to other memory types.
The key characteristics of SRAM include:
- Very fast access times – single clock cycle reads and writes
- Expensive per bit compared to other memory options
- Lower densities than other memory technologies
- Continuous power consumption even when not being accessed
- Volatile – data is lost when power is removed
SRAM is best suited for time-critical data that needs fast random access from the CPU, such as stack, heap, and global variables. Most Cortex-M microcontrollers contain at least a few kilobytes of embedded SRAM.
Flash memory is the most common type of embedded memory used with Cortex-M cores. It provides non-volatile storage at a much lower cost than SRAM, trading off performance for lower price and higher density.
Below are the key attributes of Flash memory:
- High density and low cost per bit
- Slower access than SRAM – reads typically incur several wait-state clock cycles at higher core frequencies
- Even slower writes and block erase operations
- Limited write endurance – can only be programmed and erased a finite number of times
- Requires wear-leveling management when used for frequently updated data
Flash memory is well-suited for storing firmware, application data, constants, and other read-mostly data. It typically occupies the majority of the embedded memory space in Cortex-M devices. Both the program code and any non-volatile variables are located in flash.
Read Acceleration Techniques
To compensate for the relatively slow access speed of flash memory, Cortex-M processors employ several read acceleration techniques:
- Prefetching – the processor anticipates sequential instruction fetches and reads the next words from flash ahead of time into a prefetch buffer.
- Caching – small high-speed memory blocks are used to cache frequently accessed flash contents.
- Pipelining – read operations are split into multiple stages to improve throughput.
These techniques can substantially boost average flash read performance. However, worst-case latency on a cache or prefetch miss is still multiple clock cycles, which matters for real-time code.
Flash Memory Types
There are several different flash memory technologies used with Cortex-M cores:
- NOR Flash – provides random access reads in single address cycles. Writes take longer and must erase larger sectors before programming.
- NAND Flash – offers higher densities but requires page buffers for access. It is designed for sequential, page-oriented access and fast data streaming.
- EEPROM – provides electrical erasing and programming of individual bytes. Access is slow, but write endurance is typically higher than flash.
NOR flash provides the best performance and the simplest interface for code execution and data storage. NAND flash is better suited to mass-storage needs such as data logging or multimedia content, and requires wear leveling. EEPROM offers byte-level flexibility but much lower capacity.
Read-only memory (ROM) provides non-volatile storage like flash but cannot be electrically modified. Data is fixed once it is programmed on the chip during manufacturing. Key characteristics include:
- Very low cost per bit
- Fast read performance – similar access time to SRAM
- Permanent data storage – cannot be modified or erased
- Often used for exception vectors, math routines, constants
ROM is useful for data that needs fast access but will never need to be updated. This includes boot code, interrupt vectors, trigonometric tables, and hardware peripheral addresses. ROM is the cheapest memory option per bit but lacks flexibility.
TCM – Tightly Coupled Memory
Tightly coupled memory (TCM) is a block of SRAM attached directly to the CPU core over a dedicated interface. It provides guaranteed single-cycle access latency but, being SRAM, is far more expensive per bit than flash.
Key features of TCM include:
- Very low access latency – single cycle
- Limited sizes – up to a few 10s of KB
- Implemented as SRAM, so more expensive per bit than flash
- Requires software management for allocation
TCM provides fast scratchpad storage optimized for performance-critical routines and data, such as time-sensitive algorithms, stack data, and interrupt handlers. Because TCM accesses bypass the caches and the system bus, their timing is also fully deterministic.
In addition to internal memory, Cortex-M MCUs can also be connected to external memories to supplement their storage capabilities. Common options include:
- Asynchronous SRAM
- Synchronous Dynamic RAM (SDRAM)
- Quad SPI NOR flash
- NAND flash with DMA
External memories can provide much higher capacities to store large data sets or multimedia content. However, access latency is slower compared to internal memories due to the required bus transactions.
Tradeoffs of External Memory
The key tradeoffs when using external memory include:
- Higher memory capacities
- Slower access times than internal memories
- Requires porting code/data to external addresses
- Increased system cost due to extra components
- Higher power consumption for external memory and bus interface
External memory is useful for storing large data sets where access performance is not critical. The latency and power tradeoffs must be weighed against keeping data in internal memory.
Choosing the Right Memory
Selecting the appropriate memory configuration requires analyzing the target application requirements. Important factors include:
- Performance – Are fast access times needed? What are the speed critical operations?
- Capacity – How much memory capacity is required? Will external memory be needed?
- Power – Is low power operation critical? SRAM leaks power continuously while powered, whereas flash retains its contents with no power at all.
- Flexibility – How often will contents need to change? Flash or EEPROM allow writes.
- Cost – Generous SRAM allocations drive up die cost; balancing cheap, dense flash against fast SRAM is key.
By analyzing the target application requirements and doing performance profiling on critical operations, an appropriate Cortex-M memory system can be designed to meet the needs of the system.
The ARMv6-M and ARMv7-M architectures used in Cortex-M microcontrollers provide a flexible memory mapping system to access different physical memory regions.
The ARMv7-M architecture allows splitting the memory map into the following regions:
- Code – Executable region for program code that can be cached.
- SRAM – General purpose SRAM region.
- Peripheral – Memory mapped registers for hardware peripherals.
- External Device – Addresses that map to external memory chips.
- System – Special regions for interrupts, exceptions, and configuration data.
The processor generates bus transactions targeted at these regions based on the address being accessed. This allows transparently using multiple physical memory types through the same logical address space.
Improving Performance with Memory Regions
Performance can be optimized by carefully assigning code and data to different memory regions. Examples include:
- Placing performance critical code and data in SRAM regions
- Moving slower peripheral data access to separate regions from code and SRAM
- Allocating buffers used for external memory access to separate regions
This allows prioritizing faster memory for latency sensitive operations while letting slower operations execute in parallel.
Caches are small fast memory arrays that store copies of recently accessed data. Cortex-M processors support caching flash contents to improve average access times. Key points about caching include:
- Instruction caching caches program code for faster execution
- Data caching caches data variables and constants
- Write-through and write-back policies control when cached writes reach main memory
- Hit rates indicate how often the cache provides the requested data
- Higher hit rates improve performance by avoiding main memory access
Properly configured caches transparently improve performance for memory regions with slower access times like flash. This comes at the cost of increased silicon area and design complexity.
Factors to consider when designing with memory caches:
- Balancing cache size, cost, and hit rates
- Impacts of cache misses on worst-case performance
- Effects of caching on real-time determinism in the system
- Cache coherency overhead and maintenance
- Reserving cache ways for time critical data
Like all forms of memory, caches require trading off multiple design factors. When used properly caches can greatly boost average performance.
Memory Protection Unit
The Memory Protection Unit (MPU) provides hardware access control to different memory regions. Key capabilities of the MPU include:
- Configurable access permissions for code, RAM, peripherals, and external memory
- Separate access permissions for privileged and unprivileged software
- Preventing accidental or malicious accesses to protected memory
- Enabling user/supervisor memory protection schemes
The MPU improves system reliability and security. It can restrict memory accesses to only allowed regions, catching errors and potential exploits. Proper configuration is necessary to balance protection and performance.
The benefits of an MPU come at the cost of additional complexity in the memory system:
- Additional configuration overhead to set up MPU regions and access permissions
- Extra CPU instructions needed to modify MPU settings
- Increased memory fragmentation with smaller protected regions
- Overhead to handle MPU exceptions and permission violations
Like all hardware-based protection schemes, the MPU improves security but requires careful configuration not to adversely impact performance and flexibility.
ARM Cortex-M processors provide a wide array of memory technologies and configuration options. Selecting the right memory system requires balancing cost, speed, density, power, and flexibility for the target application. Optimizing memory usage is a key architectural design decision impacting performance, cost, and reliability of the final system.