The Arm Cortex-M1 is a small 32-bit processor designed for implementation in FPGAs and aimed at deeply embedded applications. It has a simple memory system with no caches and no memory management unit. Configuring that memory system correctly is important for both performance and power efficiency.
Cortex-M1 Memory Architecture
The Cortex-M1 has a single external bus interface shared by instruction fetches and data accesses, so code and data in external memory occupy one unified address space. Alongside it, the processor has dedicated interfaces for instruction and data tightly coupled memories (ITCM and DTCM) that hold critical code and data. TCM provides single-cycle access, but its size is fixed when the processor is configured. Arm's documentation lists sizes of up to 1MB each for ITCM and DTCM, and in practice the available FPGA block RAM is usually the tighter limit.
For larger memories, the Cortex-M1 connects to external memory through an AMBA AHB-Lite interface, a subset of the Advanced High-performance Bus (AHB). The bus serves as the system interconnect for on-chip peripherals and external memories and devices. AHB-Lite assumes a single bus master, so there is no arbitration on the processor's own port. A system with additional masters, such as a DMA controller, needs a multi-layer interconnect or bus matrix to share the slaves.
The Cortex-M1 AHB interface has a 32-bit data width and runs at the CPU frequency. With zero wait states it can reach a peak transfer rate of one 32-bit word per cycle. The external memories are commonly SRAM or SDRAM. They offer more capacity than the TCMs but take multiple clock cycles per access.
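As a rough illustration of that peak figure, the sketch below estimates bus throughput when every transfer takes a fixed number of wait states. The function name and the fixed-wait-state model are assumptions made for this example; real memory controllers have variable latency.

```python
def ahb_throughput_mb_s(hclk_mhz: float, wait_states: int = 0, bus_bytes: int = 4) -> float:
    """Throughput in MB/s (1 MB = 10**6 bytes) for back-to-back transfers.

    Simplifying assumption: each transfer occupies 1 + wait_states bus cycles.
    """
    if hclk_mhz <= 0 or wait_states < 0:
        raise ValueError("clock must be positive and wait_states non-negative")
    cycles_per_transfer = 1 + wait_states
    return hclk_mhz * bus_bytes / cycles_per_transfer


# A 32-bit bus at 100 MHz: zero wait states versus three wait states.
print(ahb_throughput_mb_s(100))      # 400.0
print(ahb_throughput_mb_s(100, 3))   # 100.0
```

Even a few wait states cut the usable bandwidth to a fraction of the peak, which is why the placement of hot code and data matters.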
TCM Configuration
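TCM provides single-cycle access, which maximizes Cortex-M1 performance for critical code and data. For best performance, frequently used code should be placed in ITCM and frequently used data in DTCM. Toolchains support this through section placement: with GCC, for example, a function or variable can be assigned to a named section with the section attribute, and the linker script then maps that section to the TCM address range.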
ITCM is best used for time-critical interrupt handlers, DSP algorithms, and the inner loops of key functions. DTCM should hold performance-critical variables and data structures. Unused TCM wastes block RAM or silicon area, so memory requirements should be analyzed to right-size TCM capacity.
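One way to approach that analysis is to rank candidates by how often they are accessed per byte and fill the TCM in that order. The sketch below is a greedy heuristic rather than an optimal packing, and the function names, sizes, and access counts are hypothetical values invented for illustration.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    size_bytes: int
    accesses: int  # profiled access count (hypothetical measurement)


def select_for_tcm(candidates: List[Candidate], capacity_bytes: int) -> Tuple[List[str], int]:
    """Greedily pick items with the most accesses per byte until TCM is full."""
    ranked = sorted(
        (c for c in candidates if c.size_bytes > 0),
        key=lambda c: c.accesses / c.size_bytes,
        reverse=True,
    )
    chosen, used = [], 0
    for c in ranked:
        if used + c.size_bytes <= capacity_bytes:
            chosen.append(c.name)
            used += c.size_bytes
    return chosen, used


# Hypothetical profile of four functions competing for a 2 KB ITCM.
profile = [
    Candidate("timer_isr", 256, 50_000),
    Candidate("fir_filter", 1024, 120_000),
    Candidate("crc32", 512, 8_000),
    Candidate("log_printf", 4096, 300),
]
chosen, used = select_for_tcm(profile, capacity_bytes=2048)
print(chosen, used)  # ['timer_isr', 'fir_filter', 'crc32'] 1792
```

Running the same selection for several capacities shows where extra TCM stops paying for itself.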
TCM is accessed in a single processor clock cycle, so the TCM RAM is clocked with the processor and its access time must fit within one clock period. A RAM that cannot meet that timing limits the maximum CPU frequency. TCM also draws static current, so keeping its capacity no larger than necessary saves leakage power.
External RAM Configuration
External RAM provides higher-capacity memory for code and data but has longer access latency. Asynchronous SRAM offers access times as low as about 10ns but costs more per bit. SDRAM has a random-access latency of around 50-70ns but is cheaper per bit.
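To see what such latencies mean for the processor, they can be converted to CPU cycles at a given clock frequency. The helper below is a minimal sketch with an illustrative name. It ignores bus and controller overhead, so real access times will be somewhat longer.

```python
import math


def access_cycles(latency_ns: float, cpu_mhz: float) -> int:
    """Whole CPU cycles needed to cover a memory latency (at least one)."""
    if latency_ns <= 0 or cpu_mhz <= 0:
        raise ValueError("latency and frequency must be positive")
    cycles = latency_ns * cpu_mhz / 1000.0
    # A small epsilon guards against floating-point noise on exact multiples.
    return max(1, math.ceil(cycles - 1e-9))


# At 100 MHz the clock period is 10 ns.
print(access_cycles(10, 100))  # 1  (fast SRAM)
print(access_cycles(60, 100))  # 6  (SDRAM random access)
print(access_cycles(60, 50))   # 3  (same SDRAM, slower CPU)
```

The same memory costs fewer cycles at a lower CPU frequency, so the penalty of external RAM grows as the processor clock rises.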
The AHB interface runs at the CPU frequency, so the SDRAM may need a clock rate at least as high to keep up with AHB bandwidth. Designs often clock the SDRAM at one to two times the CPU frequency to avoid starving the CPU. The external memory controller must also meet the SDRAM's timing requirements.
SDRAM incurs an initial latency of several clock cycles or more when a new row is opened, covering row activation plus CAS latency. Some memory controllers prefetch code and data to hide this latency. Keeping rows open in multiple SDRAM banks and interleaving accesses between them also improves average access time.
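The effect of row hits on average latency can be estimated with a simple weighted average. The sketch below assumes just two cases, an access to an already open row and an access that must open a new row. The cycle counts are made-up inputs, not figures for any particular device.

```python
def avg_sdram_cycles(row_hit_rate: float, hit_cycles: float, miss_cycles: float) -> float:
    """Expected cycles per access given the fraction of accesses hitting an open row."""
    if not 0.0 <= row_hit_rate <= 1.0:
        raise ValueError("row_hit_rate must be between 0 and 1")
    return row_hit_rate * hit_cycles + (1.0 - row_hit_rate) * miss_cycles


# Hypothetical timings: 3 cycles for an open-row access, 9 cycles when a row must be opened.
print(avg_sdram_cycles(0.75, 3, 9))  # 4.5
print(avg_sdram_cycles(0.0, 3, 9))   # 9.0
```

Prefetching and bank interleaving both act by raising the effective hit rate, which moves the average toward the open-row figure.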
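The Cortex-M1 issues only one external memory access at a time and waits for it to complete, so long multi-cycle SDRAM accesses stall the CPU pipeline. Because the core executes in order, it cannot overlap other work with a pending access. Compiler optimizations that reduce the number of loads and stores, such as keeping values in registers, therefore help more than instruction reordering.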
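A first-order estimate of the cost of these stalls adds the stall cycles per instruction to the base cycles per instruction (CPI). The model below is a deliberate simplification with hypothetical parameter names: it treats every external access as a fixed stall and counts instruction fetches from external memory as accesses too.

```python
def effective_cpi(base_cpi: float, ext_accesses_per_instr: float, stall_cycles: float) -> float:
    """Base CPI plus the average stall cycles contributed by external accesses."""
    if base_cpi <= 0 or ext_accesses_per_instr < 0 or stall_cycles < 0:
        raise ValueError("inputs must be non-negative and base_cpi positive")
    return base_cpi + ext_accesses_per_instr * stall_cycles


# Hypothetical workload: one external access every four instructions, 4 stall cycles each.
cpi = effective_cpi(base_cpi=1.0, ext_accesses_per_instr=0.25, stall_cycles=4)
print(cpi)        # 2.0
print(cpi / 1.0)  # 2.0, i.e. the code runs at half the speed it would from TCM
```

Moving the most frequently accessed items into TCM lowers the accesses-per-instruction term, which is usually the easiest of the three inputs to change.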
Cache Configuration
The Cortex-M1 does not contain instruction or data caches. A cache would reduce average access latency, but at the cost of logic area and power consumption. Deterministic single-cycle TCM access also reduces the need for caching, since critical code and data can be placed there.
External memory controllers can still implement caches or buffers that are transparent to the processor. Some SDRAM controllers include small SRAM buffers to hide row access latency. Memory-mapped peripherals may also contain local buffers or FIFOs for their data.
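These system-level caches do not take part in any processor coherency protocol, and the Cortex-M1 has no cache maintenance instructions. Where such a cache sits in front of memory shared with another bus master or a peripheral, driver software may need to invalidate or flush it through the controller's own control registers, or the region should be configured as uncached. Peripheral registers should also be accessed through volatile-qualified pointers so that the compiler does not keep stale values in registers or remove accesses.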
Optimizing Memory Performance
There are several techniques to optimize memory performance when configuring the Cortex-M1 system:
- Place critical code and data in ITCM and DTCM
- Size TCM to minimize leakage while meeting performance needs
- Make sure the TCM RAM meets single-cycle timing at the CPU frequency
- Use an SDRAM configuration that matches AHB bandwidth
- Enable SDRAM controller prefetch if available
- Use multi-bank SDRAM to increase concurrency
- Structure hot code to minimize SDRAM accesses and the pipeline stalls they cause
- Bypass or disable any controller-level caching for memory-mapped peripheral regions
Profiling memory access patterns and timing is essential to ensure that the Cortex-M1 meets its performance requirements. Memory system configuration and optimization provide significant opportunities to improve performance and efficiency.
Conclusion
The Cortex-M1 memory architecture, which combines TCM with an AHB bus interface, provides flexible options for embedded systems. Optimizing the usage and configuration of TCM, SDRAM, and any system-level caches can maximize performance and efficiency. Careful memory system design is key to building high-performance Cortex-M1 applications.