Cortex-M3 Memory Access Constraints with Caches and Shared Memory

Systems built around the Cortex-M3 processor can combine vendor-added caches, fast local memories, and shared memory regions to improve memory access performance. These features also impose constraints that must be understood when developing applications. Careful memory architecture design is required to avoid hazards from overlapping or out-of-order memory accesses.

Contents

Instruction and Data Caches
Cache Coherency Issues
Implications of Cached Memory Regions
Tightly Coupled Memories
TCM Features
TCM Usage Constraints
Shared Memory Regions
When to Use Shared Memory
Ordering Constraints
Performance Impact
Atomic Accesses
Atomic Instructions
Hazards
Memory Access Guidelines
Conclusion

Instruction and Data Caches

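The Cortex-M3 core does not contain instruction or data caches of its own. Where a Cortex-M3 device has a cache, it is added by the silicon vendor at system level, so associativity, replacement policy, and size (for example, 4-way set associative with least recently used (LRU) replacement and up to 16 KB) are device specific. Typical configurations are:

Separate instruction and data caches
Unified cache for both instructions and data
Cache disabled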

When enabled, the caches significantly reduce the number of external memory accesses and improve performance. However, they also introduce the possibility of stale data if memory contents change.

Cache Coherency Issues

If multiple bus masters are present, care must be taken to maintain cache coherency. For example, if the Cortex-M3 cache contains a copy of a memory location that is modified by a DMA transfer, the cache will now contain a stale value. Several techniques can help maintain coherency:

Invalidate the entire cache when DMA transfers complete
Invalidate cache lines corresponding to DMA target addresses
Designate DMA regions as “non-cacheable”
Use hardware coherency support if present
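
As a rough illustration of the per-line invalidation technique above, the sketch below computes which cache lines overlap a DMA buffer. It assumes a 32-byte line size, and the names CACHE_LINE_SIZE, line_span_t and dma_buffer_line_span are hypothetical. The invalidate operation itself is device specific and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines. Check the device reference manual. */
#define CACHE_LINE_SIZE 32u

/* Hypothetical helper type: the set of cache lines that overlap a buffer. */
typedef struct {
    uintptr_t first_line;  /* address of the first overlapping line */
    size_t    line_count;  /* number of lines to invalidate */
} line_span_t;

/* Compute the cache lines that overlap the byte range [addr, addr + len). */
static line_span_t dma_buffer_line_span(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;
    line_span_t span;

    /* Precondition: the rounded-up end address must not wrap around. */
    assert(addr <= UINTPTR_MAX - mask && len <= UINTPTR_MAX - mask - addr);

    span.first_line = addr & ~mask;                      /* round start down */
    if (len == 0u) {
        span.line_count = 0u;
        return span;
    }
    uintptr_t end = (addr + len + mask) & ~mask;         /* round end up */
    span.line_count = (size_t)((end - span.first_line) / CACHE_LINE_SIZE);
    return span;
}
```

A driver would loop over line_count lines starting at first_line and issue the device's invalidate operation for each one. Because whole lines are invalidated, DMA buffers are best aligned and padded to the line size so that unrelated data does not share a line with them.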

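In addition to DMA transfers, self-modifying code can also lead to instruction cache coherency issues. After code has been modified, any cached copy of the old instructions must be invalidated, and the processor should execute a DSB followed by an ISB so that the write completes and the pipeline refetches the updated instructions.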

Implications of Cached Memory Regions

When accessing cached memory regions, keep the following in mind:

The caches have no visibility into ongoing external memory accesses
A memory update may not be immediately visible to the core if a cache hit occurs
Code sequences should not rely on precise ordering of memory accesses
Explicit cache maintenance operations should be used as needed for coherency

Tightly Coupled Memories

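In addition to caches, some Cortex-M designs provide tightly coupled memories (TCMs). The Cortex-M3 core has no dedicated TCM interface (that is a feature of cores such as the Cortex-M7), but many Cortex-M3 devices provide zero-wait-state on-chip SRAM that serves the same purpose. These are small, low-latency RAM regions that can store code and data. The processor can typically access them in a single cycle, which gives deterministic timing without any cache maintenance.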

TCM Features

Single-cycle access eliminates wait states
Ideal for performance-critical code and data
Simplifies timing analysis by avoiding variable delays
TCM contents are not cached, so the core never reads a stale copy of them

TCM Usage Constraints

However, TCMs have limited capacity, and programmers should be aware of some constraints:

Instruction fetches from TCM bypass any prefetch buffering used for slower memories
Data TCMs may not be reachable by DMA controllers, depending on the device's bus layout
Contents are typically placed at build time, not allocated dynamically

Shared Memory Regions

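The Cortex-M3 memory model also permits designating certain memory regions as “shared” (shareable), either through the default memory map or through the region attributes of the optional MPU. Combined with the Device or Strongly-ordered memory types, this imposes ordering requirements on accesses, which avoids hazards when a peripheral or another bus master observes the same memory.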

When to Use Shared Memory

Shared memory regions are useful when hardware needs to coordinate with program memory accesses. Typical use cases include:

Memory-mapped I/O registers
DMA buffer pools
Command queues
Hardware semaphores

Ordering Constraints

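The following rules apply to memory marked as shared and ordered:

Accesses to the region are not reordered relative to each other
Accesses to the region are observed in program order
Buffered writes are drained so that each access reaches the bus

This prevents situations where a pending cache line write back changes memory after a subsequent instruction has accessed it. These guarantees do not automatically extend to accesses to normal memory, so where ordering against normal memory also matters, an explicit barrier instruction (DMB or DSB) is still required.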
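
The ordering idea can be sketched in portable C11 as a one-slot mailbox: the payload is written first, then a fence, then the flag that publishes it. This is a minimal sketch, not a device driver. The names mailbox_t, mailbox_publish and mailbox_try_consume are hypothetical, and it assumes the two sides are software contexts (for example a thread and an interrupt handler). On ARMv7-M targets, toolchains typically lower these fences to DMB instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox shared between two execution contexts. */
typedef struct {
    uint32_t    payload;  /* plain data, written before the flag */
    atomic_bool ready;    /* publication flag */
} mailbox_t;

/* Producer: write the data, then publish it. The release fence keeps the
 * payload write from being reordered after the flag write. */
static void mailbox_publish(mailbox_t *mb, uint32_t value)
{
    assert(mb != NULL);
    mb->payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&mb->ready, true, memory_order_relaxed);
}

/* Consumer: check the flag, then read the data. The acquire fence keeps the
 * payload read from being reordered before the flag read. */
static bool mailbox_try_consume(mailbox_t *mb, uint32_t *out)
{
    assert(mb != NULL && out != NULL);
    if (!atomic_load_explicit(&mb->ready, memory_order_relaxed)) {
        return false;  /* nothing published yet */
    }
    atomic_thread_fence(memory_order_acquire);
    *out = mb->payload;
    atomic_store_explicit(&mb->ready, false, memory_order_relaxed);
    return true;
}
```

When the other side is a peripheral or DMA engine rather than software, the same write-data-then-flag order applies, but the buffer attributes and any cache maintenance must also be handled as described above.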

Performance Impact

Due to the strict ordering requirements, shared memory accesses are typically slower than unrestricted memory. Performance critical data structures and code should remain in normal memory.

Atomic Accesses

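The Cortex-M3 supports atomic read-modify-write sequences through exclusive-access instructions, which are used to build synchronization primitives such as semaphores and mutexes. When the memory is shared with other bus masters, the region should be marked shareable so that a system-level exclusive monitor, where implemented, can track the access.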

Atomic Instructions

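LDREX – Load Register Exclusive
STREX – Store Register Exclusive
CLREX – Clear Exclusive (clears the exclusive monitor)

An LDREX/STREX pair performs an atomic read-modify-write sequence. STREX writes a status value to a register: 0 if the store succeeded and 1 if it did not, in which case software normally repeats the whole sequence. CLREX simply clears the exclusive monitor.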
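
The retry pattern can be sketched with C11 atomics instead of hand-written assembly. This is a minimal sketch of a counting semaphore, with hypothetical names sem_try_take and sem_give. On ARMv7-M targets, compilers typically implement the compare-exchange below with an LDREX/STREX loop, though that is a property of the toolchain rather than of this code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to take one unit from a counting semaphore without blocking.
 * Returns true on success, false if the count was already zero. */
static bool sem_try_take(atomic_uint *count)
{
    assert(count != NULL);
    unsigned int current = atomic_load_explicit(count, memory_order_relaxed);
    while (current != 0u) {
        /* Succeeds only if nobody changed the count since it was read.
         * On failure (including spurious failure of the weak form),
         * 'current' is refreshed and the loop re-checks it. */
        if (atomic_compare_exchange_weak_explicit(count, &current,
                                                  current - 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

/* Return one unit to the semaphore. */
static void sem_give(atomic_uint *count)
{
    assert(count != NULL);
    atomic_fetch_add_explicit(count, 1u, memory_order_release);
}
```

The loop is what makes an interrupted sequence harmless: if the exclusive store fails, the value is re-read and the update is attempted again.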

Hazards

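Care must be taken with atomic instruction sequences, since they introduce hazards of their own:

Any exception taken between LDREX and STREX clears the exclusive monitor, so the STREX fails and the sequence must be retried
LDREX tags the accessed address in the exclusive monitor, and other work between LDREX and STREX should be kept to a minimum
Only one LDREX/STREX pair can be pending at once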

Memory Access Guidelines

Given the complex interactions between caching, TCM regions, shared memory, and atomic accesses, follow these guidelines for robust operation:

Ensure cache coherency with DMA transfers and self-modifying code
Use TCM for critical code and data, but beware of its size constraints
Designate shared regions appropriately to avoid reordering
Use atomic operations carefully and always retry failed sequences
Analyze all memory access paths to identify potential issues

Conclusion

Cortex-M3 based systems provide capable memory access mechanisms, but these impose constraints that software must consider carefully. Keeping caches coherent, avoiding shared memory hazards, and using atomic operations correctly are key. With good architecture and access hygiene, the memory system can be used safely to build high-performance real-time embedded applications.
