The data cache found in higher-end Arm Cortex-M microcontrollers (such as those built on the Cortex-M7) is a small, fast memory that stores copies of data from main memory. Its purpose is to reduce the number of accesses to the slower main memory and thereby speed up data retrieval.
Overview of Caches
In computer systems, caches are small memories used to store copies of frequently used data. They serve as temporary staging areas for data that the processor is likely to need next. Reading data from a cache is much faster than reading from main memory.
Caches exploit the locality of reference principle – the tendency for programs to reuse data and instructions they have used recently. By keeping copies of recently accessed data in the fast cache, the processor avoids having to read slower main memory every time that data is needed.
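As a small illustration, the loop below shows both kinds of locality at work: the accumulator is reused on every iteration (temporal locality), and the array is walked sequentially (spatial locality), so after the first miss in each cache line the next several elements are hits. A minimal sketch in C:

```c
#include <stddef.h>
#include <stdint.h>

/* With 32-byte cache lines and 4-byte elements, each miss pulls in
 * 8 elements, so roughly 7 of every 8 sequential accesses hit. */
uint32_t sum_array(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;               /* reused every iteration: temporal locality */
    for (size_t i = 0; i < n; i++) {
        sum += data[i];             /* sequential walk: spatial locality */
    }
    return sum;
}
```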
Benefits of Caching
The key benefits of using caches are:
- Reduced latency – Cache hits are faster than memory reads
- Increased throughput – By reducing stalls due to main memory reads, the processor can do more work
- Lower power consumption – A cache hit costs less energy than an access to external memory
- Transparency – Caches hide memory latency from software, with no code changes required
Cache Organization
A cache consists of a cache controller, cache memory, and a cache directory (the tag array). The cache controller manages the flow of data between main memory and the cache memory; the cache memory stores the actual copies of data; and the cache directory records which memory addresses are currently held in which cache locations.
Caches are organized into cache lines (or blocks). Each cache line corresponds to a contiguous block of memory that is copied as a unit to the cache. Typical cache line sizes range from 16 to 128 bytes. Data is moved between memory and cache in units of cache lines.
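To make this concrete, here is a sketch of how an address is decomposed for a hypothetical 16 KB, 4-way set associative cache with 32-byte lines (the geometry is an assumption for illustration): 16 KB / 32 bytes = 512 lines, and 512 lines / 4 ways = 128 sets, giving 5 offset bits and 7 index bits, with the remaining address bits forming the tag.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical geometry: 16 KB, 4-way, 32-byte lines.
 * 16384 / 32 = 512 lines; 512 / 4 ways = 128 sets. */
#define LINE_SIZE 32u   /* bytes per line -> 5 offset bits */
#define NUM_SETS  128u  /* -> 7 index bits                 */

int main(void)
{
    uint32_t addr   = 0x20001234u;                   /* example address      */
    uint32_t offset = addr % LINE_SIZE;              /* byte within the line */
    uint32_t index  = (addr / LINE_SIZE) % NUM_SETS; /* which set            */
    uint32_t tag    = addr / (LINE_SIZE * NUM_SETS); /* identifies the line  */

    printf("offset=%u index=%u tag=0x%x\n",
           (unsigned)offset, (unsigned)index, (unsigned)tag);
    return 0;
}
```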
Cache Operation
When the processor needs to read data, it first checks if the data is present in the cache. If so, a cache hit occurs and the data is returned quickly. If not, a cache miss occurs and the data must be read from the slower main memory.
On a cache miss, the cache line containing the requested data is copied from memory into the cache, evicting the line that previously occupied that location. Because a whole line is fetched, data adjacent to the requested word arrives with it, which exploits spatial locality; some processors additionally prefetch subsequent lines to improve performance further.
Write Policies
With write operations, caches implement either a write-through or write-back policy. In a write-through cache, data is written to both the cache and main memory. In a write-back cache, data is only written to the cache initially. Writes are forwarded to main memory later when the cache line is evicted.
Data Cache in Cortex-M
On the Cortex-M7, the data cache is 4-way set associative, with a size fixed by the silicon vendor at anywhere from 4 Kbytes to 64 Kbytes. The cache line size is 8 words (32 bytes). The write policy, write-through or write-back, is not fixed by the cache itself but is selected per memory region through that region's memory attributes.
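Because the policy follows the memory attributes, it is typically set up with the MPU. The sketch below uses the CMSIS-Core v7-M MPU helpers to mark one region write-back and another write-through; the device header name, base addresses, and sizes are assumptions for illustration, not values from any particular part.

```c
#include "ARMCM7_DP.h"  /* CMSIS device header; substitute your device's header */

void configure_cache_policies(void)
{
    /* Region 0: 64 KB SRAM at 0x20000000, write-back, write-allocate
     * (TEX=001, C=1, B=1). Placeholder address and size. */
    ARM_MPU_SetRegion(
        ARM_MPU_RBAR(0u, 0x20000000u),
        ARM_MPU_RASR(0u, ARM_MPU_AP_FULL, 1u, 0u, 1u, 1u, 0u,
                     ARM_MPU_REGION_SIZE_64KB));

    /* Region 1: 32 KB buffer at 0x20010000, write-through
     * (TEX=000, C=1, B=0), e.g. for data also read by DMA. */
    ARM_MPU_SetRegion(
        ARM_MPU_RBAR(1u, 0x20010000u),
        ARM_MPU_RASR(0u, ARM_MPU_AP_FULL, 0u, 0u, 1u, 0u, 0u,
                     ARM_MPU_REGION_SIZE_32KB));

    /* Enable the MPU, keeping the default memory map elsewhere. */
    ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk);
}
```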
The Cortex-M data cache sits between the CPU and the bus matrix, where it reduces bus traffic and memory latency. Line-fill buffers assemble incoming cache lines, and write buffers hold pending writes until the bus is available.
Cache Features
Key features of the Cortex-M data cache include:
- 4-way set associative organization
- Write-through or write-back policy, selected per memory region
- Allocate on read misses
- Optional allocate on write misses in write-back regions
- 32-byte (8-word) cache lines
- LRU replacement policy
- Optional ECC protection
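The geometry listed above is not fixed across devices, but it can be read back at runtime from the cache ID registers defined by Armv7-M. A minimal sketch, assuming a CMSIS-Core environment:

```c
#include "ARMCM7_DP.h"  /* CMSIS device header; substitute your device's header */

void read_dcache_geometry(uint32_t *line_bytes, uint32_t *ways, uint32_t *sets)
{
    SCB->CSSELR = 0u;   /* select the Level 1 data cache */
    __DSB();            /* make sure the selection has taken effect */

    uint32_t ccsidr = SCB->CCSIDR;

    /* Field encodings per Armv7-M: line size is stored as
     * log2(words per line) - 2; ways and sets are stored minus one. */
    *line_bytes = 4u << ((ccsidr & 0x7u) + 2u);
    *ways       = ((ccsidr >> 3) & 0x3FFu) + 1u;
    *sets       = ((ccsidr >> 13) & 0x7FFFu) + 1u;
    /* Total size = line_bytes * ways * sets, e.g. 32 * 4 * 128 = 16 KB. */
}
```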
Cache Maintenance
The Cortex-M cache controller provides cache maintenance operations to manage cache coherence and consistency. These operations include:
- Invalidate – Mark cache line as invalid
- Clean – Write dirty data to memory
- Clean and invalidate – Clean then invalidate cache line
- Flush – Clean and invalidate entire cache
In the Armv7-M architecture, these operations are performed by writing to memory-mapped cache maintenance registers in the System Control Block rather than by dedicated instructions. A maintenance operation can target a single line (by address or by set/way) or the entire cache, and CMSIS-Core wraps the registers in helper functions such as SCB_CleanDCache() and SCB_InvalidateDCache_by_Addr().
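The classic place these functions matter is DMA, where the DMA engine reads and writes memory directly, bypassing the cache. A hedged sketch of that pattern follows; the device header and the start_dma_tx/start_dma_rx/wait_dma_done calls are hypothetical placeholders for a real driver.

```c
#include "ARMCM7_DP.h"  /* CMSIS device header; substitute your device's header */
#include <stdint.h>

/* 32-byte alignment matches the cache line size, so maintenance by
 * address does not disturb unrelated neighboring data. */
static uint8_t dma_buf[512] __attribute__((aligned(32)));

extern void start_dma_tx(const uint8_t *buf, uint32_t len);  /* hypothetical */
extern void start_dma_rx(uint8_t *buf, uint32_t len);        /* hypothetical */
extern void wait_dma_done(void);                             /* hypothetical */

void dma_transmit_example(void)
{
    for (uint32_t i = 0; i < sizeof(dma_buf); i++) {
        dma_buf[i] = (uint8_t)i;   /* writes may sit in the data cache */
    }
    /* Clean: push cached writes out to memory so the DMA engine
     * reads the real data rather than stale RAM contents. */
    SCB_CleanDCache_by_Addr((uint32_t *)dma_buf, (int32_t)sizeof(dma_buf));
    start_dma_tx(dma_buf, sizeof(dma_buf));
}

void dma_receive_example(void)
{
    start_dma_rx(dma_buf, sizeof(dma_buf));
    wait_dma_done();
    /* Invalidate: discard stale cached copies so the CPU's next
     * reads fetch what the DMA engine just wrote. */
    SCB_InvalidateDCache_by_Addr((uint32_t *)dma_buf, (int32_t)sizeof(dma_buf));
}
```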
Cache Coherence
In multicore Cortex-M systems, each core has its own data cache, and the hardware does not keep those caches coherent with each other. Coherence for shared data must instead be maintained by software, using the cache maintenance operations described above.
In practice, this means cleaning or invalidating data caches at synchronization points. Multicore semaphores, locks, and shared data structures are designed to force cache maintenance operations when entering and exiting critical sections, which prevents cores from operating on stale cached data.
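A hedged sketch of that discipline for a one-word message; the lock functions and the shared-memory placement are hypothetical placeholders:

```c
#include "ARMCM7_DP.h"  /* CMSIS device header; substitute your device's header */
#include <stdint.h>

/* Assumed to be placed in memory visible to both cores,
 * aligned to the 32-byte cache line size. */
static volatile uint32_t shared_msg[8] __attribute__((aligned(32)));

extern void lock_acquire(void);  /* hypothetical inter-core lock */
extern void lock_release(void);

/* Producer core: write, then clean so the data reaches shared memory. */
void send_message(uint32_t value)
{
    lock_acquire();
    shared_msg[0] = value;
    SCB_CleanDCache_by_Addr((uint32_t *)shared_msg, (int32_t)sizeof(shared_msg));
    lock_release();
}

/* Consumer core: invalidate first so the read misses in its own
 * cache and fetches the producer's data instead of a stale copy. */
uint32_t receive_message(void)
{
    lock_acquire();
    SCB_InvalidateDCache_by_Addr((uint32_t *)shared_msg,
                                 (int32_t)sizeof(shared_msg));
    uint32_t value = shared_msg[0];
    lock_release();
    return value;
}
```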
Cache Performance
The performance benefit of caching depends on the cache hit rate. This is the fraction of memory accesses that are satisfied by the cache without accessing main memory. A higher hit rate results in lower average memory access time.
The hit rate depends on the cache size, access locality of the application, and other policies like replacement and write strategy. By optimizing cache usage, a system can significantly improve performance.
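For example, with an assumed 1-cycle hit time, 95% hit rate, and 20-cycle miss penalty (illustrative numbers rather than figures for any specific device), the average memory access time is 1 + 0.05 × 20 = 2 cycles, compared with roughly 20 cycles if every access went to main memory.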
Guidelines for Optimizing Cache Performance
Here are some guidelines for optimizing cache performance in a Cortex-M system:
- Organize data structures to maximize spatial locality and sequential access (see the sketch after this list)
- Improve temporal locality by reusing data and instructions
- Select a device variant with a larger cache if the hit rate is consistently low
- Keep hot loops and their working sets small enough to fit in the cache
- Minimize cache misses by prefetching data
- Use cache coloring to avoid conflict misses
- Minimize shared mutable data between cores to reduce cache maintenance overhead
- Place frequently accessed stack and global variables in cacheable memory regions
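As a sketch of the first guideline above, the two functions below traverse the same matrix. The row-major loop touches memory sequentially and makes full use of every fetched cache line; the column-major loop jumps a whole row per access, so each access can land in a different line. The dimensions are arbitrary illustrative values.

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 256
#define COLS 256

static uint32_t matrix[ROWS][COLS];   /* stored row-major in C */

/* Cache-friendly: consecutive accesses fall in the same cache line. */
uint32_t sum_row_major(void)
{
    uint32_t sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += matrix[r][c];
    return sum;
}

/* Cache-hostile: each access is COLS * 4 = 1024 bytes from the last,
 * so with 32-byte lines nearly every access misses. */
uint32_t sum_col_major(void)
{
    uint32_t sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += matrix[r][c];
    return sum;
}
```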
Profiling cache behavior and optimizing based on real usage is key. Tools like ARM Streamline can be used to analyze cache performance.
Configuring Cache in Cortex-M
How the Cortex-M data cache behaves is determined partly when the chip is designed and partly at runtime. Key configuration points include:
- Enabling and disabling the cache at runtime
- Cache size, fixed by the silicon vendor at implementation time (4 KB to 64 KB on the Cortex-M7)
- Associativity and way size, likewise fixed at implementation time
- Cacheability and write policy per memory region, set through MPU attributes
- Shareability attributes for multiprocessor systems
- Optional ECC, where the implementation includes it
At runtime, the data cache is enabled or disabled through the DC bit of the Configuration and Control Register (SCB->CCR); CMSIS-Core provides SCB_EnableDCache() and SCB_DisableDCache() for this. The cache is disabled on reset.
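With CMSIS-Core this looks like the sketch below (the device header name is an assumption); SCB_EnableDCache() invalidates the cache before setting the enable bit, so stale contents from before reset can never be returned.

```c
#include "ARMCM7_DP.h"  /* CMSIS device header; substitute your device's header */

void enable_caches(void)
{
    SCB_EnableICache();  /* optional: the separate instruction cache */
    SCB_EnableDCache();  /* invalidates all lines, then sets CCR.DC  */
}

void disable_dcache(void)
{
    /* Cleans and invalidates before clearing CCR.DC,
     * so no dirty data is lost when the cache goes off. */
    SCB_DisableDCache();
}
```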
Use Cases
Some typical use cases for leveraging the Cortex-M data cache are:
- Storing frequently used data structures
- Caching constant data and lookup tables held in slow flash memory (instruction fetches are served by the separate instruction cache)
- Buffering data transferred over low bandwidth buses
- Avoiding wait states when accessing high latency memories
- Prefetching data for fast signal processing algorithms
For memory intensive applications, the data cache can help avoid stalls and improve throughput. It works best when access patterns have good locality.
Limitations
While caches improve performance, they have some limitations:
- Added latency on cache misses
- Complexity of cache coherence in multicore systems
- Software overhead of cache maintenance, particularly around DMA transfers and shared buffers
- Power consumption of cache memories
- Reduced timing determinism, which complicates worst-case execution time analysis
The benefits of caching may be less noticeable for small, deterministic real-time systems. Cache usage should be tailored to the application requirements.
Conclusion
The Cortex-M data cache reduces memory latency by storing local copies of frequently used data. It improves performance by exploiting locality of memory accesses in embedded applications. Cache optimization can provide significant speedups for memory-bound use cases.
Understanding cache organization, operation, and configuration is key to utilizing it effectively. Paying attention to cache usage and tuning cache policies accordingly helps unlock the benefits of caching in embedded Arm processors.