The cache in ARM Cortex processors is a small, fast memory that stores copies of frequently used data and instructions from the main memory. The cache improves performance by reducing the number of accesses to the slower main memory. When the processor needs data or instructions, it first checks the cache, and if the required information is not found in the cache (a cache miss), it is fetched from the main memory.
Types of Cache
There are two main types of cache in ARM Cortex processors:
- Instruction Cache – Stores instructions that are frequently executed by the processor.
- Data Cache – Stores data that is frequently accessed by the processor.
The instruction and data caches operate independently: instruction fetches look up the instruction cache, while data loads and stores look up the data cache, so the two can be accessed in parallel (a modified Harvard arrangement).
Cache Organization
The cache is organized into lines, which are fixed-size blocks of memory. Each cache line stores a copy of a block of contiguous bytes from the main memory. The size of a cache line varies between different ARM Cortex processors, but is typically 32 or 64 bytes.
Alongside its data, each cache line stores a tag holding the upper bits of the main memory address the block was copied from, together with state bits such as a valid flag. When the processor requests an instruction or data, the cache controller compares the tags of the candidate lines against the requested address to determine whether it is a hit.
Mapping Cache Lines to Main Memory
Since the cache capacity is limited, a mechanism is needed to map blocks from main memory to the cache lines. ARM Cortex processors utilize set-associative mapping:
- The cache is divided into sets, each containing a fixed number of lines (e.g. 8 lines per set).
- The main memory address is split into tag, set index, and block offset bits.
- Blocks can be mapped to any line in the set specified by set index bits.
This provides a compromise between direct mapping and fully associative mapping: it reduces conflict misses compared with a direct-mapped cache, while keeping the tag-comparison hardware far simpler than a fully associative design.
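As a concrete illustration, the address split can be sketched in C for a hypothetical 32KB, 4-way set-associative cache with 64-byte lines (these sizes are assumptions chosen for the example, giving 128 sets, a 6-bit offset, and a 7-bit set index):

```c
#include <stdint.h>

/* Hypothetical geometry: 32KB, 4-way, 64-byte lines -> 32768/(4*64) = 128 sets. */
#define LINE_SIZE 64u
#define NUM_SETS  128u

typedef struct {
    uint32_t tag;        /* upper address bits, stored with the line */
    uint32_t set_index;  /* selects which set to search              */
    uint32_t offset;     /* byte position within the 64-byte line    */
} cache_addr_t;

/* Split a 32-bit address into tag | set index | offset fields. */
cache_addr_t split_address(uint32_t addr) {
    cache_addr_t a;
    a.offset    = addr % LINE_SIZE;               /* bits [5:0]   */
    a.set_index = (addr / LINE_SIZE) % NUM_SETS;  /* bits [12:6]  */
    a.tag       = addr / (LINE_SIZE * NUM_SETS);  /* bits [31:13] */
    return a;
}
```

For example, `split_address(0x80001234)` yields offset `0x34`, set index `0x48`, and tag `0x40000`; all lines whose tag matches in set `0x48` are then compared on a lookup.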
Cache Management Policies
Several policies determine how the cache handles writes, replacements, and misses:
1. Write Policy
- Write-through – Data is written to both cache and main memory.
- Write-back – Data is only written to the cache. Modified cache lines are written back to main memory when replaced.
2. Line Replacement
When a new line must be brought into the cache and no line in the target set is free (invalid), an existing line must be evicted. Replacement policies include:
- Least Recently Used (LRU) – The line that was least recently accessed is replaced.
- First In First Out (FIFO) – The oldest line in the set is replaced.
- Pseudo-random – A line in the set is chosen at random; many ARM L1 caches use this policy because it is cheap to implement.
3. Write Allocation
- Write-allocate – When a write miss occurs, a new line is allocated in the cache.
- No write-allocate – The write goes directly to main memory without allocating a cache line.
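The interaction of these policies can be sketched with a toy model of a single cache set, assuming 4 ways, LRU replacement, write-back, and write-allocate (the structure and counters here are illustrative, not any particular core's implementation):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one cache set: 4 ways, LRU replacement, write-back,
   write-allocate. All names and sizes are illustrative. */
#define WAYS 4

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t last_used;  /* LRU timestamp */
} line_t;

static line_t   set_lines[WAYS];
static uint32_t now;         /* global access counter driving LRU */
static int      writebacks;  /* dirty lines flushed on eviction   */

/* Access a line by tag; is_write marks a store. Returns true on a hit. */
bool access_line(uint32_t tag, bool is_write) {
    now++;
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (set_lines[w].valid && set_lines[w].tag == tag) {  /* hit */
            set_lines[w].last_used = now;
            set_lines[w].dirty |= is_write;
            return true;
        }
        /* prefer an invalid way; otherwise track the least recently used */
        if (!set_lines[w].valid ||
            (set_lines[victim].valid &&
             set_lines[w].last_used < set_lines[victim].last_used))
            victim = w;
    }
    if (set_lines[victim].valid && set_lines[victim].dirty)
        writebacks++;  /* write-back: flush dirty victim to main memory */
    /* write-allocate: the line is brought in even on a write miss */
    set_lines[victim] = (line_t){ true, is_write, tag, now };
    return false;
}
```

With a no-write-allocate policy, the final allocation step would instead be skipped for write misses, sending the store straight to main memory.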
Cache Coherency
In multi-core ARM processors, each core has its own level 1 (L1) instruction and data caches, and the cores typically share a last-level cache (LLC). Cache coherency protocols keep copies of shared data consistent across these caches. Many ARM Cortex designs use a MOESI-style protocol, in which each cache line is in one of five states:
- Modified – The line is dirty (it differs from main memory) and this cache holds the only copy.
- Owned – The line is dirty but may also be present in other caches; this cache is responsible for supplying the data and eventually writing it back.
- Exclusive – The line is clean and this cache holds the only copy, so it can be written without notifying other caches.
- Shared – The line may also be present in other caches and must not be written until ownership is obtained.
- Invalid – The line does not contain valid data.
The cores and the cache-coherent interconnect use these states to coordinate data transfers while maintaining a coherent view of memory.
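A few representative state transitions can be sketched as a C function. This is illustrative only: real ARM coherency hardware (the SCU, or CCI/CMN interconnects speaking AMBA ACE or CHI) handles many more events and transient states than shown here.

```c
/* Minimal sketch of MOESI transitions for three event types. */
typedef enum { MODIFIED, OWNED, EXCLUSIVE, SHARED, INVALID } moesi_t;
typedef enum { LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } bus_event_t;

moesi_t next_state(moesi_t cur, bus_event_t ev) {
    switch (cur) {
    case MODIFIED:
        if (ev == REMOTE_READ)  return OWNED;    /* supply dirty data, keep ownership */
        if (ev == REMOTE_WRITE) return INVALID;  /* another core takes the line       */
        return MODIFIED;
    case OWNED:
        if (ev == LOCAL_WRITE)  return MODIFIED; /* after invalidating other sharers  */
        if (ev == REMOTE_WRITE) return INVALID;
        return OWNED;
    case EXCLUSIVE:
        if (ev == LOCAL_WRITE)  return MODIFIED; /* silent upgrade, no bus traffic    */
        if (ev == REMOTE_READ)  return SHARED;
        if (ev == REMOTE_WRITE) return INVALID;
        return EXCLUSIVE;
    case SHARED:
        if (ev == LOCAL_WRITE)  return MODIFIED; /* broadcast an invalidate first     */
        if (ev == REMOTE_WRITE) return INVALID;
        return SHARED;
    default: /* INVALID: a local write allocates the line in Modified state */
        if (ev == LOCAL_WRITE)  return MODIFIED;
        return INVALID;
    }
}
```

Note how a write hit in the Exclusive state needs no interconnect traffic at all, which is the main motivation for distinguishing Exclusive from Shared.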
Cache Performance
The cache improves processor performance in several ways:
- Latency is reduced since cache access is much faster than main memory access.
- The cache bandwidth is higher than main memory bandwidth.
- Spatial and temporal locality in instruction and data accesses are exploited.
However, poor cache utilization carries a performance penalty: cache misses stall the pipeline while data is fetched from main memory. Techniques such as prefetching and aligning data structures to the cache line size help improve cache utilization.
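One common alignment technique is padding per-thread data out to the cache line size to avoid false sharing. A minimal C11 sketch, assuming a 64-byte line (on ARM the actual size can be read from the CTR register, or on Linux/glibc via `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`):

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed line size; 64 bytes is typical for ARM Cortex-A cores. */
#define CACHE_LINE 64

/* Two counters updated by different threads. Padding each to its own line
   prevents false sharing, where unrelated writes bounce a single line
   between the cores' caches. */
struct counters {
    alignas(CACHE_LINE) unsigned long a;
    alignas(CACHE_LINE) unsigned long b;
};

_Static_assert(sizeof(struct counters) == 2 * CACHE_LINE,
               "each counter occupies its own cache line");
```

For software prefetching, GCC and Clang provide `__builtin_prefetch(addr)`, which maps to the ARM PLD / AArch64 PRFM hint instructions where the target supports them.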
Cache in ARM Cortex-A Series
Here are some typical cache configurations in the ARM Cortex-A application processor series (the exact sizes are configurable by the chip implementer):
- Cortex-A5 – 8KB L1 instruction cache, 8KB L1 data cache.
- Cortex-A9 – 32KB L1 instruction cache, 32KB L1 data cache, 512KB shared L2 cache.
- Cortex-A15 – 32KB L1 instruction cache, 32KB L1 data cache, 2MB shared L2 cache.
- Cortex-A72 – 48KB L1 instruction cache, 32KB L1 data cache, 2MB shared L2 cache.
The L1 caches are private to each core, while the larger L2 cache is typically shared between the cores on the same chip. Larger L2 caches reduce the miss rate and improve performance at the cost of increased access latency.
Cache in ARM Cortex-M Series
Here are some examples of cache configurations in the ARM Cortex-M microcontroller series:
- Cortex-M7 – Optional instruction and data caches (size configurable; 16KB each is common), plus tightly coupled instruction and data memories (ITCM/DTCM).
- Cortex-M33 – No internal cache; some vendor implementations add a system-level cache.
- Cortex-M23 – No cache; low-latency on-chip SRAM serves memory accesses.
- Cortex-M3 – No cache; vendors typically provide fast on-chip SRAM instead.
The Cortex-M series prioritizes real-time determinism over peak throughput, so caches are small or absent: cache hits and misses make access times harder to predict. Tightly coupled memory instead provides low-latency access with fixed timing.
Conclusion
In summary, the cache in ARM Cortex processors is fast on-chip memory that stores frequently used instructions and data to improve performance. It exploits locality and reduces accesses to main memory. The cache organization, mapping policies, and coherency protocols allow efficient use of the limited cache capacity, and appropriate configuration of cache size and architecture is important for optimizing the performance of ARM Cortex processors.