What is Instruction Cache in Arm Cortex-M series?

The instruction cache in ARM Cortex-M series microcontrollers is a small, fast memory that stores recently accessed instructions to improve performance. It sits between the CPU and the main memory, caching instructions so the CPU does not have to access slower external memory as frequently. This speeds up instruction fetches and overall execution.

Contents

What is an Instruction Cache?
Why Have an Instruction Cache?
Instruction Cache in ARM Cortex-M
Cortex-M7 Instruction Cache
Cortex-M4 Instruction Cache
How the Instruction Cache Works
Cache Coherency
Cache Maintenance Operations
Performance Impact
When the Cache Fails
Instruction Cache Design Considerations
Summary

What is an Instruction Cache?

An instruction cache is a hardware cache that speeds up program execution by reducing the time spent waiting for instructions to be fetched from memory. It is a small memory that holds copies of recently used instructions from main memory. When the processor fetches an instruction, it first checks the instruction cache. If the instruction is present, it is read from the fast cache memory instead of the slower main memory.

Caches exploit locality of reference in typical programs: instructions that are executed close together in time tend to be located close together in memory (spatial locality), and recently executed instructions, such as loop bodies, tend to be executed again soon (temporal locality). The cache essentially provides a buffer storing recent instruction history so that if an instruction is reused, it does not have to be re-fetched from main memory.

Why Have an Instruction Cache?

There is a significant speed difference between a processor’s clock speed and main memory access times. This speed gap means the CPU is often idle, waiting for instructions to be fetched from memory. The instruction cache reduces this wait time by providing faster access to cached instructions.

Main benefits of an instruction cache:

Improves performance by reducing the average time spent on instruction fetches
Increases the rate at which a processor can execute instructions and programs
Helps avoid stalling the CPU on instruction fetch cycles
Makes better use of the faster CPU clock cycles
Buffers recent instruction history to capitalize on locality of reference

Overall, the instruction cache improves performance by taking advantage of the program locality and bridging the gap between the CPU and main memory speeds.

Instruction Cache in ARM Cortex-M

The ARM Cortex-M series is a family of 32-bit RISC processor cores designed for embedded and IoT applications. Whether an instruction cache is available depends on the core. For example:

Cortex-M7 can include a 2-way set-associative instruction cache, as an option chosen by the chip vendor
Cortex-M4 does not have an instruction cache in the core, although some vendors add their own cache in front of flash
Cortex-M3 does not have an instruction cache

The presence and size of the instruction cache differ between cores and devices. But in all cases it serves the same purpose: reducing instruction fetch times to improve performance.

Cortex-M7 Instruction Cache

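The Cortex-M7 can include an instruction cache; whether it is present, and how large it is, is decided by the chip vendor when the core is implemented. Its main parameters are:

Total cache size – 4KB, 8KB, 16KB, 32KB, or 64KB
Associativity – 2-way set-associative
Line length – fixed at 8 words (32 bytes)
Number of sets – 64 to 1024, following from the cache size
Miss latency – depends on the memory behind the cache, such as flash wait states or external memory timing

The CPU checks the instruction cache first when fetching instructions. A hit is served from the on-chip cache RAM without waiting for the memory system. A miss requires a line fill from slower flash or external memory.

The 2-way set associativity improves the hit rate compared with a direct-mapped cache because two different lines that map to the same set can reside in the cache at once. When both ways of a set are occupied and a new line is fetched, the hardware evicts one of them. Least Recently Used (LRU) is the textbook policy for choosing the victim; the policy a given core implements is documented in its technical reference manual.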
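
The size, associativity, line length, and number of sets are tied together by simple arithmetic: sets = cache size / (ways × line size). The following is a minimal sketch of that relationship in C, assuming the geometry Arm documents for the Cortex-M7 instruction cache (2 ways, 32-byte lines). The function name icache_num_sets is illustrative and not part of any vendor API.

```c
#include <stdint.h>

/* Illustrative helper (not a vendor API): number of sets in a
 * set-associative cache. Returns 0 if the geometry is invalid,
 * i.e. the size is not a whole multiple of (ways * line size). */
uint32_t icache_num_sets(uint32_t size_bytes, uint32_t ways, uint32_t line_bytes)
{
    if (ways == 0u || line_bytes == 0u) {
        return 0u;
    }
    if (size_bytes % (ways * line_bytes) != 0u) {
        return 0u;
    }
    return size_bytes / (ways * line_bytes);
}
```

Under these assumptions, a 4KB cache has 64 sets and a 64KB cache has 1024 sets.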

Cortex-M4 Instruction Cache

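The Cortex-M4 core itself does not include an instruction cache. Microcontrollers built around it usually run from on-chip flash, and some vendors add their own cache or prefetch accelerator between the core and the flash to hide flash wait states; ST's ART Accelerator on STM32F4 devices is one example. Because these caches are vendor-specific, the following vary by device and are described in the device reference manual rather than in Arm's core documentation:

Associativity and number of sets
Line length
Total cache size
Number of flash wait states the cache has to hide

Where such a cache is present, it is checked first for instruction fetches, and a hit returns the instruction without flash wait states. Any set associativity above one way improves the hit rate compared with a direct-mapped cache.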

How the Instruction Cache Works

The instruction cache contains cached instructions in organized lines and sets. It is managed by cache policies like:

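Cache line fetch – Fetches aligned blocks of instructions, one cache line at a time
Write policy – The instruction cache is read-only from the core's point of view, so writes go directly to memory and never update the cached copy
Allocation policy – Fetch and cache a new line on a miss
Replacement policy – Evict an existing line when the set is full, for example the least recently used one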

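When the CPU requests an instruction, part of the fetch address selects a set in the cache, and the remaining upper bits (the tag) are compared against the tags stored in each way of that set. Cortex-M cores have no MMU, so the fetch address is already a physical address. On a hit, the instruction is returned from the cache. Otherwise, a whole line containing the instruction is fetched from memory, stored in the cache, and returned to the CPU.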
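
To make the lookup concrete, here is a small sketch of how a fetch address can be split into the fields a set-associative cache uses. The geometry is an assumption chosen for illustration (32-byte lines and 256 sets, which corresponds to a 16KB 2-way cache), and the names cache_addr_t and split_address are hypothetical.

```c
#include <stdint.h>

#define LINE_BYTES 32u   /* assumed line size: 8 words */
#define NUM_SETS   256u  /* assumed: 16KB, 2-way, 32-byte lines */

typedef struct {
    uint32_t tag;     /* compared against the tags stored in the set */
    uint32_t index;   /* selects the set */
    uint32_t offset;  /* byte position inside the cache line */
} cache_addr_t;

/* Split a 32-bit fetch address into tag, set index, and line offset. */
cache_addr_t split_address(uint32_t addr)
{
    cache_addr_t a;
    a.offset = addr % LINE_BYTES;
    a.index  = (addr / LINE_BYTES) % NUM_SETS;
    a.tag    = addr / (LINE_BYTES * NUM_SETS);
    return a;
}
```

With this geometry, two addresses that are 8KB apart (32 bytes × 256 sets) select the same set and differ only in their tag, which is why they compete for the ways of that set.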

The cache holds no valid lines when it is first enabled (on Cortex-M7 it is disabled at reset, and software invalidates it before enabling it). Cache misses then cause line fills until the working set is resident, provided it fits inside the cache. After this warm-up period, the hit rate increases. An LRU replacement policy aims to keep the most recently used lines in the cache.
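
The warm-up effect can be reproduced with a toy software model. The sketch below is not how any particular core is implemented: it assumes a deliberately tiny 2-way cache with 4 sets, 32-byte lines, and true LRU replacement, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u
#define NUM_SETS   4u   /* deliberately tiny so the behaviour is easy to follow */
#define NUM_WAYS   2u

typedef struct {
    bool     valid[NUM_SETS][NUM_WAYS];
    uint32_t tag[NUM_SETS][NUM_WAYS];
    uint8_t  victim[NUM_SETS];  /* least recently used way of each set */
    uint32_t hits;
    uint32_t misses;
} icache_model_t;

/* Mark every line invalid and clear the counters. */
void icache_reset(icache_model_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Simulate one instruction fetch; returns true on a hit. */
bool icache_fetch(icache_model_t *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t set  = line % NUM_SETS;
    uint32_t tag  = line / NUM_SETS;

    for (uint32_t w = 0; w < NUM_WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->victim[set] = (uint8_t)(1u - w);  /* the other way is now LRU */
            c->hits++;
            return true;
        }
    }

    /* Miss: fill the least recently used way with the new line. */
    uint32_t v = c->victim[set];
    c->valid[set][v] = true;
    c->tag[set][v]   = tag;
    c->victim[set]   = (uint8_t)(1u - v);
    c->misses++;
    return false;
}

/* Fetch every 4-byte word in [base, base + bytes), like one pass of a loop body. */
void icache_run(icache_model_t *c, uint32_t base, uint32_t bytes)
{
    for (uint32_t off = 0; off < bytes; off += 4u) {
        (void)icache_fetch(c, base + off);
    }
}
```

Running a 128-byte loop body twice through this model gives 4 misses and 28 hits on the first pass (one miss per line) and 32 hits on the second pass, once the loop is resident.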

Cache Coherency

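The instruction cache is filled only by instruction fetches. On Cortex-M7, hardware does not update or invalidate it when memory containing code is written, so the cache can hold stale instructions after:

Self-modifying code or a bootloader copies or patches code in RAM
Flash is erased and reprogrammed at run time
A DMA controller loads code into memory

In these cases software restores coherency: it first makes sure the new code has reached memory (cleaning the data cache for that range if a write-back data cache is in use), then invalidates the instruction cache, then executes barrier instructions (DSB and ISB) so that later fetches see the new code. Until that is done, the cached contents may no longer match memory.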
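
A toy model helps show what goes wrong when code changes underneath an instruction cache that does not observe data-side writes, which is how the Cortex-M7 instruction cache behaves. The sketch below is purely illustrative: toy_system_t and its functions are hypothetical, and each memory word stands in for one cached instruction.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 8u

typedef struct {
    uint32_t mem[MEM_WORDS];     /* backing memory holding "instructions" */
    uint32_t cached[MEM_WORDS];  /* copies held by the instruction cache */
    bool     valid[MEM_WORDS];   /* true when cached[i] holds a copy */
} toy_system_t;

/* Instruction fetch: served from the cache when a valid copy exists,
 * otherwise filled from memory first. */
uint32_t fetch(toy_system_t *s, uint32_t i)
{
    if (!s->valid[i]) {
        s->cached[i] = s->mem[i];
        s->valid[i]  = true;
    }
    return s->cached[i];
}

/* Data-side write (for example a bootloader patching code):
 * it changes memory but does not touch the instruction cache. */
void data_write(toy_system_t *s, uint32_t i, uint32_t value)
{
    s->mem[i] = value;
}

/* Software maintenance: drop every cached copy. */
void invalidate_icache(toy_system_t *s)
{
    for (uint32_t i = 0; i < MEM_WORDS; i++) {
        s->valid[i] = false;
    }
}
```

In this model, a fetch after data_write() still returns the old value until invalidate_icache() is called, after which the next fetch reloads the new value from memory.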

Cache Maintenance Operations

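Software cache maintenance operations are provided to invalidate the instruction cache, either entirely or by address. They are needed whenever code in memory changes, for example after loading code into RAM or reprogramming flash, and before the cache is enabled for the first time. On Cortex-M7 these operations are performed by writing to system control registers, and CMSIS wraps them in functions such as SCB_EnableICache(), SCB_DisableICache(), and SCB_InvalidateICache(). Software is responsible for placing DSB and ISB barriers around these operations so that cache and memory accesses are correctly ordered; the CMSIS functions include them.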
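
As a rough illustration of what such maintenance routines do on a Cortex-M7, the sketch below writes the relevant registers directly. It assumes the Armv7-M register layout (CCR at 0xE000ED14 with the instruction cache enable in bit 17, and ICIALLU at 0xE000EF50) and a GCC or Clang style compiler. The type icache_regs_t and the function names are hypothetical; the registers are passed in as pointers so the same code can be exercised on a host PC with ordinary variables. In a real project the CMSIS functions SCB_EnableICache(), SCB_InvalidateICache(), and SCB_DisableICache() would normally be used instead.

```c
#include <stdint.h>

/* DSB + ISB on 32-bit Arm targets; a plain compiler barrier elsewhere so the
 * sketch also builds on a host PC. Assumes GCC/Clang inline assembly syntax. */
#if defined(__arm__) || defined(__thumb__)
#define SYNC_BARRIER() __asm__ volatile ("dsb 0xF\n\tisb 0xF" ::: "memory")
#else
#define SYNC_BARRIER() __asm__ volatile ("" ::: "memory")
#endif

#define CCR_IC_BIT ((uint32_t)1u << 17)  /* CCR.IC: instruction cache enable */

typedef struct {
    volatile uint32_t *ccr;      /* 0xE000ED14 on Armv7-M (assumption stated above) */
    volatile uint32_t *iciallu;  /* 0xE000EF50 on Armv7-M (assumption stated above) */
} icache_regs_t;

/* Invalidate the whole instruction cache. */
void icache_invalidate_all(icache_regs_t r)
{
    SYNC_BARRIER();
    *r.iciallu = 0u;   /* a write to ICIALLU invalidates all I-cache lines */
    SYNC_BARRIER();
}

/* Invalidate first, then enable, so no stale lines are ever used. */
void icache_enable(icache_regs_t r)
{
    icache_invalidate_all(r);
    *r.ccr |= CCR_IC_BIT;
    SYNC_BARRIER();
}

/* Disable the cache and discard its contents. */
void icache_disable(icache_regs_t r)
{
    SYNC_BARRIER();
    *r.ccr &= ~CCR_IC_BIT;
    icache_invalidate_all(r);
}
```

On target hardware, the structure would be initialised with the two register addresses cast to volatile uint32_t pointers. Passing the registers in explicitly is a design choice for testability only.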

Performance Impact

The instruction cache improves performance when:

There is instruction locality and reuse in the code
Hot code is concentrated in a few regions rather than spread evenly over memory
The same functions and loops are executed repeatedly

In the best case, the instruction cache brings the effective fetch time close to that of zero-wait-state memory. But the hit rate depends heavily on the code itself. Code with little locality, such as long straight-line sequences that run once or calls scattered across a large image, has low hit rates, and the cache cannot speed up code that is never reused.

Cache performance also varies based on size. A smaller cache has limited ability to retain hot code sections. Optimizing programs to fit key loops and functions into the cache is important.
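
The effect of hit rate on fetch time is usually estimated with the standard average-access-time formula: average = hit time + miss rate × miss penalty. A minimal sketch in C follows; the function name is illustrative and the cycle counts passed to it are whatever the target device actually exhibits.

```c
/* Average instruction fetch time in cycles.
 *   hit_cycles          : time to serve a fetch from the cache
 *   miss_rate           : fraction of fetches that miss, 0.0 to 1.0
 *   miss_penalty_cycles : extra cycles needed to fetch from slower memory */
double avg_fetch_cycles(double hit_cycles, double miss_rate,
                        double miss_penalty_cycles)
{
    return hit_cycles + miss_rate * miss_penalty_cycles;
}
```

With hypothetical numbers, a 1-cycle hit, a 25% miss rate, and an 8-cycle miss penalty average out to 3 cycles per fetch, while a 0% miss rate gives the 1-cycle best case.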

When the Cache Fails

There are cases where the instruction cache provides no benefit or even worsens performance:

Code with little or no temporal or spatial instruction locality
Unpredictable, data-dependent branches that jump to code not yet in the cache
Thrashing when the cache is too small for the working set
Frequent cache maintenance whose overhead outweighs the hits gained

For these adverse cases, the cache can be disabled globally, or specific memory regions can be marked as non-cacheable through the MPU. The cost is a higher average fetch time for the code that no longer uses the cache.
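
Thrashing is easy to demonstrate with a model of a single cache set. The sketch below assumes a 2-way set with true LRU replacement (an assumption for illustration, not a statement about any specific core) and touches a number of distinct lines, all mapping to that set, in round-robin order. The names set2_t, set_access, and count_hits are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One set of a 2-way cache with LRU replacement. */
typedef struct {
    bool     valid[2];
    uint32_t tag[2];
    uint8_t  victim;   /* least recently used way */
} set2_t;

/* Access one line (identified by its tag); returns true on a hit. */
bool set_access(set2_t *s, uint32_t tag)
{
    for (uint8_t w = 0; w < 2u; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->victim = (uint8_t)(1u - w);
            return true;
        }
    }
    uint8_t v = s->victim;          /* miss: replace the LRU way */
    s->valid[v] = true;
    s->tag[v]   = tag;
    s->victim   = (uint8_t)(1u - v);
    return false;
}

/* Count hits when `ntags` distinct lines that map to the same set are
 * touched round-robin for `rounds` passes, starting from an empty set. */
uint32_t count_hits(uint32_t ntags, uint32_t rounds)
{
    set2_t s = { { false, false }, { 0u, 0u }, 0u };
    uint32_t hits = 0u;
    for (uint32_t r = 0; r < rounds; r++) {
        for (uint32_t t = 0; t < ntags; t++) {
            if (set_access(&s, t)) {
                hits++;
            }
        }
    }
    return hits;
}
```

In this model, two competing lines settle in after the first pass and hit on every later access, while three competing lines evict each other continuously and never hit, even though the set is only one line short.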

Instruction Cache Design Considerations

Key cache design decisions that impact performance:

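Total cache size – A bigger cache retains more hot code but costs more silicon area and power
Line length – Longer lines reduce misses for sequential code but make each miss more expensive
Associativity – More ways reduce conflict misses at the cost of extra tag comparisons
Read latency – Lower is better but uses more power
Write policy – Determines how much coherency maintenance is needed when code changes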

These factors determine overall hit rate, average access time, power usage, and implementation cost. Optimal configuration depends on balancing application requirements like deterministic real-time behavior, low-energy, and maximum performance.

Summary

The instruction cache in ARM Cortex-M microcontrollers is an on-chip memory that stores a subset of recently used instructions to reduce accesses to slower flash or external memories. It exploits spatial and temporal locality to keep hot code sections close to the core. When enabled, maintained correctly, and matched to the code's working set, the instruction cache significantly increases instruction fetch performance.
