What is the cache memory in ARM processor?

Scott Allen
Last updated: September 11, 2023 1:46 am

Cache memory in ARM processors refers to small, fast memory units integrated into the processor that store frequently accessed data and instructions to improve performance. The cache sits between the CPU and main memory and serves as a temporary staging area, keeping data that the CPU will likely need again in the near future. A hit in the cache is served in a few cycles, whereas an access that must go to main memory can take tens to hundreds of cycles.

Contents
  • Purpose of Cache Memory
  • Cache Organization in ARM
  • Cache Mapping Techniques
  • Cache Coherence Protocols
  • Cache Performance Optimization
  • Conclusion

Purpose of Cache Memory

The key purpose of cache memory is to reduce the average latency of memory accesses and improve performance. Main memory speeds have long lagged behind CPU speeds, so the processor has to stall while waiting for data from memory. Cache memory helps bridge this gap by exploiting the principle of locality of reference:

  • Temporal locality – Recently accessed data items are likely to be accessed again soon
  • Spatial locality – Data items with nearby addresses tend to be referenced close together in time

Cache memory stores recently used data and the data adjacent to it for quick access. It also avoids re-fetching the same instructions and data from main memory each time they are needed. By providing rapid access to frequently used data and instructions, cache significantly improves memory subsystem performance.
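The effect of the two kinds of locality can be illustrated with a toy model. The sketch below is not ARM-specific: `count_hits` is a hypothetical helper that models a small direct-mapped cache, and the 64-byte line size and 256-line capacity are assumptions chosen only for illustration.

```python
def count_hits(addresses, line_size=64, num_lines=256):
    """Count hits in a tiny direct-mapped cache model (illustrative only)."""
    lines = [None] * num_lines      # each slot remembers which memory block it holds
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory block (cache line) number
        slot = block % num_lines    # direct-mapped placement
        if lines[slot] == block:
            hits += 1
        else:
            lines[slot] = block     # miss: the whole line is fetched
    return hits

# Spatial locality: read 4-byte words sequentially through 4 KiB.
sequential = list(range(0, 4096, 4))
# Temporal locality: re-read the same 16 words 64 times.
repeated = [a for _ in range(64) for a in range(0, 64, 4)]
# Poor locality: touch one word per line and never return to it.
strided = list(range(0, 1024 * 64, 64))

seq_hits = count_hits(sequential)   # 960 of 1024 accesses hit
rep_hits = count_hits(repeated)     # 1023 of 1024 accesses hit
str_hits = count_hits(strided)      # 0 of 1024 accesses hit
```

All three patterns issue the same number of accesses, but only the first two reuse lines that are already in the cache.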

Cache Organization in ARM

ARM processors utilize multiple levels of cache in a hierarchical design to fully exploit locality. Lower level caches integrated into the CPU core are very fast but smaller in capacity. Higher cache levels are progressively larger but have longer access times. Data is moved between cache levels and main memory in cache lines (fixed size blocks).

A typical ARM cache hierarchy consists of:

  • L1 Cache – Split into separate instruction and data caches. Very low (1-3 cycle) access latency.
  • L2 Cache – Usually a unified cache for both instructions and data. Larger than L1, with higher latency that is still well below that of main memory.
  • L3 Cache – Optional. For advanced multicore ARM processors. Very large capacity.
  • Main Memory – Large DRAM accessed on cache misses. High latency (tens to hundreds of cycles).
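The benefit of such a hierarchy is often estimated with the average memory access time (AMAT): each level contributes its hit latency plus its miss rate times the cost of going one level further out. The following is a minimal sketch; the function name `amat` and all latencies and miss rates are assumed example values, not figures for any particular ARM core.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency, local_miss_rate) tuples, ordered from L1 outward.
    memory_latency: cycles for an access that misses every cache level.
    """
    total = memory_latency
    # Work from the outermost cache back towards L1.
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total

# Assumed example: L1 = 2 cycles with 5% misses, L2 = 12 cycles with 20% misses,
# DRAM = 150 cycles.
with_caches = amat([(2, 0.05), (12, 0.20)], 150)   # about 4.1 cycles
without_caches = amat([], 150)                     # 150 cycles
```

Under these assumed numbers, two cache levels bring the average access cost from 150 cycles down to roughly 4 cycles.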

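Many ARM designs use inclusive caching, where each cache level also holds the contents of the levels closer to the core. The contents of L1 are then a subset of L2, and the contents of L2 are a subset of L3 when one is present, with main memory backing everything. This simplifies cache coherence between levels. The inclusion policy is implementation specific, and some ARM cores use non-inclusive or exclusive arrangements instead.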

Cache Mapping Techniques

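To locate data in the cache, the address used for a lookup is split into three fields: a tag, a set index, and a line offset. The index selects a cache set, the offset selects a byte within the cache line, and the tag is compared against the tags stored in that set to decide whether the access hits. Separately, the Memory Management Unit (MMU) translates the virtual page number (VPN) of the virtual address generated by the processor into a physical page number, while the page offset passes through unchanged.

ARM data caches and outer-level caches are generally physically indexed and physically tagged (PIPT), or behave as if they were. Since several virtual addresses may map to the same physical location, indexing with virtual address bits can cause aliasing, where one physical line ends up in more than one set. Some L1 caches, particularly instruction caches, are virtually indexed and physically tagged (VIPT) so that the set lookup can start in parallel with virtual-to-physical address translation. When all index bits fall within the page offset, a VIPT cache behaves exactly like a PIPT cache and aliasing cannot occur.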
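How an address breaks down into tag, index, and offset can be sketched in a few lines. The helper names `split_address` and `index_fits_in_page_offset`, and the 64-byte line, 128-set, and 4 KiB page parameters, are assumptions for illustration rather than values for a specific core. Sizes are assumed to be powers of two.

```python
def split_address(addr, line_size=64, num_sets=128):
    """Split an address into (tag, set index, line offset)."""
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def index_fits_in_page_offset(line_size, num_sets, page_size=4096):
    """True when index and offset bits all lie inside the page offset.

    In that case the bits are identical in the virtual and physical address,
    so indexing with the virtual address cannot create aliases.
    """
    return line_size * num_sets <= page_size

tag, index, offset = split_address(0x80001234)   # (0x40000, 72, 52)
```

With these example parameters, one way of the cache spans 64 × 128 = 8 KiB, which is larger than a 4 KiB page, so `index_fits_in_page_offset(64, 128)` is false. Halving the number of sets to 64 makes it true.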

A cache can use one of three mapping policies:

  • Direct mapped – Each memory block maps to exactly one cache line
  • Fully associative – A memory block can be placed in any cache line
  • Set associative – Compromise between direct mapping and full associativity. The cache is divided into sets with a fixed number of lines (ways) per set, and a memory block maps to exactly one set but can occupy any way within it.

Set associative mapping is commonly used in ARM as it provides a good balance between hit rate, access time, and design complexity. Popular configurations are 4-way and 8-way set associative caches.
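The trade-off between the mapping policies can be seen in a small simulation. This is a sketch under stated assumptions: `SetAssociativeCache` is a hypothetical class, LRU replacement is assumed (real ARM caches may use other policies such as pseudo-LRU or random replacement), and the sizes are tiny for readability.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy N-way set associative cache with LRU replacement (assumed policy)."""

    def __init__(self, num_sets, ways, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set: keys are tags, ordered oldest to newest.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = True
        self.misses += 1
        return False

# Same capacity (8 lines) organised two ways.
direct_mapped = SetAssociativeCache(num_sets=8, ways=1)
two_way = SetAssociativeCache(num_sets=4, ways=2)

# Two addresses that collide in set 0 of both caches, accessed alternately.
for _ in range(10):
    for addr in (0, 512):
        direct_mapped.access(addr)
        two_way.access(addr)
```

In the direct-mapped cache the two addresses keep evicting each other, so all 20 accesses miss. The 2-way cache holds both lines in the same set and hits on 18 of the 20 accesses. Setting `num_sets=1` with `ways=8` would model a fully associative cache of the same capacity.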

Cache Coherence Protocols

With multiple caches and cores, ARM employs coherence protocols to maintain consistency between cache copies. Coherency ensures changes in one cache are reflected in other caches to prevent reading stale data. ARM utilizes:

  • Snooping – Bus-based protocol where caches snoop on each other's transactions. Good for a smaller number of cores.
  • Directory-based – Central directories track cache line states. Scales better for more cores.

Both schemes rely on establishing ownership/exclusivity for cache lines. The owner cache has the right to modify a line, while other caches may hold only read-only copies. When a core writes to a line it does not own exclusively, whether on a write miss or on a hit to a shared line, a request goes out to invalidate the other copies.
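The ownership idea can be sketched with a simplified write-invalidate protocol using three states: Modified (M), Shared (S), and Invalid (I). This is a teaching model, not the protocol of any specific ARM interconnect. The class `SnoopingCache` and its methods are hypothetical, each "line" holds a single value, and the snoop is modelled as a direct method call on every other cache.

```python
class SnoopingCache:
    """Toy MSI-style cache; a missing entry means the line is Invalid."""

    def __init__(self, memory, peers):
        self.memory = memory      # shared dict: address -> value
        self.peers = peers        # shared list of all caches on the "bus"
        self.lines = {}           # address -> [state, value]
        peers.append(self)

    def _others(self):
        return [c for c in self.peers if c is not self]

    def state(self, addr):
        line = self.lines.get(addr)
        return line[0] if line else "I"

    def read(self, addr):
        if addr not in self.lines:
            # Read miss: any owner must write back and downgrade to Shared.
            for other in self._others():
                other.snoop_read(addr)
            self.lines[addr] = ["S", self.memory[addr]]
        return self.lines[addr][1]

    def write(self, addr, value):
        if self.state(addr) != "M":
            # Gain ownership: invalidate every other copy first.
            for other in self._others():
                other.snoop_write(addr)
        self.lines[addr] = ["M", value]

    def snoop_read(self, addr):
        line = self.lines.get(addr)
        if line and line[0] == "M":
            self.memory[addr] = line[1]   # write back the dirty data
            line[0] = "S"

    def snoop_write(self, addr):
        line = self.lines.pop(addr, None)  # drop our copy
        if line and line[0] == "M":
            self.memory[addr] = line[1]   # write back before giving up the line

memory = {0x100: 1}
bus = []
core0 = SnoopingCache(memory, bus)
core1 = SnoopingCache(memory, bus)

core0.read(0x100)                      # core0: S
core1.read(0x100)                      # core1: S
core1.write(0x100, 42)                 # core1: M, core0 invalidated
core0_state_after_write = core0.state(0x100)
core0_value = core0.read(0x100)        # forces core1 to write back, then reads 42
```

After the write, `core0` no longer holds a stale copy, and its next read observes the new value because the owner writes it back first.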

Cache Performance Optimization

ARM employs various techniques to optimize cache utilization and minimize misses:

  • Write buffers – Pending writes are held in write buffers, so the core can keep executing and reading from the cache while the writes drain to memory.
  • Load/store reordering – Scheduling load/stores out-of-order to prevent stalls.
  • Prefetching – Predicting future accesses and bringing data into cache ahead of time.
  • Way prediction – Predicting correct cache way to reduce access time.

Advanced ARM processors may also implement compression in cache to increase effective capacity and allocate cache dynamically between cores depending on workloads.
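Of the techniques listed above, prefetching is the easiest to illustrate. The sketch below models a simple next-line prefetcher, which is one common scheme and not necessarily what a given ARM core implements. The function `count_misses` is hypothetical, the cache is assumed to be unbounded so that only the prefetch effect is visible, and prefetch timing is ignored.

```python
def count_misses(addresses, line_size=64, prefetch_next_line=False):
    """Count demand misses, optionally fetching block N+1 on every access to block N."""
    cached = set()    # block numbers currently held (unbounded, for simplicity)
    misses = 0
    for addr in addresses:
        block = addr // line_size
        if block not in cached:
            misses += 1
            cached.add(block)
        if prefetch_next_line:
            cached.add(block + 1)   # bring the next line in ahead of use
    return misses

# Sequential walk over 4 KiB in 4-byte steps (64 cache lines).
sequential = list(range(0, 4096, 4))
misses_plain = count_misses(sequential)                              # 64
misses_prefetch = count_misses(sequential, prefetch_next_line=True)  # 1
```

In this idealised model only the very first access misses when prefetching is enabled. Real hardware sees smaller gains because a prefetch takes time to complete and prefetched lines compete for limited cache capacity.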

Conclusion

In summary, cache memory plays a critical role in ARM processors by exploiting locality principles and providing fast access to frequently used data. Multiple cache levels arranged hierarchically help balance access speed, hit rate, and cost. ARM employs modern caching techniques like set associativity, PIPT, snoop/directory protocols, write buffering, and prefetching to maximize performance. Caches help bridge the processor-memory performance gap and make ARM an efficient architecture for embedded and mobile designs.
