Yes, most ARM processors have cache memory, although some small microcontroller cores, such as the Cortex-M0 and Cortex-M3, are designed without one. Cache memory is a small, fast memory located close to the processor core that stores frequently accessed data and instructions to speed up processing. Application-class ARM processors typically have multiple levels of cache memory:
Level 1 Cache
ARM processors have split level 1 (L1) caches – one for instructions and one for data. The L1 instruction and data caches are located right next to the processor core for very fast access. Typical sizes for L1 caches in ARM processors are:
- Instruction cache: 16-64 KB
- Data cache: 16-64 KB
The L1 caches help improve performance by reducing the number of slower accesses to main memory. When the processor needs data or an instruction, it first checks the L1 cache. If the required information is not found (a “cache miss”), it then looks in lower level caches or main memory, which takes longer. If the information is in the L1 cache (a “cache hit”), the processor gets it much faster without waiting for main memory.
Level 2 Cache
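Most application-class ARM processors also have a level 2 (L2) cache. The L2 cache is usually unified, holding both instructions and data. It is larger than the L1 caches but sits farther from the processor core, and depending on the design it may be private to one core or shared between several. Typical L2 cache sizes for ARM processors are: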
- 128 KB – 8 MB
The processor checks the L2 cache if the required data or instruction causes an L1 cache miss. If found in the L2, access is still much faster than main memory. The L2 helps reduce accesses to main memory even further for improved performance.
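The lookup order described above can be sketched as a small simulation. This is a minimal illustration, not a model of any particular ARM core: the latency constants are hypothetical, and the caches are plain Python sets with no capacity limit or eviction.

```python
# Hypothetical latencies in clock cycles; real values depend on the core.
L1_HIT_CYCLES = 2
L2_HIT_CYCLES = 15
MEMORY_CYCLES = 150


def access(address, l1, l2):
    """Return (level that served the access, cycles) and fill the caches.

    l1 and l2 are sets of cached addresses. There is no capacity limit
    and no eviction, which keeps the sketch focused on the lookup order.
    """
    if address in l1:
        return "L1", L1_HIT_CYCLES
    if address in l2:
        l1.add(address)  # refill L1 so the next access hits there
        return "L2", L2_HIT_CYCLES
    l2.add(address)      # missed in both levels: fetch from main memory
    l1.add(address)
    return "memory", MEMORY_CYCLES


l1, l2 = set(), {0x2000}  # 0x2000 starts out cached in L2 only
results = [access(a, l1, l2) for a in (0x1000, 0x1000, 0x2000, 0x2000)]
for level, cycles in results:
    print(level, cycles)
```

The first access to each address pays the cost of the level that holds it, and the repeat access is served from L1.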
Cache Organization
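ARM caches are set associative, and the number of ways depends on the core and the cache level: 2-way, 4-way, 8-way, and 16-way designs are all in use. In a 4-way set-associative cache, for example, each address maps to exactly one set, and its line can be placed in any of the 4 ways of that set. This flexibility reduces conflict misses and improves the hit rate compared to a direct-mapped cache, where each address has only one possible location.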
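To make the set mapping concrete, here is a minimal sketch of how an address splits into tag, set index, and line offset. The geometry (32 KB, 4 ways, 64-byte lines) is an assumed example, not a statement about a specific ARM core, and `split_address` is a hypothetical helper name.

```python
def split_address(address, cache_bytes=32 * 1024, ways=4, line_bytes=64):
    """Split an address into (tag, set_index, line_offset).

    The default geometry is an assumed example: 32 KB, 4 ways and
    64-byte lines give 128 sets.
    """
    num_sets = cache_bytes // (ways * line_bytes)
    line_offset = address % line_bytes
    set_index = (address // line_bytes) % num_sets
    tag = address // (line_bytes * num_sets)
    return tag, set_index, line_offset


print(split_address(0x12345))         # (9, 13, 5)
print(split_address(0x12345 + 8192))  # (10, 13, 5)
```

The two addresses are 8 KB apart (one way's worth of data in this geometry), so they land in the same set with different tags. Up to four such lines can coexist in a 4-way cache before one has to be evicted.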
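Many ARM L1 caches are virtually indexed and physically tagged (VIPT), while other L1 caches and the outer cache levels are physically indexed and physically tagged (PIPT). In a VIPT cache the set index is taken from the virtual address and the tag holds physical address bits. The cache can therefore start selecting the set while the TLB is still translating the address, which shortens the lookup.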
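One consequence of virtual indexing is worth a quick check. If one way of the cache is no larger than a memory page, every index bit comes from the page offset, which is identical in the virtual and physical address, so two virtual aliases of one physical line cannot land in different sets. The sketch below tests that condition. The 4 KB page size and the function name are assumptions chosen for illustration.

```python
def vipt_index_fits_in_page_offset(cache_bytes, ways, page_bytes=4096):
    """Return True if all index bits lie inside the page offset.

    One way covers cache_bytes // ways bytes of address space. If that
    is at most one page, the virtual and physical index are the same.
    """
    way_bytes = cache_bytes // ways
    return way_bytes <= page_bytes


print(vipt_index_fits_in_page_offset(32 * 1024, ways=8))  # True: 4 KB per way
print(vipt_index_fits_in_page_offset(32 * 1024, ways=4))  # False: 8 KB per way
```

When the check fails, aliasing has to be handled some other way, in hardware or by operating-system page allocation rules.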
Cache Coherency
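In multicore ARM processors, hardware keeps the L1 data caches coherent. This means the cores have a consistent view of data across their L1 caches: when a core modifies data, other cores see the updated value, not a stale copy from their local cache. Instruction caches are generally not kept coherent with data writes automatically, so software that writes code at run time has to perform explicit cache maintenance. ARM implementations use coherence protocols such as MESI and MOESI.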
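As a rough illustration of how a protocol such as MESI keeps copies consistent, the sketch below tracks the state of a single cache line in each core's cache. It is a heavily simplified textbook model (no bus, no write-back traffic, no timing), and the function names are made up for this example.

```python
# States of one cache line: M(odified), E(xclusive), S(hared), I(nvalid).

def read(states, core):
    """Core reads the line. Only a miss (state I) changes anything."""
    if states[core] == "I":
        holders = [c for c in range(len(states))
                   if c != core and states[c] != "I"]
        for c in holders:
            states[c] = "S"  # an M holder would write its data back first
        states[core] = "S" if holders else "E"


def write(states, core):
    """Core writes the line: it becomes the only valid, modified copy."""
    for c in range(len(states)):
        states[c] = "M" if c == core else "I"


states = ["I", "I"]   # two cores, line not cached anywhere
read(states, 0)
print(states)         # ['E', 'I']
read(states, 1)
print(states)         # ['S', 'S']
write(states, 1)
print(states)         # ['I', 'M']
read(states, 0)
print(states)         # ['S', 'S']
```

The write by core 1 invalidates core 0's copy, so core 0's next read has to fetch the updated data instead of using a stale line.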
Cache Replacement Policies
When a cache miss occurs and every way in the target set is already in use, the cache must choose a line to evict and replace with the newly required data. Common cache replacement policies are:
- Least Recently Used (LRU) – Evicts the line that was least recently accessed
- Pseudo-LRU (PLRU) – A lower cost approximation of full LRU
- Random – Evicts a random line
Advanced policies like dynamic insertion policy are also used to optimize replacement decisions.
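A minimal sketch of true LRU for a single cache set follows. It uses Python's `OrderedDict` to track recency. Real hardware usually approximates this with a few status bits per set, and the class name `LRUSet` is a hypothetical choice for this example.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with true LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recent first

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[tag] = None
        return False


cache_set = LRUSet(ways=2)
hits = [cache_set.access(tag) for tag in (1, 2, 1, 3, 2)]
print(hits)                  # [False, False, True, False, False]
print(list(cache_set.lines)) # [3, 2]
```

The access to tag 3 evicts tag 2 rather than tag 1, because tag 1 was touched more recently. The final access to tag 2 is therefore a miss.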
Cache Write Policies
ARM caches also use write policies to handle writes to cache lines. The options are:
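- Write-through – Data is written to both the cache and main memory
- Write-back – Data is written only to the cache and reaches memory later, when the line is evicted or explicitly cleaned
- Write-around – Data is written directly to memory without allocating a cache line (ARM documentation calls this no-write-allocate)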
Write-back is commonly used as it reduces memory traffic, but write-through or write-around may be used in certain situations.
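The traffic difference between write-through and write-back can be seen in a toy model. The sketch below counts main-memory writes for a sequence of stores. It assumes a cache that holds a single line, which is unrealistic but keeps the eviction logic visible. The function name is hypothetical.

```python
def memory_writes(stores, policy):
    """Count main-memory writes for a sequence of stored line addresses.

    Assumes a one-line cache: storing to a different line evicts the
    current one.
    """
    writes = 0
    cached_line = None
    dirty = False
    for line in stores:
        if policy == "write-through":
            writes += 1              # every store also goes to memory
            cached_line = line
        elif policy == "write-back":
            if line != cached_line:
                if dirty:
                    writes += 1      # evicted dirty line is written back
                cached_line = line
            dirty = True
        else:
            raise ValueError(f"unknown policy: {policy}")
    if policy == "write-back" and dirty:
        writes += 1                  # final clean of the remaining dirty line
    return writes


stores = [0, 0, 0, 1, 1]             # three stores to line 0, two to line 1
print(memory_writes(stores, "write-through"))  # 5
print(memory_writes(stores, "write-back"))     # 2
```

Repeated stores to the same line are absorbed by the cache under write-back, so memory sees one write per dirty line instead of one per store.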
Caching Modes
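ARM processors let software control caching, both globally through cache enable bits in the system control registers and per memory region through MMU or MPU attributes. Some examples are: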
- Cache enabled – Normal operation with cache hits and misses
- Cache disabled – Cache is turned off, no caching performed
- Bypass cache – Cache is on but skipped, memory accesses go directly to main memory
- Write-through – All writes go to cache and memory regardless of policy
The ability to configure cache modes is useful for real-time applications, shared memory, and other special cases.
Cache Performance
As a rough guide, typical cache hit latencies and impact on performance in ARM processors are:
- L1 cache: 1-3 clock cycles – Very fast access
- L2 cache: 10-20 clock cycles – Faster than memory
- Main memory: >100 clock cycles – Slowest, limits performance
The hit rate, or percentage of accesses that are cache hits, also significantly affects overall performance. A higher hit rate means less waiting on main memory accesses.
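The combined effect of latency and hit rate is often summarized as the average memory access time (AMAT). The sketch below computes it for a two-level hierarchy. The numbers are illustrative values in the same range as the rough guide above, and the model assumes each level's latency is the additional cost paid on top of the level before it, with the L2 miss rate measured only over accesses that reach L2.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, memory):
    """Average memory access time in cycles for an L1 + L2 hierarchy.

    AMAT = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)


# Illustrative inputs: 95% L1 hit rate, 80% L2 hit rate.
good = amat(l1_hit=2, l1_miss_rate=0.05, l2_hit=15, l2_miss_rate=0.2, memory=150)
# Same hardware, but the L1 hit rate drops to 80%.
poor = amat(l1_hit=2, l1_miss_rate=0.20, l2_hit=15, l2_miss_rate=0.2, memory=150)
print(round(good, 2), round(poor, 2))  # 4.25 11.0
```

With these assumed numbers, dropping the L1 hit rate from 95% to 80% more than doubles the average access time, even though no latency has changed.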
Caching Challenges
Some challenges that ARM and other processors with caching face include:
- Cache contention in multicore – Cores compete for cache space
- Cache coherence overhead – Maintaining coherent view has overhead
- Cache thrashing – Useful data gets evicted before reuse
- Cache conflicts – Different addresses compete to use cache
Techniques like smarter cache allocation, better replacement policies, and data migration help address these challenges.
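Cache conflicts and thrashing often come from the access pattern rather than the amount of data. The sketch below counts how many distinct sets a strided access pattern touches. The geometry (128 sets, 64-byte lines) is an assumed example, and `sets_touched` is a hypothetical helper.

```python
def sets_touched(stride_bytes, accesses, num_sets=128, line_bytes=64):
    """Count the distinct cache sets hit by `accesses` strided addresses."""
    return len({(i * stride_bytes // line_bytes) % num_sets
                for i in range(accesses)})


print(sets_touched(stride_bytes=64, accesses=16))    # 16
print(sets_touched(stride_bytes=8192, accesses=16))  # 1
```

With a 64-byte stride the 16 lines spread over 16 sets. With an 8 KB stride (128 sets times 64 bytes) all 16 lines compete for the same set, so a 4-way cache would keep evicting lines that are about to be reused, even though the rest of the cache is empty.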
Other Cache Features
Some other cache-related features supported by ARM processors include:
- Prefetching – Predicting and loading data before use
- Data streaming – Special handling of sequential data
- Lockdown – Locking critical instructions/data in cache
- Parity or ECC – Error detection/correction
Prefetching and streaming improve performance, lockdown makes timing more predictable for real-time code, and parity or ECC improves reliability.
Conclusion
In summary, ARM processors use multiple levels of cache, typically L1 and L2, to significantly improve performance compared to accessing main memory for every operation. Cache organization, replacement and write policies, and coherence together determine how effective the caches are. Caches provide major performance benefits but also introduce design and optimization complexity.