The Cortex-A76 is a high-performance mobile CPU core from Arm, announced in 2018 and designed for greater performance and power efficiency than its predecessors. This article explores its key architectural features and specifications to explain what makes the core both fast and efficient.
Overview
The Cortex-A76 is a new microarchitecture from Arm’s Austin design team, representing a significant performance boost over the previous Cortex-A75 and A73 cores. Some of the major goals for the A76 design included:
- Delivering a 35% performance gain over Cortex-A75
- Improving power efficiency by 40% over Cortex-A75
- Enabling scalability from smartphones to laptops
- Supporting high-end mobile workloads like gaming, AI, imaging
To achieve these targets, Arm redesigned much of the CPU microarchitecture and tuned the core for newer process nodes and platform-level optimizations. As a result, the Cortex-A76 was Arm’s fastest and most efficient mobile CPU core at launch.
Microarchitecture
The Cortex-A76 introduces several microarchitectural changes to boost performance and efficiency. The major improvements include:
Out-of-Order Execution
The Cortex-A76 utilizes an aggressive out-of-order execution engine, allowing it to reorder and schedule instructions dynamically for maximized utilization of execution resources. This increases instruction-level parallelism (ILP) for greater throughput.
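The benefit of reordering can be illustrated with a toy issue model. The sketch below is not a model of the A76 itself: the `Instr` type, the latencies, and the five-instruction program are hypothetical, and it assumes every destination register is written only once. It compares how many cycles the same program takes when instructions must issue in program order versus when any ready instruction may issue.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    dest: str                # register written (assumed unique per instruction)
    srcs: Tuple[str, ...]    # registers read
    latency: int             # cycles until the result is available


def simulate(program: Sequence[Instr], width: int, out_of_order: bool) -> int:
    """Return the cycle at which the last result becomes available."""
    # Results of instructions that have not issued yet are "never ready".
    ready_at = {ins.dest: math.inf for ins in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        issued = 0
        still_pending = []
        stalled = False
        for ins in pending:
            # Registers not produced by the program are ready from cycle 0.
            ready = all(ready_at.get(s, 0) <= cycle for s in ins.srcs)
            if issued < width and ready and not stalled:
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, cycle + ins.latency)
                issued += 1
            else:
                still_pending.append(ins)
                if not out_of_order:
                    # In-order issue: nothing younger may pass a stalled instruction.
                    stalled = True
        pending = still_pending
        cycle += 1
    return finish


# Two independent load-then-add chains, followed by a final add.
PROGRAM = [
    Instr("r1", ("x",), 4),        # load
    Instr("r2", ("r1",), 1),       # add, depends on the first load
    Instr("r3", ("y",), 4),        # independent load
    Instr("r4", ("r3",), 1),       # add, depends on the second load
    Instr("r5", ("r2", "r4"), 1),  # combine both chains
]

print(simulate(PROGRAM, width=2, out_of_order=False))  # 10
print(simulate(PROGRAM, width=2, out_of_order=True))   # 6
```

In the in-order run the second load waits behind the first add; in the out-of-order run both loads overlap, which is the instruction-level parallelism the paragraph above describes.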
Larger Reorder Buffer
The out-of-order back end has a 128-entry reorder buffer, significantly larger than the 48-entry buffer in Cortex-A73. A larger buffer lets the core keep more instructions in flight and look further ahead for independent work, which helps hide cache-miss latency and keep the execution pipelines fed.
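A rough way to see why the buffer size matters: if the oldest instruction is stalled, for example on a cache miss, the core can keep dispatching younger instructions only until the reorder buffer fills. The sketch below computes that window. The function name is hypothetical, and the dispatch widths (4 per cycle for the A76, 3 for the older core) are assumptions used for illustration.

```python
def cycles_until_rob_full(rob_entries: int, dispatch_width: int) -> int:
    """Cycles of continued dispatch before a stalled head instruction
    causes the reorder buffer to fill (ceiling division)."""
    if rob_entries <= 0 or dispatch_width <= 0:
        raise ValueError("rob_entries and dispatch_width must be positive")
    return -(-rob_entries // dispatch_width)


print(cycles_until_rob_full(128, 4))  # 32
print(cycles_until_rob_full(48, 3))   # 16
```

Under these assumptions the larger buffer covers about twice as many cycles of stall before the front end has to stop.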
Wider Execution Pipelines
The front end decodes and dispatches four instructions per cycle, up from three on the Cortex-A75. The back end issues to three integer ALUs, a branch unit, two ASIMD/floating point pipelines and two load/store pipelines, so more instructions can be processed concurrently each cycle.
Larger Branch Order Buffer
The branch order buffer size is doubled from 16 entries to 32 entries, allowing more branches to be in flight at once. This reduces stalls in branch-heavy code and complements the improved branch prediction.
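To show what prediction accuracy means in practice, here is a minimal sketch of a textbook 2-bit saturating counter. This is a classic teaching scheme and an assumption for illustration only; it is not the predictor used in the Cortex-A76, whose design is far more elaborate.

```python
from typing import Iterable


def mispredictions(outcomes: Iterable[bool], counter: int = 2) -> int:
    """Count mispredictions of a single 2-bit saturating counter.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch: taken 9 times, then falls through once; run 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(mispredictions(loop_branch))            # 3 misses out of 30

# A strictly alternating branch defeats this simple scheme.
print(mispredictions([True, False] * 5))      # 5 misses out of 10
```

The loop branch is predicted well, while the alternating pattern is not, which is why modern cores use history-based predictors for complex code.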
Enhanced Floating Point/NEON Unit
The vector floating point/NEON unit has been enhanced with increased execution bandwidth and lower latency for floating point ops. This benefits computational workloads.
L1 Instruction Cache
The L1 instruction cache is 64KB per core. A cache of this size keeps instruction misses low so that the wide front end stays continuously fed with code.
Higher L2 Cache Bandwidth
The L2 cache bandwidth has doubled from Cortex-A73, providing faster access to data and instructions to support the improved execution pipelines.
Specifications
Let’s look at the key specifications of the Cortex-A76 CPU core:
- Process Technology: 7nm, 10nm, 12nm, 14nm
- Pipeline Depth: 13 stages
- Execution Units: 3 integer ALUs, 1 branch unit, 2 ASIMD/FP pipelines, 2 load/store pipelines (4-wide decode)
- Reorder Buffer Size: 128 entries
- Branch Order Buffer Size: 32 entries
- L1 Instruction Cache: 64KB per core
- L1 Data Cache: 64KB per core
- L2 Cache: Up to 512KB, private per core
- L3 Cache: Up to 4MB, shared across the cluster through the DynamIQ Shared Unit
- ISA: Armv8.2-A (AArch64 at all exception levels; AArch32 at EL0 only)
- Clock Speed: Up to 3.0 GHz on 7nm
- Cores: Up to 4 Cortex-A76 cores in a DynamIQ cluster of up to 8 cores
The Cortex-A76 is designed to scale from smartphones and tablets to laptops by varying the core counts, cache sizes and clock frequencies. The high-end 7nm variants of the A76 can hit clock speeds up to 3GHz, while the mid-range 10nm options can go up to around 2.8GHz.
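The cache sizes in the list above map onto a cache geometry in a simple way. The sketch below takes the 64KB L1 data cache and assumes 4-way set associativity and 64-byte lines (these two parameters are assumptions here, not figures from this article) to compute the number of sets and to split an address into tag, set index and byte offset. The function names are hypothetical.

```python
from typing import Tuple


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size must be a multiple of ways * line_bytes")
    return sets


def split_address(addr: int, sets: int, line_bytes: int) -> Tuple[int, int, int]:
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


sets = cache_sets(64 * 1024, ways=4, line_bytes=64)
print(sets)                                   # 256
print(split_address(0x12345, sets, 64))       # (4, 141, 5)
```

Two addresses that share a set index compete for the same four ways, which is why associativity matters as much as raw capacity.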
Performance
The combination of microarchitectural enhancements and newer fabrication processes results in significant performance gains for the Cortex-A76 over prior Arm mobile cores:
- About 35% faster than Cortex-A75 in Arm’s launch estimates
- A larger gain still over the older Cortex-A73
- Up to 40% higher multi-threaded performance
- Up to 4x faster machine learning inference in Arm’s launch figures, thanks to AI-oriented optimizations
Device benchmarks have reported up to 20% faster application launch times compared to previous high-end Arm mobile cores. How long the A76 holds its peak clock depends on the device’s thermal design; dynamic voltage and frequency scaling lets the system lower voltage and clock speed when power or temperature limits are reached.
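Headline speedups like the 35% figure above mix two effects: a higher clock and more work done per cycle. A minimal sketch of the arithmetic follows, treating performance as instructions per cycle (IPC) multiplied by frequency. The clock speeds used (2.8GHz for the older core, 3.0GHz for the A76) are assumptions for illustration, and the function name is hypothetical.

```python
def implied_ipc_gain(total_speedup: float, old_ghz: float, new_ghz: float) -> float:
    """Given performance = IPC * frequency, return the IPC ratio that a
    total speedup implies once the clock ratio is factored out."""
    if old_ghz <= 0 or new_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return total_speedup / (new_ghz / old_ghz)


ratio = implied_ipc_gain(1.35, old_ghz=2.8, new_ghz=3.0)
print(f"implied IPC gain: {100 * (ratio - 1):.0f}%")  # implied IPC gain: 26%
```

Under these assumed clocks, roughly 7 points of the gain come from frequency and the rest from the microarchitecture.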
Power Efficiency
In addition to delivering top-class performance, the Cortex-A76 also excels at power efficiency. The CPU core is engineered for optimized power consumption across a wide range of mobile workloads. The key power efficiency features include:
- 40% better energy efficiency versus Cortex-A75 in Arm’s launch estimates
- Variable pipeline width to improve energy efficiency during lower utilization
- Fine-grained clock gating and power gating
- Dynamic voltage and frequency scaling
- Optimized memory subsystem power
Overall, the Cortex-A76 achieves laptop-class performance within strict mobile power budgets. This enables powerful experiences on smartphones without compromising battery life.
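The effect of dynamic voltage and frequency scaling can be sketched with the standard approximation for dynamic power, P = C × V² × f. The capacitance value and the two operating points below are hypothetical numbers chosen for illustration; they are not measured A76 figures.

```python
def dynamic_power(c_eff_farads: float, volts: float, hertz: float) -> float:
    """Approximate dynamic (switching) power in watts: P = C * V^2 * f."""
    return c_eff_farads * volts ** 2 * hertz


C_EFF = 1e-9  # hypothetical effective switched capacitance, in farads

# Hypothetical operating points: (frequency in Hz, supply voltage in V).
high = dynamic_power(C_EFF, volts=1.0, hertz=3.0e9)
low = dynamic_power(C_EFF, volts=0.8, hertz=2.0e9)

print(f"high point: {high:.2f} W")   # high point: 3.00 W
print(f"low point:  {low:.2f} W")    # low point:  1.28 W
print(f"power saved: {100 * (1 - low / high):.0f}%")  # power saved: 57%
```

In this example a one-third drop in clock speed cuts dynamic power by more than half, because the lower frequency also allows a lower voltage and voltage enters the formula squared.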
Real-World Devices
The Cortex-A76, or a semi-custom core derived from it such as Qualcomm’s Kryo 485, has been adopted across high-end Android smartphones, tablets, Chromebooks and Windows laptops from leading OEMs. Some examples include:
- Huawei Mate 20, P30, Honor View 20
- Xiaomi Mi 9, Mi Mix 3 5G, Redmi K20 Pro
- Samsung Galaxy S10, Note 10 (Snapdragon 855 variants)
- Realme X2 Pro
- Vivo iQOO Pro 5G
- LG G8 ThinQ
- Google Pixel 4
- Lenovo Flex 5G
- Samsung Galaxy Book S
In these devices, the Cortex-A76 is paired with a GPU such as Arm’s Mali or Qualcomm’s Adreno, and often with a dedicated NPU, to deliver strong overall performance for mobile apps and games, including those that rely on AI/ML.
Comparison with Other CPUs
Cortex-A75
The Cortex-A76 succeeds the previous generation Cortex-A75 CPU also designed by Arm. However, A76 represents a new microarchitecture with significant improvements throughout the design. Key advantages over A75 include:
- New deeper out-of-order pipeline
- Larger instruction buffers and caches
- Higher integer, floating point and memory bandwidth
- Faster, lower latency L1/L2 data cache
- Enhanced branch prediction accuracy
- Optimized power efficiency and thermal design
Thanks to these enhancements, the A76 clocks higher and executes more instructions per cycle than the A75 for much better performance.
Apple A11 Bionic
The Apple A11 Bionic powers the iPhone X and iPhone 8 series. It combines 2 performance cores with 4 efficiency cores. Apple publishes few microarchitectural details, so the comparison with the A11 performance cores is approximate:
- A76-based SoCs typically use a newer 7nm process versus the A11’s 10nm node, which helps power efficiency
- The A76 decodes 4 instructions per cycle; Apple’s performance cores are generally regarded as wider
- Higher memory bandwidth: L1/L2 caches run faster on A76
- A76 has 2X the machine learning performance
- Both the A76 and the A11 implement the Armv8.2-A ISA
Overall, the Cortex-A76 brings the latest high-performance mobile CPU innovations to the Arm ecosystem.
Intel Core i7-8550U
The Intel Core i7-8550U, from the Kaby Lake Refresh (8th generation) family, is designed for mainstream laptops. Compared to this 4-core, 8-thread CPU, the Cortex-A76 offers a competitive combination of performance and power efficiency. Some notable differences:
- A76-based SoCs typically use a 7nm process compared to Intel’s 14nm+, although node names are not directly comparable across foundries
- Intel CPU has higher single threaded performance
- Both cores use out-of-order execution; the Intel core has the larger reorder window, while the A76 is tuned for a much lower power budget
- A76-based SoCs usually include a dedicated NPU or DSP for efficient machine learning
- Intel CPU TDP is 15W versus 5W for mobile A76 configurations
- A76 suits mobile form factors, Intel better for larger laptops
Overall, the Cortex-A76 brings laptop-class capabilities to smartphones and tablets using the latest Arm technologies.
Conclusion
The Cortex-A76 represents a significant leap forward in Arm’s mobile CPU IP, combining a new microarchitecture from Arm’s Austin team with 7nm-class fabrication for impressive gains in performance and power efficiency. Key strengths of this CPU include:
- Up to 35% faster throughput than previous Arm mobile cores
- Excellent single threaded and multi-threaded performance
- Enhanced out-of-order engine and wider execution pipelines
- Laptop-level capabilities in mobile power budgets
- Optimized low power design for long battery life
- Scales from smartphones to laptops
- Adoption in leading flagship mobile devices
In summary, the Cortex-A76 sets new benchmarks for Arm in mobile computing and AI acceleration, making it the ideal CPU for advanced experiences on phones, tablets, Chromebooks and other devices.