Qualcomm has made several customizations to the ARM Cortex-A76 CPU core in its Snapdragon systems-on-chip (SOCs) to optimize performance and efficiency. The key changes include modifying the microarchitecture, increasing clock speeds, and adding custom instructions. Understanding these customizations provides insight into the capabilities and advantages of Snapdragon SOCs.
Overview of Cortex-A76
The Cortex-A76, announced in 2018, is a high-performance ARM CPU core designed with 7nm mobile SOCs in mind. It implements the ARMv8.2-A instruction set and features:
- Out-of-order execution for high performance
- DynamIQ technology for heterogeneous computing
- 4-wide decode and dispatch for efficient instruction throughput
- Large physical register files for speculative execution
- Enhanced branch prediction with a deeper TAGE branch predictor
Key goals for Cortex-A76 were improving performance and energy efficiency. On TSMC 7nm, ARM claimed 35% higher performance and 40% better power efficiency compared to Cortex-A75.
Microarchitecture Changes in Snapdragon Cortex-A76
While Qualcomm uses the Cortex-A76 in Snapdragon, it modifies the microarchitecture for further optimization. These changes are designed to reduce latency, increase throughput, improve branch prediction accuracy, and enhance sustained performance. Some of the modifications include:
- Larger issue queues – Allows more instruction reordering and hides latency
- Larger load/store buffers – Prevents stalls when accessing memory
- Extra branch target buffers – Improves branch prediction accuracy
- Deeper out-of-order queues – Enables better instruction scheduling
Qualcomm also tweaks the prefetchers, cache organization, and coherency mechanisms. It does not disclose full microarchitectural details, but these changes aim to reduce delays in the pipeline. The customizations vary between Snapdragon generations. For example, the Kryo 485 cores in Snapdragon 855 are based on Cortex-A76, whereas the Kryo 385 cores in Snapdragon 845 were based on the older Cortex-A75. However, the goals remain similar – lower latency, higher throughput, and sustained performance.
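To see why a larger issue queue hides latency, the effect can be sketched with a toy scheduler. This is a minimal illustration, not a model of Qualcomm's or ARM's actual design: the `simulate` function, the 4-cycle load latency, the single-issue width, and the example program are all assumptions chosen for clarity.

```python
from typing import List, Tuple

# Each instruction is (latency_in_cycles, indices_of_producer_instructions).
Instr = Tuple[int, List[int]]


def simulate(program: List[Instr], window: int, issue_width: int = 1) -> int:
    """Return the cycle at which the last result becomes available.

    Toy model (assumption): instructions enter the issue queue in program
    order, and each cycle the oldest ready entries issue, up to issue_width.
    """
    done_at = {}      # instruction index -> cycle its result is ready
    queue = []        # indices waiting in the issue queue, oldest first
    next_fetch = 0
    cycle = 0
    while len(done_at) < len(program):
        # Refill the issue queue in program order until it is full.
        while next_fetch < len(program) and len(queue) < window:
            queue.append(next_fetch)
            next_fetch += 1
        # Issue the oldest instructions whose producers have finished.
        issued = []
        for idx in queue:
            if len(issued) == issue_width:
                break
            latency, deps = program[idx]
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                done_at[idx] = cycle + latency
                issued.append(idx)
        for idx in issued:
            queue.remove(idx)
        cycle += 1
    return max(done_at.values())


# Hypothetical program: four independent pairs of a 4-cycle load followed
# by a 1-cycle add that depends on that load.
program: List[Instr] = []
for _ in range(4):
    load_index = len(program)
    program.append((4, []))
    program.append((1, [load_index]))

for window in (1, 2, 8):
    print(window, simulate(program, window))
```

In this toy program the model finishes in 20, 12, and 8 cycles for queues of 1, 2, and 8 entries, because a deeper queue lets later independent loads start while earlier adds are still waiting on their data.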
Higher Clock Speeds
In addition to microarchitectural optimizations, Qualcomm pushes the clock speed of the fastest ("prime") core in Snapdragon SOCs. Only Snapdragon 855 is based on Cortex-A76; the later chips hold the same peak clock on newer ARM core designs. For example:
- Snapdragon 855 runs up to 2.84 GHz (Kryo 485, based on Cortex-A76)
- Snapdragon 865 runs up to 2.84 GHz (Kryo 585, based on Cortex-A77)
- Snapdragon 888 runs up to 2.84 GHz (Kryo 680, based on Cortex-X1)
This is higher than the reference 2.6 GHz speed of Cortex-A76. The increased clocks allow Snapdragon SOCs to extract more performance from the customized cores. Qualcomm relies on advanced process nodes to push higher frequencies while maintaining efficiency: TSMC’s 7nm processes for Snapdragon 855 and 865, and Samsung’s 5nm process for Snapdragon 888. Process refinements like EUV lithography help enable lower voltage operation at high clocks.
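The size of the clock increase is easy to quantify. The short sketch below uses the 2.6 GHz and 2.84 GHz figures quoted above; the helper name `clock_uplift_percent` is purely illustrative.

```python
def clock_uplift_percent(reference_ghz: float, shipped_ghz: float) -> float:
    """Return the percentage frequency increase of shipped_ghz over reference_ghz."""
    return (shipped_ghz - reference_ghz) / reference_ghz * 100.0


# Figures taken from the text above: 2.6 GHz reference vs 2.84 GHz in Snapdragon.
uplift = clock_uplift_percent(2.6, 2.84)
print(f"{uplift:.1f}% higher clock")  # prints "9.2% higher clock"
```

A clock increase of roughly 9% sets an upper bound on the speedup available from frequency alone; any larger gain has to come from the core doing more work per cycle.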
Custom Instructions
Besides microarchitecture and clocks, Snapdragon’s Cortex-A76-based cores are often described as adding custom instructions that accelerate specialized tasks in areas like AI, graphics, image processing, and security. The instructions usually cited are ARM architecture extensions rather than Qualcomm-specific additions, and not all of them apply to Cortex-A76:
- dotprod – ARMv8.2-A Dot Product extension (SDOT/UDOT); performs the 8-bit dot product operations used in neural networks and is implemented by Cortex-A76
- int8matrix – Processes 8-bit matrix operations for AI; an ARMv8.6-A extension (I8MM) that postdates Cortex-A76
- vpsel – Vector predicate select instruction; part of ARM’s M-profile vector extension (Helium) rather than the A-profile
- SHA3 – Accelerates SHA3 hash calculations for security; an optional ARMv8.2-A cryptographic extension
By supporting such operations directly in hardware, a CPU core improves performance and efficiency and can execute more workloads on-device without relying on external accelerators. Qualcomm does not publicly detail any pipeline or execution-unit changes specific to these instructions, so the gains are best attributed to the instructions themselves, each of which replaces many scalar operations with a single vector operation.
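To make the dot product entry concrete, the sketch below emulates in plain Python what one such instruction computes. It assumes the semantics of ARM's vector SDOT instruction: each 32-bit lane accumulates the sum of four signed 8-bit products, wrapping on overflow. The function names are illustrative and this is not how the hardware is implemented.

```python
from typing import List


def to_int8(value: int) -> int:
    """Interpret the low 8 bits of value as a signed 8-bit integer."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def sdot_lane(acc: int, a: List[int], b: List[int]) -> int:
    """One 32-bit lane: acc + sum of four signed 8-bit products, wrapped to int32."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each lane consumes exactly four 8-bit elements")
    total = acc + sum(to_int8(x) * to_int8(y) for x, y in zip(a, b))
    total &= 0xFFFFFFFF  # wrap to 32 bits, no saturation
    return total - (1 << 32) if total >= (1 << 31) else total


def sdot(acc: List[int], a: List[int], b: List[int]) -> List[int]:
    """Emulate a 128-bit operation: four 32-bit lanes, sixteen 8-bit elements."""
    return [
        sdot_lane(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
        for i in range(4)
    ]


# Sixteen multiply-accumulates that a single vector instruction would perform.
result = sdot([0, 0, 0, 0], list(range(1, 17)), [1] * 16)
print(result)  # [10, 26, 42, 58]
```

The scalar equivalent needs sixteen multiplies and sixteen additions, which is why this style of instruction matters for quantized neural network inference.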
Impact on CPU Performance
Collectively, Qualcomm’s optimizations result in significant CPU performance uplifts across Snapdragon generations, although Snapdragon 865 and 888 also moved from Cortex-A76 to newer ARM core designs. Some examples include:
- 20% higher integer scores in Snapdragon 855 vs 845
- 25% faster AI processing in Snapdragon 865 vs 855
- 10% faster web browsing in Snapdragon 888 vs 865
These gains come from the compound benefits of microarchitecture changes, higher clocks, and custom instructions. While microarchitecture tweaks reduce latency, higher frequencies increase throughput. Custom instructions also boost performance for key workloads. Qualcomm targets a balance of single-threaded and multi-threaded performance. This enables Snapdragon SOCs to excel at both task switching and parallel processing. Gaming, in particular, benefits from sustained single-thread throughput. The customizations also improve energy efficiency. Snapdragon SOCs with modified Cortex-A76 cores can deliver more performance per watt. This allows for powerful user experiences with long battery life in mobile devices.
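The compounding effect described above can be sketched with the usual first-order model in which performance scales with instructions per cycle (IPC) multiplied by clock frequency. The 10% IPC gain and 9% clock gain below are hypothetical inputs, not measured Snapdragon figures, and the model ignores memory-bound behavior.

```python
def relative_performance(ipc_gain: float, clock_gain: float) -> float:
    """Combined speedup when IPC and clock each improve by the given fraction.

    First-order assumption: performance is proportional to IPC * frequency.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


# Hypothetical example: 10% more work per cycle and a 9% higher clock.
speedup = relative_performance(ipc_gain=0.10, clock_gain=0.09)
print(f"{(speedup - 1.0) * 100:.1f}% faster overall")  # prints "19.9% faster overall"
```

Because the two factors multiply, the combined gain is slightly larger than the sum of the individual gains.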
Comparison to Other CPU Cores
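Compared to other mobile CPU cores, Qualcomm’s custom Cortex-A76 is positioned as a top-tier performer in Snapdragon SOCs. Comparisons of this kind depend heavily on the device, benchmark version, and thermal conditions. For example: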
- 15% higher single-thread than Cortex-A77 in Geekbench
- 25% faster multi-thread than Apple A13 Bionic
- 1.25x higher AI throughput than Huawei Kirin 990
Qualcomm maintains a lead by staying at the leading edge of process technology and iterating rapidly on microarchitecture. Each Snapdragon generation brings refinements intended to widen the performance gap. However, competitors like Samsung and MediaTek also customize ARM cores for improved performance, and the microarchitecture optimizations they use may differ from Qualcomm’s approach. As process nodes advance, these competitors continue optimizing ARM cores to close the gap. For now, Qualcomm’s customization efforts appear sufficient to maintain a performance advantage in Snapdragon SOCs. The combination of process, microarchitecture, clocks, and instructions gives Qualcomm top-tier mobile CPU performance.
Challenges and Limitations
While overall effective, Qualcomm’s customizations also carry some challenges and limitations including:
- Increased design time and costs – Customizing ARM cores requires more engineering effort
- Platform compatibility risks – Heavily modified cores may cause software issues
- Diminishing returns – Gains become smaller with each generation
- Complex validation – More extensive testing needed for changes
There are also risks associated with pushing frequency and voltage limits on advanced process nodes. Higher defect densities at 7nm/5nm could impact yields. Additionally, power consumption may increase faster than performance. There is a practical limit to how much Cortex-A76 can be optimized on future nodes. Qualcomm may need to make larger modifications or adopt newer ARM CPU cores, such as the Cortex-X series and Cortex-A710, to achieve major performance improvements going forward.
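The point about power rising faster than performance follows from the standard approximation for CMOS dynamic power, P ≈ C·V²·f. The sketch below assumes performance scales linearly with frequency and ignores leakage; the 5% voltage and 10% frequency increases are hypothetical values, not Snapdragon data.

```python
def dynamic_power_ratio(voltage_scale: float, frequency_scale: float) -> float:
    """Relative dynamic power under P ~ C * V^2 * f with capacitance held constant."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical: a 10% clock increase that needs 5% more voltage to stay stable.
frequency_scale = 1.10
voltage_scale = 1.05
power = dynamic_power_ratio(voltage_scale, frequency_scale)
perf_per_watt = frequency_scale / power

print(f"power: +{(power - 1.0) * 100:.1f}%")           # prints "power: +21.3%"
print(f"performance per watt: {perf_per_watt:.2f}x")   # prints "performance per watt: 0.91x"
```

Under these assumptions a 10% performance gain costs about 21% more power, so efficiency drops even as peak speed rises.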
Future Snapdragon Cortex-A76 Outlook
Looking ahead, Qualcomm will likely continue customizing Cortex-A76 cores in upcoming Snapdragon SOCs, but Cortex-A76 is expected to give way to newer ARM CPU cores in future Snapdragon generations. Some potential changes include:
- Adopting Cortex-X custom cores on 4nm/3nm processes
- Transitioning to ARMv9 and Cortex-A710 on 3nm process
- Increasing core counts up to four performance cores
- Adding more cache memory for higher performance
Qualcomm may also integrate next-generation CPU cores from Nuvia, a company it acquired in 2021. The Nuvia team has extensive experience designing high-performance ARM-based CPUs. In the short term, expect minor modifications to Cortex-A76 in upcoming Snapdragon SOCs. More extensive changes will likely wait for newer process nodes and newer ARM CPU cores.
In summary, Qualcomm customizes ARM’s Cortex-A76 CPU core in Snapdragon SOCs for significant performance and efficiency gains. Through microarchitecture tweaks, higher clock speeds, and custom instructions, it is able to push the limits of ARM CPU performance on mobile platforms. The customizations provide tangible benefits today, though diminishing returns may motivate a transition to next-generation CPU cores. Nevertheless, Qualcomm’s custom Cortex-A76 enables Snapdragon SOCs to deliver top-tier CPU and AI processing, showcasing the advantages of custom ARM core optimization.