Qualcomm customizations of Cortex-A76 in Snapdragon SOCs

Jamie Kellett
Last updated: October 31, 2023 8:36 am

Qualcomm has made several customizations to the ARM Cortex-A76 CPU core in its Snapdragon system-on-chips (SOCs) to optimize performance and efficiency. The key changes include modifying the microarchitecture, raising clock speeds, and supporting specialized instructions. Understanding these customizations provides insight into the capabilities and advantages of Snapdragon SOCs.

Contents
  • Overview of Cortex-A76
  • Microarchitecture Changes in Snapdragon Cortex-A76
  • Higher Clock Speeds
  • Custom Instructions
  • Impact on CPU Performance
  • Comparison to Other CPU Cores
  • Challenges and Limitations
  • Future Snapdragon Cortex-A76 Outlook
  • Conclusion

Overview of Cortex-A76

The Cortex-A76 is ARM’s first 7nm CPU core designed for mobile devices. It uses the ARMv8.2-A instruction set and features:

  • Out-of-order execution for high performance
  • DynamIQ technology for heterogeneous computing
  • 4-wide decode and dispatch for efficient instruction throughput
  • Large physical register files for speculative execution
  • Enhanced branch prediction with a deeper TAGE branch predictor

Key goals for Cortex-A76 were improving performance and energy efficiency. On TSMC 7nm, ARM targeted roughly 35% higher performance or 40% better power efficiency compared to Cortex-A75.

Microarchitecture Changes in Snapdragon Cortex-A76

Although Qualcomm bases its Snapdragon cores on the Cortex-A76, it modifies the microarchitecture for further optimization. These changes are designed to reduce latency, increase throughput, improve branch prediction accuracy, and enhance sustained performance. Some of the modifications include:

  • Larger issue queues – Allow more instruction reordering and hide latency
  • Larger load/store buffers – Reduce stalls when accessing memory
  • Extra branch target buffers – Improve branch prediction accuracy
  • Deeper out-of-order queues – Enable better instruction scheduling

Qualcomm also tweaks the prefetchers, cache organization, and coherency mechanisms. Although it does not disclose full microarchitectural details, these changes aim to reduce delays in the pipeline. The customizations vary between Snapdragon generations. For example, the Kryo 485 cores in Snapdragon 855 are derived from Cortex-A76, whereas the Kryo 385 cores in the earlier Snapdragon 845 were derived from Cortex-A75. However, the goals remain similar: lower latency, higher throughput, and sustained performance.
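
One way to see why larger queues and buffers hide latency is Little's law: the number of instructions that must be in flight equals the sustained throughput multiplied by the latency being covered. The sketch below is a back-of-the-envelope model only. The function names and the example window sizes, latency, and issue width are illustrative assumptions, not figures disclosed by Qualcomm or ARM.

```python
def in_flight_needed(target_ipc: float, latency_cycles: float) -> float:
    """Little's law: instructions in flight = throughput * latency."""
    return target_ipc * latency_cycles


def sustained_ipc(window_entries: int, latency_cycles: float, peak_ipc: float) -> float:
    """IPC a window of the given size can sustain while covering the latency,
    capped at the core's peak issue width."""
    return min(peak_ipc, window_entries / latency_cycles)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the trend.
    for window in (64, 128, 256):
        ipc = sustained_ipc(window, latency_cycles=40, peak_ipc=4.0)
        print(f"window={window:3d} entries -> sustained IPC ~ {ipc:.1f}")
```

In this simplified model, doubling the window doubles the throughput that can be sustained across a fixed latency until the issue width becomes the limit, which is the intuition behind enlarging queues and buffers.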

Higher Clock Speeds

In addition to microarchitectural optimizations, Qualcomm increases the clock speeds of Cortex-A76 cores in Snapdragon SOCs. For example:

  • Snapdragon 855 runs its prime core at up to 2.84 GHz
  • Snapdragon 865 runs up to 2.84 GHz (its performance cores are based on the newer Cortex-A77)
  • Snapdragon 888 runs up to 2.84 GHz (its prime core is based on the Cortex-X1)

This is higher than the roughly 2.6 GHz at which some other Cortex-A76 implementations shipped. The increased clocks allow Snapdragon SOCs to extract more performance from the customized cores. Qualcomm relies on advanced 7nm and 5nm processes (TSMC 7nm for Snapdragon 855 and 865, Samsung 5nm for Snapdragon 888) to push higher frequencies while maintaining efficiency. Process refinements like EUV lithography enable lower voltage operation at high clocks.

Custom Instructions

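Besides microarchitecture and clocks, Snapdragon Cortex-A76 implementations support specialized instructions that accelerate tasks in areas like AI, graphics, image processing, and security. Several of the instructions usually cited are defined by ARM's architecture extensions rather than being specific to Qualcomm. Some examples include:

  • dotprod – Performs dot product operations used in neural networks (ARM's optional Armv8.2-A dot-product extension, SDOT/UDOT, which Cortex-A76 implements)
  • int8matrix – Processes 8-bit matrix operations for AI (standardized by ARM as the I8MM extension in Armv8.6-A, which postdates Cortex-A76)
  • vpsel – Vector predicate select instruction (VPSEL is defined in ARM's M-profile vector extension, Helium)
  • SHA3 – Accelerates SHA3 hash calculations for security (an optional Armv8.2-A cryptographic extension)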

By supporting these operations directly in hardware, Qualcomm aims to improve performance and efficiency. The instructions allow Snapdragon SOCs to execute more workloads on the CPU without relying on external accelerators. Qualcomm is said to tune the Cortex-A76 pipeline to lower the latency of these instructions and to add execution resources that increase throughput for specialized tasks. Together, these enhancements are intended to deliver both high performance and low power consumption.
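
To make the dot-product operation concrete, the sketch below models in plain Python what a four-element signed 8-bit dot product with 32-bit accumulation computes, in the style of one 32-bit lane of ARM's SDOT instruction. It is a behavioral illustration, not Qualcomm's implementation. The helper names `to_int8` and `dot4_accumulate` are hypothetical.

```python
from typing import Sequence


def to_int8(x: int) -> int:
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    x &= 0xFF
    return x - 256 if x >= 128 else x


def dot4_accumulate(acc: int, a: Sequence[int], b: Sequence[int]) -> int:
    """Multiply four int8 pairs and add the sum of products to a 32-bit accumulator."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four elements per operand")
    total = acc + sum(to_int8(x) * to_int8(y) for x, y in zip(a, b))
    # Wrap to signed 32 bits, as a fixed-width accumulator lane would.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


if __name__ == "__main__":
    # 10 + (1*5 + 2*6 + 3*7 + 4*8) = 80
    print(dot4_accumulate(10, [1, 2, 3, 4], [5, 6, 7, 8]))
```

A hardware dot-product instruction performs these four multiplies and the accumulation in a single operation per lane, which is why it helps the 8-bit arithmetic common in neural network inference.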

Impact on CPU Performance

Collectively, Qualcomm’s optimizations are credited with significant CPU performance uplifts across Snapdragon SOC generations, although later generations also benefit from newer ARM core designs. Some cited examples include:

  • 20% higher integer scores in Snapdragon 855 vs 845
  • 25% faster AI processing in Snapdragon 865 vs 855
  • 10% faster web browsing in Snapdragon 888 vs 865

These gains come from the compound benefits of microarchitecture changes, higher clocks, and custom instructions. While microarchitecture tweaks reduce latency, higher frequencies increase throughput. Custom instructions also boost performance for key workloads. Qualcomm targets a balance of single-threaded and multi-threaded performance. This enables Snapdragon SOCs to excel at both task switching and parallel processing. Gaming, in particular, benefits from sustained single-thread throughput. The customizations also improve energy efficiency. Snapdragon SOCs with modified Cortex-A76 cores can deliver more performance per watt. This allows for powerful user experiences with long battery life in mobile devices.
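
The compounding effect can be sketched with the usual first-order model in which performance scales with instructions per cycle (IPC) multiplied by clock frequency, so the two gains multiply rather than add. The clock figures below are the 2.6 GHz and 2.84 GHz values quoted earlier. The 10% IPC gain is a hypothetical value chosen for illustration, and the function names are assumptions.

```python
def clock_uplift(base_ghz: float, new_ghz: float) -> float:
    """Fractional frequency gain, e.g. 0.10 for a 10% higher clock."""
    return new_ghz / base_ghz - 1.0


def combined_gain(ipc_gain: float, clock_gain: float) -> float:
    """Performance scales roughly with IPC x frequency, so gains multiply."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


if __name__ == "__main__":
    clock = clock_uplift(2.6, 2.84)     # clock figures quoted in the article
    total = combined_gain(0.10, clock)  # 10% IPC gain is a hypothetical value
    print(f"clock: {clock:.1%}, combined: {total:.1%}")
```

Under these assumptions, a clock uplift of about 9% combined with a 10% IPC gain yields roughly 20% more performance. Real workloads rarely scale this cleanly, because memory latency does not shrink when the core clock rises.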

Comparison to Other CPU Cores

Compared to other mobile CPU cores, Qualcomm’s custom Cortex-A76 is claimed to deliver top-tier performance in Snapdragon SOCs. Cited examples include:

  • 15% higher single-thread than Cortex-A77 in Geekbench
  • 25% faster multi-thread than Apple A13 Bionic
  • 1.25x higher AI throughput than Huawei Kirin 990

Qualcomm maintains a lead by staying at the leading edge of process technology and iterating rapidly on microarchitecture. Each Snapdragon generation brings refinements intended to widen the performance gap. However, competitors like Samsung and MediaTek also customize ARM cores for improved performance, and the microarchitecture optimizations they use may differ from Qualcomm’s approach. As process nodes advance, these competitors continue optimizing ARM cores to close the gap. For now, Qualcomm’s customization efforts appear sufficient to maintain a performance advantage in Snapdragon SOCs. The combination of process, microarchitecture, clocks, and instructions gives it top-tier mobile CPU performance.

Challenges and Limitations

Although effective overall, Qualcomm’s customizations also carry some challenges and limitations, including:

  • Increased design time and costs – Customizing ARM cores requires more engineering effort
  • Platform compatibility risks – Heavily modified cores may cause software issues
  • Diminishing returns – Gains become smaller with each generation
  • Complex validation – More extensive testing needed for changes

There are also risks associated with pushing frequency and voltage limits on advanced process nodes. Higher defect densities at 7nm/5nm could impact yields. Additionally, power consumption may increase faster than performance, because higher clocks usually require higher voltage. There is a practical limit to how much Cortex-A76 can be optimized on future nodes. Qualcomm may need to make larger modifications or adopt newer ARM CPU designs like Cortex-X and Cortex-A710 to achieve major performance improvements going forward.
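
The reason power can grow faster than performance follows from the standard first-order model of dynamic power, P ≈ C · V² · f: frequency enters linearly, but any extra voltage needed to reach that frequency enters squared. The sketch below applies that model with hypothetical scaling factors. It ignores leakage power and is not based on measured Snapdragon data.

```python
def relative_dynamic_power(freq_scale: float, volt_scale: float) -> float:
    """Dynamic power relative to baseline, using P ~ C * V^2 * f with C fixed."""
    return freq_scale * volt_scale ** 2


def perf_per_watt_change(freq_scale: float, volt_scale: float) -> float:
    """Relative performance per watt, assuming performance tracks frequency."""
    return freq_scale / relative_dynamic_power(freq_scale, volt_scale)


if __name__ == "__main__":
    # Hypothetical: 10% higher clock that needs 5% more voltage.
    power = relative_dynamic_power(1.10, 1.05)
    efficiency = perf_per_watt_change(1.10, 1.05)
    print(f"power x{power:.3f}, perf/W x{efficiency:.3f}")
```

With these assumed factors, a 10% clock increase costs about 21% more dynamic power, so performance per watt falls by roughly 9%.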

Future Snapdragon Cortex-A76 Outlook

Looking ahead, Qualcomm will likely continue customizing Cortex-A76 cores in upcoming Snapdragon SOCs. But Cortex-A76 is expected to be replaced by newer ARM CPU architectures in future Snapdragon generations. Some potential changes include:

  • Adopting Cortex-X custom cores on 4nm/3nm processes
  • Transitioning to ARMv9 and Cortex-A710 on 3nm process
  • Increasing core counts up to four performance cores
  • Adding more cache memory for higher performance

Qualcomm may also integrate next-generation CPU cores from Nuvia, the startup it acquired in 2021. The Nuvia team has extensive experience designing high-performance ARM-based CPUs. In the short term, however, expect only minor modifications to Cortex-A76 in upcoming Snapdragon SOCs. More extensive changes will likely wait for newer process nodes and ARM CPU architectures.

Conclusion

In summary, Qualcomm customizes ARM’s Cortex-A76 CPU core in Snapdragon SOCs for performance and efficiency gains. Through microarchitecture tweaks, higher clock speeds, and specialized instructions, it pushes the limits of ARM CPU performance on mobile platforms. The customizations provide tangible benefits today, though diminishing returns may motivate a transition to next-generation CPU cores in the future. Nevertheless, Qualcomm’s custom Cortex-A76 enables Snapdragon SOCs to deliver top-tier CPU and AI processing, showcasing the advantages of custom ARM core optimization.
