ARM processors are central processing units (CPUs) based on the ARM architecture developed by Arm Holdings. They are widely used in mobile devices and embedded systems because of their high performance per watt. An ARM processor can be understood at three levels: the instruction set architecture, the microarchitecture, and the physical chip implementation.
1. Instruction Set Architecture
The instruction set architecture (ISA) defines the machine language that software uses to communicate with the processor. It specifies the instructions the processor understands, the registers used to hold data, memory addressing modes, data types, and more. The ARM ISA follows Reduced Instruction Set Computer (RISC) principles: simple, regular instructions that are easy to decode and well suited to pipelining. This simplicity helps ARM cores deliver good performance at low power, and the compact Thumb encodings add good code density.
Some key features of the ARM ISA include:
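- 32-bit fixed-length instructions in the ARM (A32) and A64 instruction sets; Thumb mixes 16-bit and 32-bit encodings
- Load/store architecture: data-processing instructions operate on registers, and only load/store (and atomic) instructions access memory
- Conditional execution of most instructions in the 32-bit ARM instruction set (A64 instead relies on conditional branches and conditional select instructions)
- 16 general-purpose registers visible in AArch32 (R0–R15) and 31 in AArch64 (X0–X30)
- Multiple instruction sets such as ARM, Thumb, Thumb-2, and A64
- Low-power instructions such as WFI (Wait For Interrupt) and WFE (Wait For Event) that let a core idle until it has work to do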
The ARM ISA has evolved over time while retaining backwards compatibility. For example, the Thumb instruction set was introduced to improve code density, and the Thumb-2 extension added 32-bit Thumb instructions for better performance. ARMv8-A introduced AArch64, a 64-bit execution state with 64-bit general-purpose registers and the new A64 instruction set, while AArch32 remained available in many implementations for running existing 32-bit code.
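To make conditional execution concrete, the sketch below evaluates the 4-bit condition field carried in bits [31:28] of a 32-bit ARM (A32) instruction against the N, Z, C, and V status flags, following the standard condition-code table. It is an illustrative Python model, not code from any ARM toolchain; the function name `condition_passed` is hypothetical.

```python
# Illustrative model of A32 conditional execution: an instruction executes only
# if its 4-bit condition field (bits [31:28]) is satisfied by the N, Z, C, V flags.


def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with condition field `cond` would execute."""
    checks = {
        0b0000: z,                    # EQ: equal
        0b0001: not z,                # NE: not equal
        0b0010: c,                    # CS/HS: carry set / unsigned higher or same
        0b0011: not c,                # CC/LO: carry clear / unsigned lower
        0b0100: n,                    # MI: negative
        0b0101: not n,                # PL: positive or zero
        0b0110: v,                    # VS: overflow
        0b0111: not v,                # VC: no overflow
        0b1000: c and not z,          # HI: unsigned higher
        0b1001: (not c) or z,         # LS: unsigned lower or same
        0b1010: n == v,               # GE: signed greater than or equal
        0b1011: n != v,               # LT: signed less than
        0b1100: (not z) and n == v,   # GT: signed greater than
        0b1101: z or n != v,          # LE: signed less than or equal
        0b1110: True,                 # AL: always
    }
    if cond not in checks:
        # 0b1111 selects a separate unconditional encoding space; not modelled here.
        raise ValueError(f"condition {cond:#06b} not handled by this sketch")
    return checks[cond]


instr = 0x03A00001   # MOVEQ r0, #1 in the A32 encoding
cond = instr >> 28   # top four bits: 0b0000 (EQ)
print(condition_passed(cond, n=False, z=True, c=False, v=False))  # True
```

Because the flags are only read here, a compiler can replace a short branch around one or two instructions with conditionally executed instructions, avoiding a possible branch misprediction.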
2. Microarchitecture Design
The microarchitecture defines how an ARM processor executes instructions and handles data. It determines aspects like the pipeline structure, cache organization, branch prediction, out-of-order execution, and more. Microarchitectural improvements allow higher performance and efficiency without changing the ARM ISA.
Some key microarchitectural features of ARM processors include:
- Pipelining – Instruction execution is split into stages such as fetch, decode, and execute, so several instructions are in flight at once and throughput increases.
- Superscalar – Multiple instructions can be issued per clock cycle for parallel execution.
- Branch prediction – Hardware predicts the direction (and target) of branches so the pipeline can keep fetching without waiting for each branch to resolve.
- Speculative execution – Instructions beyond a predicted branch are executed before the branch resolves; if the prediction was wrong, their results are discarded.
- Out-of-order execution – Instructions execute as soon as their operands are ready rather than strictly in program order, while results are still committed in program order.
- Caching – Hierarchical caches provide fast access to frequently used data and instructions.
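Branch prediction is often introduced with a 2-bit saturating counter per branch. The Python sketch below implements that textbook scheme; it is an assumption for illustration only, since the predictors in real ARM cores are far more elaborate and largely undocumented. The class `TwoBitPredictor`, the table size, and the branch address are hypothetical.

```python
# Textbook 2-bit saturating-counter branch predictor (illustrative only; not the
# predictor of any specific ARM core).
from typing import Dict, List, Tuple


class TwoBitPredictor:
    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        # Entries not yet seen default to 1 (weakly not taken).
        self.table: Dict[int, int] = {}

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits (4-byte aligned instructions), keep table_bits bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table.get(self._index(pc), 1) >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        counter = self.table.get(i, 1)
        # Saturate at 0 and 3 so a single surprise does not flip a strong prediction.
        self.table[i] = min(counter + 1, 3) if taken else max(counter - 1, 0)


def accuracy(predictor: TwoBitPredictor, trace: List[Tuple[int, bool]]) -> float:
    """Fraction of (pc, taken) outcomes predicted correctly, training as it goes."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


# A loop-closing branch at a hypothetical address: taken 9 times, then not taken,
# with the whole loop run 10 times.
trace = [(0x8000, i % 10 != 9) for i in range(100)]
print(f"{accuracy(TwoBitPredictor(), trace):.2f}")  # 0.89
```

The loop branch is mispredicted once while the counter warms up and then only on each loop exit, which is why even this simple scheme handles loops well.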
Arm licenses both its own Cortex core designs and the architecture itself. Licensees such as Qualcomm, Apple, and Samsung have either customized Arm-designed cores or built their own microarchitectures that implement the ARM ISA, tailoring them to specific performance and power requirements.
3. Chip Implementation
The chip implementation determines how the ARM processor is physically built as an integrated circuit (IC). It covers aspects like:
- Manufacturing process – Older ARM chips used larger processes, while newer ones leverage smaller processes like 7nm for better density and efficiency.
- Core configurations – ARM CPU cores can be combined on a single chip in configurations like quad-core, hexa-core, etc.
- Cache sizes – Larger or smaller caches impact performance and chip area.
- Clock frequencies – Application-class ARM cores typically run from about 1GHz to over 3GHz, while microcontroller cores often run at tens to hundreds of MHz, depending on the target application.
- Logic libraries – Standard cells and memories customized for the process technology.
- Physical design – Floorplanning, placement, clock distribution, routing, and final layout of the chip.
Foundries such as TSMC, Samsung, and GlobalFoundries manufacture ARM-based processors designed by Arm's licensees. The implementation stage allows extensive customization, integrating ARM CPU cores into complete Systems-on-Chip (SoCs) alongside memories, graphics, AI accelerators, I/O, and more.
ARM Instruction Set Architecture
The ARM instruction set architecture provides stability and a common foundation across the diverse range of ARM processors. It enables developers to write software that runs on different ARM chips. The key aspects of the ISA are:
- Registers – In AArch32, 16 32-bit registers R0–R15 are visible at a time; R13–R15 serve as the stack pointer (SP), link register (LR), and program counter (PC). AArch64 provides 31 64-bit general-purpose registers X0–X30.
- Data types – Supports 8-, 16-, 32-, and 64-bit signed and unsigned integers, plus single- and double-precision floating point where floating-point hardware is present.
- Addressing modes – Supports flexible modes like immediate offset, register offset, scaled register offset, and pre-/post-indexed addressing for memory access.
- ALU instructions – Arithmetic, logical, and comparison instructions for integer or floating point data.
- Flow control – Branch (B) and Branch with Link (BL) instructions change program flow; BL also saves the return address in the link register.
- Load/Store – LDR and STR instructions move data between registers and memory.
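The addressing modes listed above differ in when the address is formed and whether the base register is updated. The Python sketch below models an `LDR` with an immediate or scaled-register offset in its offset, pre-indexed, and post-indexed forms. The helper `ldr`, the register list, and the dictionary standing in for memory are hypothetical simplifications (word loads only, no alignment checks, no handling of the case where the destination equals the base register).

```python
# Illustrative model of A32 LDR addressing modes (not an emulator API).
from typing import Dict, List

MASK32 = 0xFFFFFFFF  # keep addresses within 32 bits


def ldr(regs: List[int], memory: Dict[int, int], rt: int, rn: int,
        offset: int, mode: str = "offset") -> None:
    """Load a word into regs[rt] using base register regs[rn] and the given offset."""
    base = regs[rn]
    if mode == "offset":      # LDR Rt, [Rn, #off]   -> address = Rn + off, Rn unchanged
        address = (base + offset) & MASK32
    elif mode == "pre":       # LDR Rt, [Rn, #off]!  -> address = Rn + off, then Rn = address
        address = (base + offset) & MASK32
        regs[rn] = address
    elif mode == "post":      # LDR Rt, [Rn], #off   -> address = Rn, then Rn = Rn + off
        address = base
        regs[rn] = (base + offset) & MASK32
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    regs[rt] = memory.get(address, 0)  # unmapped addresses read as 0 in this sketch


regs = [0] * 16                        # R0-R15
memory = {0x1000: 10, 0x1004: 20, 0x1008: 30}

regs[1] = 0x1000
ldr(regs, memory, rt=0, rn=1, offset=4, mode="post")  # r0 = mem[0x1000]; r1 -> 0x1004
ldr(regs, memory, rt=2, rn=1, offset=4, mode="pre")   # r1 -> 0x1008; r2 = mem[0x1008]

# Scaled register offset, as in LDR r6, [r5, r3, LSL #2]: index r3 scaled by 4.
regs[5], regs[3] = 0x1000, 1
ldr(regs, memory, rt=6, rn=5, offset=regs[3] << 2)    # r6 = mem[0x1004]; r5 unchanged
print(regs[0], regs[2], regs[6], hex(regs[1]))
```

Post-indexed loads are the natural fit for walking through an array: each load reads the current element and advances the pointer in a single instruction.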
The ARM ISA provides a solid base to develop software – from operating systems like Linux to applications. Compatibility allows software reuse across different ARM processors.
ARM Processor Pipeline
Pipelining is a key technique used in ARM processors to increase instruction throughput. It works by splitting instruction execution into multiple stages, allowing several instructions to be worked on concurrently. The ARM9 family, for example, used a classic five-stage pipeline (the earlier ARM7 used three stages: fetch, decode, execute):
- Fetch – Fetch the instruction from memory.
- Decode – Determine the instruction type and read its operands.
- Execute – Perform the operation, such as an add or multiply.
- Memory – Access data memory for load/store instructions.
- Write Back – Write results to the register file.
While one instruction is being executed, the next can be decoded and a third fetched from memory. This assembly-line arrangement lets a five-stage pipeline approach, but never quite reach, a 5x throughput improvement over an unpipelined design; in practice, data hazards, cache misses, and mispredicted branches reduce the gain. Techniques like branch prediction and speculative execution help minimize these pipeline stalls.
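The throughput claim can be checked with simple arithmetic. Assuming an ideal pipeline with no stalls, the first instruction needs k cycles to pass through k stages, and each later instruction completes one cycle after the previous one, so n instructions take k + (n − 1) cycles instead of n × k. The Python sketch below computes this and prints a small timing diagram; the stage names follow the five-stage list above, and the function names are hypothetical.

```python
# Ideal timing of a k-stage pipeline with no stalls (illustrative arithmetic only).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory, write back


def unpipelined_cycles(n: int, k: int = len(STAGES)) -> int:
    """Each instruction passes through all k stages before the next one starts."""
    return n * k


def pipelined_cycles(n: int, k: int = len(STAGES)) -> int:
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    if n < 1:
        raise ValueError("need at least one instruction")
    return k + (n - 1)


def speedup(n: int, k: int = len(STAGES)) -> float:
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)


def diagram(n: int) -> str:
    """Text timing diagram: instruction i enters the pipeline i cycles after I0."""
    rows = []
    for i in range(n):
        cells = ["   "] * i + [f"{stage:<3}" for stage in STAGES]
        rows.append(f"I{i}: " + " ".join(cells))
    return "\n".join(rows)


print(diagram(3))
for n in (5, 100, 10_000):
    print(f"n={n}: {pipelined_cycles(n)} cycles, speedup {speedup(n):.3f}x")
```

The speedup grows from about 2.778x for 5 instructions to about 4.998x for 10,000, approaching the stage count of 5 without reaching it; real stalls push it lower still.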
ARM Processor Cores
ARM CPU cores are designed to deliver high performance within constrained power budgets. Some popular ARM cores include:
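- Cortex-A78 – High-performance core for smartphones and laptops.
- Cortex-A76 – Earlier high-performance core; its design also formed the basis of the Neoverse N1 server core.
- Cortex-A55 – Power-efficient LITTLE core, typically paired with larger cores in big.LITTLE designs.
- Cortex-A35 – Highly power-efficient ARMv8-A core supporting both 32-bit and 64-bit execution.
- Cortex-R52 – Real-time core for safety-critical applications.
- Cortex-M55 – Feature-rich 32-bit microcontroller core with Helium vector extensions for IoT and on-device machine learning.
The high-performance Cortex-A cores use superscalar, out-of-order execution and large caches, while efficiency-focused cores such as the Cortex-A55 and Cortex-A35 are in-order designs. All of them can be combined in multicore configurations. ARM’s big.LITTLE approach pairs energy-efficient cores with high-performance cores to target varied workloads.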
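The big.LITTLE idea can be illustrated with a toy placement rule: light tasks stay on the efficient cores and demanding ones move to the big cores. The Python sketch below is a hypothetical threshold heuristic written only for illustration; it is not the Linux Energy Aware Scheduler or any vendor's policy. The `Task` type, the per-task utilization figures, the LITTLE-core capacity of 0.4, and the 80% headroom factor are all assumptions.

```python
# Toy big.LITTLE task placement (hypothetical heuristic, not a real scheduler).
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    utilization: float  # assumed fraction of one big core's capacity the task needs


def place_tasks(tasks: List[Task], little_capacity: float = 0.4) -> Dict[str, List[str]]:
    """Assign each task to the LITTLE or big cluster.

    little_capacity is the assumed capacity of a LITTLE core relative to a big
    core (1.0). A task goes to LITTLE if it needs at most 80% of that capacity,
    leaving headroom; otherwise it goes to big.
    """
    placement: Dict[str, List[str]] = {"LITTLE": [], "big": []}
    for task in sorted(tasks, key=lambda t: t.utilization):
        cluster = "LITTLE" if task.utilization <= 0.8 * little_capacity else "big"
        placement[cluster].append(task.name)
    return placement


tasks = [Task("audio", 0.05), Task("sync", 0.2), Task("game", 0.9), Task("ui", 0.35)]
print(place_tasks(tasks))
```

Real schedulers also weigh per-core energy models, thermal limits, and how utilization changes over time, which this sketch deliberately ignores.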
ARM Processor Instruction Sets
ARM processors support multiple instruction sets that offer tradeoffs between performance, code density, and complexity:
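- ARM (A32) – Original 32-bit instruction set with fixed 32-bit encodings and good performance.
- Thumb – Compact 16-bit instruction set for better code density.
- Thumb-2 – Extension that mixes 16-bit and 32-bit Thumb instructions, combining good density with performance close to ARM code.
- A64 (AArch64, introduced in ARMv8-A) – 64-bit instruction set with wider registers and a larger address space.
- NEON (Advanced SIMD) – SIMD instructions for multimedia and DSP tasks.
- VFP – Floating-point instructions (originally a coprocessor extension) for math-intensive applications.
Developers, or their compilers, can choose the instruction set that best fits each piece of code, for example Thumb-2 for compact firmware or A64 for performance-critical 64-bit software. In AArch32, ARM and Thumb code can be mixed in one program through interworking, which allows flexibility and reuse.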
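When ARM and Thumb code are mixed in AArch32, the processor must know which instruction set the target of a branch uses. With BX (Branch and Exchange, ARMv4T onwards), bit 0 of the target register selects the state: 1 means Thumb, 0 means ARM. The Python sketch below models only that rule; `bx_target` is a hypothetical helper, and the error it raises for a misaligned ARM target stands in for behavior the architecture leaves UNPREDICTABLE.

```python
# Illustrative model of ARM/Thumb interworking via BX Rm in AArch32.
from typing import Tuple

MASK32 = 0xFFFFFFFF


def bx_target(rm_value: int) -> Tuple[str, int]:
    """Return (instruction set state, branch address) for BX with register value rm_value."""
    rm_value &= MASK32
    if rm_value & 1:
        # Bit 0 set: switch to Thumb; clear bit 0 to get the halfword-aligned address.
        return "Thumb", rm_value & ~1
    if rm_value & 2:
        # ARM code must be word-aligned; the architecture leaves this case UNPREDICTABLE.
        raise ValueError("ARM-state branch target must be word-aligned")
    return "ARM", rm_value


state, address = bx_target(0x8001)
print(state, hex(address))  # Thumb 0x8000
```

This is why linkers set bit 0 in the address of Thumb functions: a single BX or BLX can then reach either kind of code without the caller knowing which instruction set the callee uses.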
ARM Processor Cache
To bridge the speed gap between fast processors and slower memories, ARM processors incorporate caches that store frequently accessed data and instructions:
- L1 Cache – Small, low-latency cache next to the processor core (16-64KB), usually split into separate instruction and data caches.
- L2 Cache – Larger cache with higher latency, often 256KB – 2MB in size.
- L3 Cache – Optional larger cache, typically several megabytes and often shared among cores, to reduce accesses to main memory.
Set-associative organization, write-back policies, and hardware cache coherence across the cores of a multicore chip help maximize performance. Together, the cache hierarchy greatly reduces average memory access time.
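A set-associative cache splits each address into a line offset, a set index, and a tag, and can hold a line in any of the ways of its set. The Python sketch below models this with least-recently-used (LRU) replacement. The geometry (32KB, 4-way, 64-byte lines) is an illustrative assumption within the L1 range above, not the configuration of a specific core, and the class name is hypothetical.

```python
# Minimal set-associative cache model with LRU replacement (illustrative only).
from collections import OrderedDict
from typing import List


class SetAssociativeCache:
    def __init__(self, size_bytes: int = 32 * 1024, ways: int = 4, line_bytes: int = 64) -> None:
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)  # 128 sets for the defaults
        # One OrderedDict per set: keys are tags, ordered least to most recently used.
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Look up one byte address; return True on a hit."""
        line = address // self.line_bytes   # strip the offset within the line
        index = line % self.num_sets        # which set the line maps to
        tag = line // self.num_sets         # remaining upper address bits
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)   # evict the least recently used line
        cache_set[tag] = True
        self.misses += 1
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Read a 16KB array of 4-byte words sequentially, twice.
cache = SetAssociativeCache()
for _ in range(2):
    for addr in range(0, 16 * 1024, 4):
        cache.access(addr)
print(cache.misses, cache.hits, f"{cache.hit_rate():.4f}")
```

Only the first touch of each 64-byte line misses (256 lines), and because the 16KB array fits in the 32KB cache, the second pass hits entirely. Shrinking the cache or striding through a larger array quickly shows how capacity and associativity affect the hit rate.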
ARM Processor Comparisons
Here are some key differences between various ARM-based processors:
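- Cortex-A78 vs Cortex-A77 – Focus on efficiency: higher sustained performance within a similar power budget and a smaller core area.
- Cortex-A77 vs Cortex-A76 – Wider front end, a larger out-of-order window, and a new macro-op cache for higher performance per clock.
- Cortex-A55 vs Cortex-A53 – Redesigned in-order microarchitecture with better performance and power efficiency, plus support for DynamIQ cluster configurations.
- Cortex-M4 vs Cortex-M3 – Adds DSP instructions (SIMD, saturating arithmetic, single-cycle multiply-accumulate) and an optional single-precision FPU (Cortex-M4F).
- ARMv8 vs ARMv7 – Adds the 64-bit AArch64 execution state with the A64 instruction set; later ARMv8.x revisions introduced optional extensions such as SVE (Scalable Vector Extension).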
While retaining the ARM architecture, successive processor generations deliver improved efficiency and capabilities.
Conclusion
ARM processors power over 90% of smartphones and a growing share of embedded devices. The key ingredients underlying their success are the energy-efficient RISC architecture, microarchitectural innovations such as advanced pipelines and multi-issue cores, and rapid customization through licensable IP cores. With ARM processors now making headway into laptops, servers, and even supercomputers, their low-power heritage, coupled with continuous performance improvements, makes them a strong choice across diverse computing segments.