ARM processors have a relatively large number of registers compared with some other processor architectures, most notably 32-bit x86. The main reasons for this design choice are to let compiled code execute efficiently and to provide flexibility for a wide range of use cases.
Compiled Code Efficiency
Having more registers available allows compiled code to keep more values in registers rather than spilling them to memory. Registers are accessed much faster than memory, and every spill adds extra load and store instructions, so keeping values in registers improves performance. Some key ways abundant registers help include:
- More registers mean more intermediate values can stay in registers during complex calculations, reducing memory accesses.
- More registers make it easier for the compiler to keep frequently accessed variables in fast registers rather than in slow memory.
- More registers give the compiler room to schedule instructions, for example by unrolling loops or issuing loads early, so that independent operations overlap and pipeline stalls are reduced. (Register renaming, by contrast, is a hardware technique used by out-of-order cores, not a compiler optimization.)
- Leaf functions have more registers available for local variables before needing to spill to the stack.
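The link between register count and spilling can be sketched with a simplified linear-scan register allocator. Linear scan is one common allocation strategy, used here only for illustration; it is not necessarily what any particular ARM compiler does. The function name `linear_scan_spills` and the live ranges are hypothetical. Passing 6 registers stands in roughly for what 32-bit x86 leaves free after ESP and EBP, and 13 for AArch32's R0-R12.

```python
def linear_scan_spills(intervals, num_regs):
    """Simplified linear-scan register allocation (illustrative sketch).

    intervals: list of (name, start, end) live ranges (hypothetical input).
    num_regs:  number of allocatable registers (must be at least 1).
    Returns the names of values that had to be spilled to memory.
    """
    active = []   # (end, name) for values currently holding a register
    spilled = []
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Release registers whose live range finished before this one starts.
        active = [(e, n) for e, n in active if e >= start]
        if len(active) < num_regs:
            active.append((end, name))
            continue
        # No free register: spill whichever value stays live the longest.
        active.sort()
        last_end, last_name = active[-1]
        if last_end > end:
            active[-1] = (end, name)   # current value takes over that register
            spilled.append(last_name)
        else:
            spilled.append(name)
    return spilled


# Ten values that are all live at the same time, e.g. inside an unrolled loop.
values = [(f"v{i}", i, 20) for i in range(10)]
print(linear_scan_spills(values, 6))    # ['v6', 'v7', 'v8', 'v9']
print(linear_scan_spills(values, 13))   # []
```

With 6 registers, four of the ten values end up in memory; with 13, none do.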
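In its 32-bit (AArch32) state, the ARM architecture exposes 16 32-bit core registers to the programmer, R0-R15. Three of them have fixed roles (R13 is the stack pointer, R14 the link register and R15 the program counter), which leaves 13 registers, R0-R12, for general use. That is substantially more than 32-bit x86 (IA-32), which has only 8 general purpose registers, one of which (ESP) is the stack pointer and another (EBP) is conventionally the frame pointer. x86-64 later doubled this to 16, and 64-bit ARM (AArch64) provides 31 general purpose 64-bit registers. ARM registers are also more uniform than x86's: most ARM data-processing instructions can take any of R0-R12 as operands, whereas many x86 instructions implicitly use specific registers (for example, MUL and DIV use EAX and EDX, and variable shift counts come from CL).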
Flexibility for Different Use Cases
ARM processors are designed to be flexible enough for a wide range of devices and use cases. The abundant registers help enable this flexibility:
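- Operating systems still save and restore a process's registers in memory when switching between processes, but entering the kernel is cheap because supervisor mode has its own banked stack pointer and link register.
- Exceptions and interrupts switch the processor into a mode with its own banked copies of some registers. In AArch32, each exception mode has a private R13 (SP) and R14 (LR), and FIQ (fast interrupt) mode also banks R8-R12, so a FIQ handler can often run without saving any registers to memory.
- Debug and trace hardware is controlled through dedicated system registers (such as breakpoint and watchpoint registers), so it does not consume core registers.
- Media and DSP code uses the Advanced SIMD (NEON) registers, which AArch32 exposes as 32 64-bit or 16 128-bit registers, for parallel calculations.
- Floating-point code uses the VFP register file, which is separate from the core registers and shares its storage with the SIMD registers.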
Without these additional register sets, it would be much harder to provide fast exception handling, debug support and specialized processing in a clean, efficient way. Keeping them separate from the core registers isolates the different needs and leaves headroom for extensions.
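Register banking can be illustrated with a toy model in which the current processor mode decides which physical copy an access to R8-R14 reaches. The class `BankedRegisterFile` is hypothetical and simplified: it ignores the SPSR, the Hyp and Monitor modes, and the automatic writes to LR and SPSR that real exception entry performs.

```python
class BankedRegisterFile:
    """Toy model of AArch32 core-register banking (simplified)."""

    def __init__(self):
        # Copies used in User/System mode, and by any mode that does not bank R[n].
        self.common = [0] * 16
        # Each exception mode has a private R13 (SP) and R14 (LR) ...
        self.banks = {mode: {13: 0, 14: 0} for mode in ("svc", "irq", "abt", "und")}
        # ... and FIQ mode additionally banks R8-R12.
        self.banks["fiq"] = {n: 0 for n in range(8, 15)}
        self.mode = "usr"

    def _storage(self, n):
        # Use the current mode's bank if it has its own copy of R[n].
        bank = self.banks.get(self.mode, {})
        return bank if n in bank else self.common

    def read(self, n):
        return self._storage(n)[n]

    def write(self, n, value):
        self._storage(n)[n] = value


regs = BankedRegisterFile()
regs.write(13, 0x8000)      # user-mode stack pointer
regs.write(8, 1)            # user-mode R8
regs.mode = "fiq"           # take a fast interrupt
regs.write(13, 0x4000)      # the FIQ handler uses its own SP ...
regs.write(8, 99)           # ... and its own R8, without saving anything first
regs.mode = "usr"           # return from the interrupt
print(hex(regs.read(13)), regs.read(8))   # 0x8000 1
```

The interrupted code finds its SP and R8 untouched, even though the handler never stored them to memory.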
Register Windows
Register windows are a technique for expanding the effective number of registers, but ARM does not use them; they are found in Berkeley RISC and SPARC. In a windowed design the register file holds several overlapping banks, or windows, and the processor rotates to a new window on each call and back on each return. The overlap passes arguments: the caller's output registers become the callee's input registers. For example, function A may run in window 0, a function B that A calls in window 1, and a function C that B calls in window 2.
Each function therefore sees only a fixed-size subset of a larger physical register file, and as long as the call depth fits within the available windows, calls and returns need no instructions to save or restore registers.
When calls nest deeper than the number of windows, a window overflow trap spills the oldest window to a backing store in memory, and a later return that needs it triggers an underflow trap that reloads it. This avoids explicit save and restore code in shallow call chains, but deep or oscillating call chains pay for the traps, and a context switch must flush every live window.
ARM instead uses a flat set of 16 core registers plus the mode-banked copies described above. Functions save only the callee-saved registers they actually use, typically with a single PUSH (STMDB) on entry and POP (LDMIA) on exit.
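Window overflow and underflow can be sketched by counting which windows are resident in the register file. This models SPARC-style windows, not anything ARM implements; the class `RegisterWindows`, the helper `run` and the window counts are hypothetical. The model assumes all `nwindows` windows are usable (real SPARC reserves one) and does not track register contents.

```python
class RegisterWindows:
    """Counts window overflow/underflow traps for a windowed register file."""

    def __init__(self, nwindows):
        self.nwindows = nwindows
        self.resident = 1   # windows currently held in registers (the caller's)
        self.spills = 0     # overflow traps: oldest window written to memory
        self.fills = 0      # underflow traps: caller's window read back from memory

    def call(self):
        if self.resident == self.nwindows:
            self.spills += 1          # no free window: spill the oldest one
        else:
            self.resident += 1        # rotate into a free window at no cost

    def ret(self):
        if self.resident == 1:
            self.fills += 1           # caller's window was spilled earlier: reload it
        else:
            self.resident -= 1        # caller's window is still resident


def run(nwindows, depth):
    """Call `depth` levels deep, then return all the way back."""
    w = RegisterWindows(nwindows)
    for _ in range(depth):
        w.call()
    for _ in range(depth):
        w.ret()
    return w.spills, w.fills


print(run(8, 5))    # (0, 0): the whole call chain fits in the register file
print(run(8, 12))   # (5, 5): five windows spilled on the way down, reloaded on the way up
```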
Tradeoffs vs Larger Registers
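While 32-bit ARM has a generous number of registers, each register is only 32 bits wide. Other architectures make different choices: RISC-V has 32 integer registers (x0 is hardwired to zero) that are 32 bits wide in RV32 and 64 bits wide in RV64, and ARM's own 64-bit AArch64 state provides 31 general purpose 64-bit registers. There are tradeoffs in register width:
- 32-bit registers keep the datapath and register file smaller, which saves area and power, and 32-bit pointers halve the memory and cache footprint of pointer-heavy data compared with 64-bit pointers.
- Narrow data such as characters, bytes and 16-bit halfwords does not favour either width: both 32-bit and 64-bit ARM load it into a full register with zero- or sign-extending loads (such as LDRB and LDRSH), so no packing or unpacking is needed.
- 64-bit registers hold a 64-bit integer or pointer in a single register and operate on it in one instruction. On a 32-bit machine such a value occupies a register pair, which increases register pressure and the need to spill and reload registers.
The original 32-bit ARM design favoured many 32-bit registers, which suited its mobile and embedded targets without the area and power overhead of 64-bit datapaths. When ARM added 64-bit support with ARMv8-A (AArch64), it increased both the width and the number of registers.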
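A 64-bit value on a 32-bit machine occupies two registers, and arithmetic on it takes more than one instruction. The sketch below models a 64-bit addition done with 32-bit words: the low words are added first and the carry is propagated into the sum of the high words. On AArch32 a compiler typically emits this as ADDS followed by ADC on two register pairs; the Python names `split64` and `add64_pair` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit unsigned value into (low, high) 32-bit words, as held in a register pair."""
    return value & MASK32, (value >> 32) & MASK32


def add64_pair(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held in 32-bit register pairs.

    Mirrors ADDS (add, setting the carry flag) on the low words followed by
    ADC (add with carry) on the high words; the result wraps modulo 2**64.
    """
    low = a_lo + b_lo
    carry = low >> 32                        # 1 if the low-word addition overflowed
    low &= MASK32
    high = (a_hi + b_hi + carry) & MASK32
    return low, high


a, b = 0x00000001_FFFFFFFF, 0x00000000_00000001
lo, hi = add64_pair(*split64(a), *split64(b))
print(hex((hi << 32) | lo))   # 0x200000000
```

A 64-bit register would do the same work in one addition using one register per operand.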
Calling Convention Registers
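The ARM Procedure Call Standard (AAPCS) takes advantage of the large register set by assigning roles to registers for efficient function calls:
- R0-R3 pass the first four words of arguments into a function; further arguments go on the stack.
- R0 returns a function's result, with R1 also used for a 64-bit result.
- R4-R11 are callee-saved: a function that uses them must restore their original values before returning.
- R13 is the stack pointer (SP).
- R14 is the link register (LR), which holds the return address.
- R15 is the program counter (PC).
Because the first arguments and the result travel in registers, most calls pass their arguments and get results back without touching the stack. Dedicating R13-R15 to special roles is affordable because 13 registers still remain for general use. (Strictly, R15's role as the program counter is fixed by the architecture rather than by the calling convention.)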
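The argument-passing rule can be sketched as a small placement function. This is a simplified model of the AAPCS core-register rules that handles only 4-byte and 8-byte integer or pointer arguments (no structures, floating point or variadic calls); the function name `assign_int_args` is hypothetical. It captures two details: a 64-bit argument must start in an even-numbered register, so it can leave a gap, and once any argument has gone to the stack, later arguments no longer use R0-R3.

```python
def assign_int_args(sizes):
    """Simplified AAPCS placement of integer/pointer arguments (AArch32).

    sizes: argument sizes in bytes, each 4 (int, pointer) or 8 (long long).
    Returns one location string per argument, e.g. "r0", "r2:r3" or "[sp, #0]".
    """
    ncrn = 0   # next core register number (R0-R3 carry arguments)
    nsaa = 0   # next stacked argument offset, in bytes from SP at the call
    locations = []
    for size in sizes:
        if size not in (4, 8):
            raise ValueError("this sketch only handles 4- and 8-byte arguments")
        words = size // 4
        if size == 8 and ncrn % 2 == 1:
            ncrn += 1                  # 8-byte values start in an even register
        if words <= 4 - ncrn:
            regs = [f"r{ncrn + i}" for i in range(words)]
            locations.append(":".join(regs))
            ncrn += words
        else:
            ncrn = 4                   # later arguments may not back-fill R0-R3
            if size == 8 and nsaa % 8:
                nsaa += 4              # 8-byte values are 8-byte aligned on the stack
            locations.append(f"[sp, #{nsaa}]")
            nsaa += size
    return locations


# f(int a, long long b, int c): R1 is skipped so that b lands in R2:R3.
print(assign_int_args([4, 8, 4]))   # ['r0', 'r2:r3', '[sp, #0]']
```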
Performance Benefits
The most direct benefit of additional registers is visible in compiled code as fewer spill and reload instructions. Benchmark comparisons between ARM and x86 cores are harder to attribute to register count alone, because they also reflect differences in microarchitecture, caches, process technology, compilers and power budgets; register count is one contributor among many.
The ARM architecture was designed from the start as an efficient RISC processor, and its plentiful registers are a key part of that clean RISC design. x86 has added registers over time, most notably when x86-64 doubled the general purpose registers from 8 to 16, whereas ARM had a large register set from the beginning.
Conclusion
In summary, ARM processors have a generous set of registers to enable efficient compiled code and flexibility for a wide range of use cases. Many fast general purpose registers, together with mode-banked registers, support the ARM calling convention, fast exception handling and compiler optimizations that would be harder to achieve with fewer registers. While there are always tradeoffs, ARM’s choice to prioritize plentiful registers has proven to be a beneficial design decision, and AArch64 carries it forward with 31 64-bit general purpose registers.