ARM FPU Instruction Set

The ARM Floating Point Unit (FPU) provides hardware support for calculations using floating point numbers. The FPU instruction set allows ARM processors to perform mathematical operations efficiently on single precision and double precision floating point values.

Contents

Overview of ARM FPU
FPU Data Types
FPU Instructions
Data Transfer
Arithmetic
Comparison
Conversion
Status and Control
Programming with the FPU
ARM FPU Architectures
VFP (Vector Floating Point)
VFPv2
VFPv3 / VFPv4
FPv5
Summary

Overview of ARM FPU

The ARM FPU has historically been an optional extension to the ARM instruction set architecture; cores such as Cortex-A9 and Cortex-M4 can be built with or without one. It provides hardware floating point arithmetic, which is much faster than emulating the same operations in software with integer instructions. On many cores the FPU has its own pipeline alongside the integer pipeline, so floating point and integer instructions can execute at the same time.

There have been several generations of the ARM FPU architecture. Most implementations support both single precision (32-bit) and double precision (64-bit) arithmetic, although some microcontroller FPUs, such as the FPv4-SP unit in Cortex-M4, are single precision only:

VFP (VFPv1) – The original version, now obsolete
VFPv2 – Single and double precision
VFPv3 – Enhanced version of VFPv2, ARMv7 architecture
VFPv4 – Further improvements, ARMv7 architecture
FPv5 – ARMv8 era floating point; the FPv5 name itself is used for M-profile FPUs such as the one in Cortex-M7

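The FPU registers are separate from the ARM general purpose registers. A VFPv2 style register bank holds 32 single precision registers (s0-s31) that overlay 16 double precision registers (d0-d15): s(2n) is the low half and s(2n+1) the high half of dn, so d0 occupies the same storage as s0 and s1. VFPv3 and VFPv4 implementations with 32 double registers (the D32 variants) add d16-d31, which have no single precision aliases. Because the banks overlap, code that mixes precisions must not reuse a single precision register whose double precision alias still holds a live value.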
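The overlay can be modeled on a desktop machine in a few lines of Python. This is a host-side sketch, not ARM code: it assumes a little-endian byte array standing in for a VFPv2 style bank with 16 double registers, and the class and method names (`VfpBank`, `write_s`, `read_d_bits`) are hypothetical.

```python
import struct


class VfpBank:
    """Hypothetical model of a VFPv2 style bank: s0-s31 overlaying d0-d15."""

    def __init__(self) -> None:
        # 32 single precision registers x 4 bytes = 128 bytes of shared storage.
        self.storage = bytearray(32 * 4)

    def write_s(self, n: int, value: float) -> None:
        # s<n> occupies bytes 4n..4n+3 ("<f" = little-endian IEEE 754 single).
        struct.pack_into("<f", self.storage, 4 * n, value)

    def read_s(self, n: int) -> float:
        return struct.unpack_from("<f", self.storage, 4 * n)[0]

    def write_d(self, n: int, value: float) -> None:
        # d<n> occupies bytes 8n..8n+7: s<2n> is its low half, s<2n+1> its high half.
        if not 0 <= n < 16:
            raise IndexError("only d0-d15 exist in this D16-style model")
        struct.pack_into("<d", self.storage, 8 * n, value)

    def read_d_bits(self, n: int) -> int:
        # Raw 64-bit pattern of d<n>.
        return struct.unpack_from("<Q", self.storage, 8 * n)[0]


bank = VfpBank()
bank.write_s(0, 1.0)             # low half of d0
bank.write_s(1, 2.0)             # high half of d0
print(hex(bank.read_d_bits(0)))  # 0x400000003f800000

bank.write_d(1, 1.0)             # overwrites s2 and s3
print(bank.read_s(2), bank.read_s(3))  # 0.0 1.875
```

The last line shows why mixing precisions needs care: writing d1 changes s2 and s3, and s3 now reads back the high word of the double 1.0 (0x3FF00000) as the single precision value 1.875.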

FPU Data Types

The ARM FPU supports the following floating point data types:

Single precision (32-bit) – Uses the IEEE 754 single precision format. Occupies one S register.
Double precision (64-bit) – Uses the IEEE 754 double precision format. Occupies one D register, which is the same storage as two S registers.

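Both formats encode a value in three fields:

Sign – 1 bit, 0 for positive and 1 for negative values.
Exponent – 8 bits (single) or 11 bits (double), stored with a bias of 127 or 1023.
Fraction (mantissa) – 23 bits (single) or 52 bits (double); normal numbers also have an implicit leading 1 bit that is not stored.

For a normal single precision number the value is (-1)^sign × 1.fraction × 2^(exponent - 127). The format covers a wide range of magnitudes: normal single precision values range from about 1.2×10^-38 to 3.4×10^38, and normal double precision values from about 2.2×10^-308 to 1.8×10^308.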
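To see the three fields concretely, the Python sketch below uses the standard `struct` module to extract the sign, biased exponent and fraction from single and double precision values, and to rebuild a normal single from its fields. The function names are hypothetical; `struct` is used only because it exposes IEEE 754 bit patterns on the host machine.

```python
import struct


def fields_f32(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)


def fields_f64(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def value_f32(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal single: (-1)^sign * 1.fraction * 2^(exponent - 127)."""
    if not 0 < exponent < 0xFF:
        raise ValueError("zero, subnormals, infinities and NaNs are not handled here")
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


print(fields_f32(-2.5))              # (1, 128, 2097152)
print(fields_f64(-2.5))              # (1, 1024, 1125899906842624)
print(value_f32(*fields_f32(-2.5)))  # -2.5
```

-2.5 is -1.25 × 2^1, so the stored single precision exponent is 1 + 127 = 128 and the fraction field holds 0.25 × 2^23 = 2097152.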

FPU Instructions

The ARM FPU instructions can be grouped into several categories:

Data Transfer

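Move data between FPU registers, ARM registers and memory:

FLDMX – Load multiple FPU registers from memory
FSTMX – Store multiple FPU registers to memory
FMSR – Move an ARM register to a single precision FPU register
FMRS – Move a single precision FPU register to an ARM register
FMDRR / FMRRD – Move a pair of ARM registers to or from a double precision FPU register

These are the original (pre-UAL) mnemonics. In Unified Assembler Language (UAL), used by current assemblers, load and store multiple are written VLDM and VSTM, and register moves are written VMOV.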
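A transfer between an ARM register and an FPU register copies the bit pattern unchanged; it does not convert between integer and floating point, which is the job of the conversion instructions. The Python sketch below models what an ARM register receives when a value moves out of the FPU, including how a double splits across two ARM registers with the low word in the first register, as in the UAL form VMOV Rt, Rt2, Dm. The function names mirror the mnemonics but are hypothetical host-side models.

```python
import struct


def fmrs(s_value: float) -> int:
    """Model of FMRS / VMOV Rt, Sn: the 32-bit pattern the ARM register receives."""
    return struct.unpack("<I", struct.pack("<f", s_value))[0]


def fmrrd(d_value: float) -> tuple[int, int]:
    """Model of FMRRD / VMOV Rt, Rt2, Dm: (Rt, Rt2) = (low word, high word) of Dm."""
    bits = struct.unpack("<Q", struct.pack("<d", d_value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def fmsr(r_value: int) -> float:
    """Model of FMSR / VMOV Sn, Rt: reinterpret a 32-bit integer as a single."""
    return struct.unpack("<f", struct.pack("<I", r_value & 0xFFFFFFFF))[0]


print(hex(fmrs(1.0)))                # 0x3f800000, not 1
print([hex(w) for w in fmrrd(1.0)])  # ['0x0', '0x3ff00000']
print(fmsr(1))                       # about 1.4e-45, the smallest subnormal, not 1.0
```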

Arithmetic

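Basic arithmetic operations. Each takes an S (single precision) or D (double precision) suffix, for example FADDS and FADDD, written VADD.F32 and VADD.F64 in UAL:

FADD – Floating point add
FSUB – Floating point subtract
FMUL – Floating point multiply
FDIV – Floating point divide
FSQRT – Floating point square root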
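The precision suffix matters for results, not just speed: the S forms round every result to 24 significant bits, the D forms to 53. The Python sketch below emulates single precision on the host by rounding each result through `struct`. It relies on a known property of IEEE 754 arithmetic: for add, subtract, multiply, divide and square root, computing in double and rounding once to single gives the correctly rounded single precision result. The helper names are hypothetical.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single, ties to even."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fadds(a: float, b: float) -> float:
    # Hypothetical model of FADDS: single operands, one rounding of the sum to single.
    return to_f32(to_f32(a) + to_f32(b))


def fsqrts(a: float) -> float:
    # Hypothetical model of FSQRTS.
    return to_f32(math.sqrt(to_f32(a)))


big = 2.0 ** 24          # 16777216: from here up, singles are spaced 2 apart
print(fadds(big, 1.0))   # 16777216.0: the +1 is lost in single precision
print(big + 1.0)         # 16777217.0: double precision (FADDD) keeps it
print(fsqrts(2.0))       # about 1.41421354, the single nearest sqrt(2)
```

Choosing single precision is a deliberate trade: it halves register and memory use, but values above 2^24 can no longer represent every integer.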

Comparison

Compare floating point values:

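FCMP – Floating point compare (raises Invalid Operation only for signaling NaNs)
FCMPE – Floating point compare, raising Invalid Operation for any NaN operand
FCMPZ – Floating point compare with zero
FCMPEZ – Floating point compare with zero, raising Invalid Operation for any NaN operand

These set the N, Z, C and V flags in the FPSCR, not in the CPSR. Before a conditional ARM instruction can test the result, FMSTAT (VMRS APSR_nzcv, FPSCR in UAL) must copy the flags to the CPSR.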
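The flag encoding is what makes NaN handling work. The Python sketch below is a host-side model, not ARM code: it reproduces the N, Z, C and V values the comparison instructions write for the four possible outcomes (less, equal, greater, unordered), then evaluates a few ARM condition codes against them as they would be tested after the flags reach the CPSR. Function names are hypothetical.

```python
import math

Flags = tuple[int, int, int, int]  # (N, Z, C, V)


def vfp_compare_flags(a: float, b: float) -> Flags:
    """Hypothetical model of the FPSCR flags written by FCMP/FCMPE for a vs b."""
    if math.isnan(a) or math.isnan(b):
        return (0, 0, 1, 1)  # unordered: at least one operand is NaN
    if a == b:
        return (0, 1, 1, 0)  # equal (+0.0 and -0.0 compare equal)
    if a < b:
        return (1, 0, 0, 0)  # less than
    return (0, 0, 1, 0)      # greater than


def condition_passes(cond: str, flags: Flags) -> bool:
    """Evaluate an ARM condition code on (N, Z, C, V), as after FMSTAT."""
    n, z, c, v = flags
    checks = {
        "EQ": z == 1,              # equal
        "NE": z == 0,              # not equal, or unordered
        "MI": n == 1,              # less than
        "LT": n != v,              # less than, or unordered
        "GE": n == v,              # greater than or equal
        "GT": z == 0 and n == v,   # greater than
        "HI": c == 1 and z == 0,   # greater than, or unordered
        "VS": v == 1,              # unordered
    }
    return checks[cond]


nan_vs_one = vfp_compare_flags(float("nan"), 1.0)
print(condition_passes("LT", nan_vs_one))  # True: LT also accepts unordered
print(condition_passes("MI", nan_vs_one))  # False: MI is an ordered less-than
```

So when NaN operands are possible, an ordered "less than" test uses MI rather than LT, and VS isolates the unordered case.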

Conversion

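Convert between data types. An S or D suffix gives the precision of the floating point operand:

FTOSI – Floating point to signed integer, rounding with the FPSCR rounding mode
FTOUI – Floating point to unsigned integer, rounding with the FPSCR rounding mode
FSITO – Signed integer to floating point
FUITO – Unsigned integer to floating point
FTOSIZ – Floating point to signed integer, rounding toward zero
FTOUIZ – Floating point to unsigned integer, rounding toward zero
FCVTDS / FCVTSD – Single to double precision, and double to single precision

Because the final letter is the precision, FTOSID converts a double precision value using the FPSCR rounding mode; it is not a separate "with rounding" instruction. The Z variants match the truncating behavior of a C cast from float to int.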
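The difference between the rounding and truncating conversions, and what happens to out-of-range inputs, can be modeled in Python. This sketch assumes the saturating behavior described in the ARMv7 and later architecture pseudocode (out-of-range values clamp to the destination range and NaN converts to 0), assumes the FPSCR is in its default round-to-nearest-even mode for the non-Z forms, and uses a hypothetical function name.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MAX = 2**32 - 1


def vfp_to_int(x: float, signed: bool = True, round_to_zero: bool = True) -> int:
    """Hypothetical model of FTOSIZ/FTOUIZ (round_to_zero=True) and of
    FTOSI/FTOUI with the FPSCR in round-to-nearest-even mode (False)."""
    if math.isnan(x):
        return 0  # NaN converts to 0 (hardware also sets the Invalid Operation flag)
    lo, hi = (INT32_MIN, INT32_MAX) if signed else (0, UINT32_MAX)
    if math.isinf(x):
        return hi if x > 0 else lo
    # math.trunc drops the fraction; Python's round() on a float rounds ties to even.
    r = math.trunc(x) if round_to_zero else round(x)
    return max(lo, min(hi, r))  # out-of-range results saturate


print(vfp_to_int(2.7))                       # 2  (FTOSIZ, like a C cast)
print(vfp_to_int(2.5, round_to_zero=False))  # 2  (tie goes to the even neighbor)
print(vfp_to_int(3.5, round_to_zero=False))  # 4
print(vfp_to_int(1e10))                      # 2147483647 (saturated)
```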

Status and Control

Manage FPU status flags and control modes:

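FMRX – Move an FPU system register, such as the FPSCR, to an ARM register (VMRS in UAL)
FMXR – Move an ARM register to an FPU system register (VMSR in UAL)
FMSTAT – Copy the FPSCR N, Z, C and V flags to the CPSR (VMRS APSR_nzcv, FPSCR in UAL)

The FPSCR (Floating Point Status and Control Register) holds the comparison flags, the cumulative exception flags, the rounding mode, and the flush-to-zero and default NaN control bits.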
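Changing a mode is a read-modify-write on real hardware: FMRX (VMRS) the FPSCR into an ARM register, update the bits, and FMXR (VMSR) it back. The Python sketch below models only the bit manipulation. The field positions (DN at bit 25, FZ at bit 24, the rounding mode in bits 23:22, cumulative exception flags in the low byte) follow the ARM architecture; the function names are hypothetical.

```python
DN_BIT = 25        # default NaN mode
FZ_BIT = 24        # flush-to-zero mode
RMODE_SHIFT = 22   # 2-bit rounding mode field, bits 23:22
ROUNDING_MODES = ("RN", "RP", "RM", "RZ")  # nearest, toward +inf, toward -inf, toward zero
CUMULATIVE_FLAGS = {"IOC": 0, "DZC": 1, "OFC": 2, "UFC": 3, "IXC": 4, "IDC": 7}


def set_modes(fpscr: int, flush_to_zero: bool, default_nan: bool, rmode: str) -> int:
    """Return a new FPSCR value with FZ, DN and RMode replaced, other bits kept."""
    mode_mask = (1 << FZ_BIT) | (1 << DN_BIT) | (0b11 << RMODE_SHIFT)
    fpscr &= ~mode_mask & 0xFFFFFFFF
    fpscr |= int(flush_to_zero) << FZ_BIT
    fpscr |= int(default_nan) << DN_BIT
    fpscr |= ROUNDING_MODES.index(rmode) << RMODE_SHIFT  # ValueError if rmode is unknown
    return fpscr


def describe(fpscr: int) -> dict:
    """Decode the mode bits and list any cumulative exception flags that are set."""
    return {
        "FZ": (fpscr >> FZ_BIT) & 1,
        "DN": (fpscr >> DN_BIT) & 1,
        "RMode": ROUNDING_MODES[(fpscr >> RMODE_SHIFT) & 0b11],
        "flags": [name for name, bit in CUMULATIVE_FLAGS.items() if (fpscr >> bit) & 1],
    }


value = set_modes(0x00000010, flush_to_zero=True, default_nan=True, rmode="RN")
print(hex(value))       # 0x3000010: FZ and DN set, the IXC flag preserved
print(describe(value))  # {'FZ': 1, 'DN': 1, 'RMode': 'RN', 'flags': ['IXC']}
```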

Programming with the FPU

Here are some key aspects to keep in mind when coding with the ARM FPU:

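On many cores the FPU runs in parallel with the integer pipeline, so interleave FPU and integer instructions to keep both busy.
Plan data transfers to minimize stalls – load data before it is needed.
Pay attention to data dependencies: an instruction that needs the result of a long-latency operation such as FDIV or FSQRT stalls until that result is ready.
Test comparison results through the FPSCR flags (FCMP followed by FMSTAT) instead of moving operands into ARM registers and comparing them there.
Enable flush-to-zero and default NaN modes where strict IEEE 754 behavior is not required; on some implementations this avoids slow handling of subnormal numbers.
Choose the precision of each variable to balance speed against accuracy; single precision is faster on many cores and is the only option on single-precision-only FPUs.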

Proper use of the FPU can provide large performance gains over software floating point emulation for floating point intensive code. Applications such as 3D graphics, scientific computing, statistics, and digital signal processing benefit greatly from hardware accelerated floating point arithmetic.
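Flush-to-zero and default NaN modes replace two pieces of IEEE 754 behavior with simpler rules. With FZ set, subnormal single precision inputs and results are treated as zero of the same sign; with DN set, any NaN result becomes the default NaN (bit pattern 0x7FC00000 for single precision) instead of carrying an input NaN's payload. The Python functions below are a host-side model of those two rules, not an ARM API, and their names are hypothetical.

```python
import math
import struct

FLT_MIN = 2.0 ** -126          # smallest normal single precision magnitude
DEFAULT_NAN_F32 = 0x7FC00000   # ARM default NaN bit pattern for single precision


def flush_to_zero_f32(x: float) -> float:
    """Hypothetical model of FZ mode: a subnormal single becomes a zero of the same sign."""
    if x != 0.0 and abs(x) < FLT_MIN:
        return math.copysign(0.0, x)
    return x


def default_nan_f32(x: float) -> float:
    """Hypothetical model of DN mode: any NaN result is replaced by the default NaN."""
    if math.isnan(x):
        return struct.unpack("<f", struct.pack("<I", DEFAULT_NAN_F32))[0]
    return x


print(flush_to_zero_f32(1e-40))   # 0.0
print(flush_to_zero_f32(-1e-40))  # -0.0
print(flush_to_zero_f32(1.5))     # 1.5: normal values pass through
```

The cost of the speedup is accuracy near zero: values between 0 and about 1.2×10^-38 lose the gradual underflow that subnormals provide.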

ARM FPU Architectures

There have been several generations of ARM FPU implementations over time. Key enhancements include:

VFP (Vector Floating Point)

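Initial ARM FPU design (VFPv1), introduced with the ARMv5 architecture and now obsolete.
Provided IEEE 754 floating point arithmetic, plus a short vector mode (the FPSCR LEN and STRIDE fields) that gives the architecture its name.
32 x 32-bit single precision registers.
Pipelined for high throughput.
Used with ARM10 family cores; Cortex-A series processors use later VFP versions.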

VFPv2

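Optional extension in the ARMv5TE, ARMv5TEJ and ARMv6 architectures.
Single and double precision support.
32 x 32-bit single precision registers (s0-s31).
16 x 64-bit double precision registers (d0-d15), occupying the same storage as s0-s31.
Implemented by the VFP11 coprocessor in ARM11 cores such as the ARM1136JF-S and ARM1176JZF-S.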

VFPv3 / VFPv4

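Evolutionary improvements over VFPv2 for the ARMv7 architecture.
VFPv3 adds instructions that load floating point constants directly and convert between floating point and fixed point values; unlike VFPv2, it does not trap floating point exceptions.
D32 implementations extend the register bank to 32 doubleword registers (d0-d31); D16 implementations keep 16. The Advanced SIMD (NEON) extension shares this register bank.
VFPv4 adds fused multiply-accumulate instructions (VFMA, VFMS, VFNMA, VFNMS) and half precision conversions.
Execution resources, and therefore throughput, depend on the core rather than the architecture version.
VFPv3 appears in Cortex-A8, Cortex-A9 and Cortex-R4; VFPv4 appears in Cortex-A5, Cortex-A7 and Cortex-A15.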
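The practical difference between VFPv4's fused multiply-add (VFMA) and the chained multiply-accumulate of earlier versions (FMAC, VMLA in UAL) is the number of roundings: the chained form rounds the product before adding, while the fused form rounds a*b + c only once. The Python sketch below models the fused operation with exact rational arithmetic from the standard `fractions` module. The function name is hypothetical, and the operands are chosen so that the two approaches visibly disagree in double precision.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of a fused multiply-add: a*b + c with a single rounding.

    Fraction arithmetic is exact and float(Fraction) rounds correctly,
    so only the final conversion rounds. Finite inputs only.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27          # exact product: 1 - 2**-54
print(a * b - 1.0)                     # 0.0: the product rounded to 1.0 first
print(fused_multiply_add(a, b, -1.0))  # -5.551115123125783e-17, exactly -2**-54
```

The single rounding is why fused multiply-add improves accuracy in dot products, polynomial evaluation and error-compensated algorithms.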

FPv5

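Floating point for the ARMv8 generation, including the 64-bit AArch64 execution state. ARMv8-A toolchains call the AArch32 version fp-armv8; the FPv5 name is used for M-profile FPUs such as the one in Cortex-M7.
Adds IEEE 754-2008 operations such as maxNum and minNum (VMAXNM, VMINNM), round to integral (VRINT) and conversions with an explicit rounding mode.
In AArch64, floating point and Advanced SIMD share 32 128-bit registers (v0-v31), and Advanced SIMD gains double precision vector arithmetic.
The optional Cryptographic Extension (AES and SHA instructions) builds on Advanced SIMD rather than on the floating point unit.
Found in Cortex-A35, A53, A55 and newer 64-bit cores.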

Each FPU generation expanded the capabilities and performance of floating point computation on ARM chips. The evolution continues as ARM adds new instructions and capabilities to support emerging workloads.

Summary

The ARM floating point unit provides hardware acceleration for mathematical calculations using single and double precision floating point values. Its dedicated registers and pipelined execution improve performance substantially over software floating point emulation built from integer instructions. Proper utilization of the FPU instruction set and data types can greatly speed up code involving complex math, 3D graphics, signal processing, and scientific computations.

