The Arm Cortex-M33 processor can include an optional single precision floating point unit (FPU) that implements the IEEE 754 standard for floating point arithmetic. Whether the FPU is present is decided by the chip designer, so not every Cortex-M33-based device has one. Where it is included, it is designed for power efficiency and good performance in embedded applications requiring floating point capabilities.
Overview of the Cortex-M33 FPU
The key features of the Cortex-M33 FPU include:
- Full single precision floating point unit compliant with IEEE 754 standard
- Supports single precision data types: 32-bit floats, with conversions to and from 16-bit half precision
- Floating point registers: 32 x 32-bit registers (S0 – S31) for floating point data
- Floating point status and control register (FPSCR)
- Instructions for floating point arithmetic, comparison, conversion, and data movement
- Pipelined execution in which most simple single precision operations issue at one per clock cycle
- Integrated with CPU pipeline to minimize stalls during floating point operations
- Configurable lazy saving of floating point context on exception entry, which reduces interrupt latency when handlers do not use the FPU
- Support for NaN propagation and subnormal numbers, with optional flush-to-zero and default NaN modes
- Available alongside the optional DSP extension, which adds integer SIMD and saturating arithmetic instructions
This built-in FPU enables the Cortex-M33 processor to efficiently execute algorithms and functions requiring floating point math. Because floating point and integer instructions flow through the same pipeline, compiled code can interleave them freely without the overhead of handing work to a separate unit.
Floating Point Data Types
The Cortex-M33 FPU works with the following IEEE 754-compliant single precision floating point data types:
- Float (32 bit) – This is a 32 bit single precision floating point number with 8 bits for the exponent, 23 bits for the mantissa, and 1 bit for the sign.
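All floating point arithmetic operates on the single precision 32-bit float format. The FPU has no double precision (64-bit) arithmetic: operations on a C double are compiled into calls to a software floating point library, which is much slower. The FPU can still load, store, and move 64-bit values, and it supports conversion between single precision and 16-bit half precision.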
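To make the bit layout concrete, here is a minimal sketch in portable C that splits a float into its sign, exponent, and fraction fields. The struct and function names (f32_fields, f32_decode) are hypothetical, chosen for illustration. For a normal number the value is (-1)^sign × 1.fraction × 2^(exponent - 127); an exponent field of 0 marks zero or a subnormal, and 255 marks infinity or NaN.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 binary32 value, as held in one S register. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits: the stored part of the mantissa */
} f32_fields;

f32_fields f32_decode(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to view the bit pattern */

    f32_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, -6.5 = -1.625 × 2^2, so f32_decode(-6.5f) yields sign 1, exponent 129, and fraction 0x500000 (0.625 × 2^23). On a Cortex-M33 built for the FPU, the memcpy typically compiles to a single VMOV from an S register to a core register.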
Floating Point Registers
The Cortex-M33 FPU contains 32 dedicated 32-bit single precision floating point registers to hold operands and results for floating point instructions. These registers are distinct from the core integer registers.
The floating point registers are named S0 – S31. Pairs of them can also be addressed as 16 double-word registers, D0 – D15 (D0 is S0 and S1, D1 is S2 and S3, and so on), which lets the FPU load, store, and move 64-bit values even though it performs no 64-bit arithmetic.
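The pairing can be pictured with a small software model. The sketch below is hypothetical (fp_regfile, read_d, and write_d are illustration names, not a real API) and assumes the architectural aliasing rule that Dn holds S(2n) in its low 32 bits and S(2n+1) in its high 32 bits.

```c
#include <stdint.h>

/* Hypothetical model of the FPU register file: S0-S31 as raw 32-bit words. */
typedef struct {
    uint32_t s[32];
} fp_regfile;

/* Read Dn (n must be 0..15): S(2n) is the low half, S(2n+1) the high half. */
uint64_t read_d(const fp_regfile *rf, unsigned n)
{
    return ((uint64_t)rf->s[2 * n + 1] << 32) | rf->s[2 * n];
}

/* Write Dn (n must be 0..15), updating both aliased S registers. */
void write_d(fp_regfile *rf, unsigned n, uint64_t value)
{
    rf->s[2 * n]     = (uint32_t)value;
    rf->s[2 * n + 1] = (uint32_t)(value >> 32);
}
```

Writing D1 therefore changes S2 and S3, which is why code that mixes single and double-word views of the same registers has to account for the overlap.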
Floating Point Instructions
The Cortex-M33 instruction set includes floating point instructions for:
- Arithmetic Operations – Add, subtract, multiply, divide, square root, fused multiply-add, absolute value, negation
- Comparisons – Equality, less than, greater than, etc.
- Conversions – Between integer, fixed-point, and floating point values, and between single and half precision
- Data Movement – Load, store, transfer between FP and integer registers
These instructions operate on the single precision floating point data types and registers. Simple operations such as add, subtract, multiply, compare, and move typically issue at one per clock cycle, while divide and square root take many cycles each.
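As an illustration of how C code maps onto these instructions, consider the single precision dot product below (dot_f32 is a hypothetical name). Built with GCC options such as -mcpu=cortex-m33 -mfpu=fpv5-sp-d16 -mfloat-abi=hard, a compiler will typically emit VLDR loads and VMUL.F32/VADD.F32, or a fused VFMA.F32 when it is allowed to contract the multiply and add; the exact instruction selection depends on the compiler and its optimization flags.

```c
#include <stddef.h>

/* Single precision dot product of two arrays of length n.
   All operands are float, so on an FPU-enabled Cortex-M33 build the
   arithmetic runs on the hardware FPU rather than in software. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];   /* candidate for VFMA.F32 if the compiler contracts it */
    }
    return acc;
}
```

Keep floating point literals suffixed with f (for example 0.5f): an expression such as 0.5 * x promotes x to double and pulls in the much slower software double precision library.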
FPU Architecture and Pipeline
The Cortex-M33 processor uses a short three-stage pipeline, and the FPU is integrated into that pipeline. Some floating point instructions, such as divide and square root, occupy the execute stage for several cycles.
Conceptually, each floating point instruction passes through the following steps:
- Fetch – Fetch the floating point instruction from memory
- Decode – Decode the instruction and read register operands
- Execute – Perform the floating point operation
- Write Back – Write the result back to the floating point register file
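Results are forwarded to dependent instructions where possible to avoid stalls, although an instruction that needs the result of a multi-cycle operation, such as a divide, must wait for it to finish. Data moves between the integer and floating point register files with VMOV instructions rather than through memory.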
FPU Exception and Interrupt Handling
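The Cortex-M33 FPU detects exceptional conditions during floating point operations and records them in status flags, as required by the IEEE 754 standard. These include:
- Overflow/Underflow
- Divide by Zero
- Invalid Operation
- Inexact Result
- Input Denormal (an Arm-specific flag, set when flush-to-zero mode replaces a subnormal input with zero)
The FPU does not trap on these conditions. Each instruction completes with the default result defined by IEEE 754 (for example, infinity for division by zero or a NaN for an invalid operation) and sets the matching sticky flag in the FPSCR, which software can inspect and clear.
The processor also exports these flags as output signals, which a chip vendor may connect to an interrupt line. Where a device does this, the floating point interrupt’s priority is configured through the NVIC like any other interrupt, allowing it to preempt lower priority code if needed.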
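The FPU records these conditions in sticky flag bits of the FPSCR that stay set until software clears them, so a common pattern is to clear the flags, run a computation, and then check which ones were raised. The helpers below are a hypothetical sketch that works on a raw FPSCR value; on the target, that value would come from the CMSIS-Core intrinsic __get_FPSCR() and be written back with __set_FPSCR(). The bit positions assume the Armv8-M FPSCR layout: IOC bit 0, DZC bit 1, OFC bit 2, UFC bit 3, IXC bit 4, IDC bit 7.

```c
#include <stdint.h>

/* Cumulative (sticky) exception flag bits in FPSCR. */
#define FPSCR_IOC (1u << 0)  /* invalid operation */
#define FPSCR_DZC (1u << 1)  /* division by zero */
#define FPSCR_OFC (1u << 2)  /* overflow */
#define FPSCR_UFC (1u << 3)  /* underflow */
#define FPSCR_IXC (1u << 4)  /* inexact */
#define FPSCR_IDC (1u << 7)  /* input denormal (flush-to-zero mode) */

#define FPSCR_EXC_MASK \
    (FPSCR_IOC | FPSCR_DZC | FPSCR_OFC | FPSCR_UFC | FPSCR_IXC | FPSCR_IDC)

/* Flags worth reporting: everything except inexact, which almost any
   non-trivial computation raises. */
uint32_t fpscr_error_flags(uint32_t fpscr)
{
    return fpscr & (FPSCR_EXC_MASK & ~FPSCR_IXC);
}

/* Value to write back to FPSCR that clears all cumulative flags while
   preserving the control fields (rounding mode, FZ, DN, ...). */
uint32_t fpscr_clear_flags(uint32_t fpscr)
{
    return fpscr & ~FPSCR_EXC_MASK;
}
```

On a device this might be used as: read s = __get_FPSCR(), act if fpscr_error_flags(s) is nonzero, then call __set_FPSCR(fpscr_clear_flags(s)).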
Compliance with IEEE 754 Standard
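The Cortex-M33 FPU architecture and instruction set implement the IEEE 754 standard for binary floating point arithmetic in single precision, with a few operations required by the standard (such as remainder and conversion to and from decimal strings) provided by software libraries. The hardware support includes: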
- Normal Numbers – Standard floating point values
- Subnormal Numbers – Smaller than normal numbers, with reduced precision
- Infinity – Values exceeding range of normal numbers
- NaN – Not a Number values for handling undefined results
- Rounding Modes – Configurable rounding of results
- Exception Handling – Overflow, underflow, precision loss, etc.
Adherence to IEEE 754 ensures consistent, predictable floating point math behavior across different implementations and platforms.
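The rounding mode and the two speed-oriented options that relax strict IEEE behavior live in the control bits of the FPSCR: RMode in bits 23:22 (00 to nearest, 01 towards +infinity, 10 towards -infinity, 11 towards zero), FZ (flush-to-zero) in bit 24, and DN (default NaN) in bit 25. The sketch below, with hypothetical helper names, shows the read-modify-write needed to change one field without disturbing the others.

```c
#include <stdint.h>

/* FPSCR control fields (Armv8-M layout). */
#define FPSCR_RMODE_SHIFT 22u
#define FPSCR_RMODE_MASK  (3u << FPSCR_RMODE_SHIFT)
#define FPSCR_FZ          (1u << 24)   /* flush subnormals to zero */
#define FPSCR_DN          (1u << 25)   /* return the default NaN */

typedef enum {
    RM_NEAREST   = 0,  /* round to nearest, ties to even (IEEE default) */
    RM_PLUS_INF  = 1,  /* round towards +infinity */
    RM_MINUS_INF = 2,  /* round towards -infinity */
    RM_ZERO      = 3   /* round towards zero */
} fp_round_mode;

/* Return fpscr with only the RMode field replaced. */
uint32_t fpscr_with_rounding(uint32_t fpscr, fp_round_mode mode)
{
    return (fpscr & ~FPSCR_RMODE_MASK) | ((uint32_t)mode << FPSCR_RMODE_SHIFT);
}

/* 1 when FZ and DN are both clear, i.e. subnormals are handled in full
   and NaN operands propagate instead of being replaced by the default NaN. */
int fpscr_full_ieee(uint32_t fpscr)
{
    return (fpscr & (FPSCR_FZ | FPSCR_DN)) == 0;
}
```

On a device, __set_FPSCR(fpscr_with_rounding(__get_FPSCR(), RM_ZERO)) would select round-towards-zero; portable code can use fesetround() from <fenv.h> instead where the C library supports it.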
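Math Functions and Extensions
Arm does not offer a hardware extension for trigonometric or logarithmic functions on the Cortex-M33. Square root is already part of the base FPU instruction set (VSQRT.F32).
Functions such as sine, cosine, tangent, arcsine, arccosine, and logarithms are computed in software, for example by the C math library or by CMSIS-DSP routines such as arm_sin_f32 and arm_cos_f32. Chip vendors that need faster math can add their own accelerators through the Cortex-M33 coprocessor interface or Arm Custom Instructions.
Because these routines run on the single precision FPU, they are much faster than software emulation on a core without an FPU, although each call still costs many more cycles than a single arithmetic instruction.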
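A software routine of this kind can be quite short. The sketch below (sin_approx is a hypothetical name) reduces the argument to [-π/2, π/2] and evaluates a degree-7 Taylor polynomial using only single precision multiplies and adds, so it runs entirely on the FPU. It is illustrative only: the truncation error reaches about 2 × 10^-4 near ±π/2, the range reduction is adequate only for moderate arguments, and production libraries use different, more carefully tuned methods.

```c
/* Minimal software sine sketch (hypothetical, illustration only). */
float sin_approx(float x)
{
    const float pi      = 3.14159265f;
    const float two_pi  = 6.28318531f;
    const float half_pi = 1.57079633f;

    /* Reduce to [-pi, pi] by subtracting the nearest multiple of 2*pi.
       The float-to-int conversion limits this to moderate |x|. */
    float k = (float)(int)(x / two_pi + (x >= 0.0f ? 0.5f : -0.5f));
    x -= k * two_pi;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > half_pi) {
        x = pi - x;
    } else if (x < -half_pi) {
        x = -pi - x;
    }

    /* x - x^3/6 + x^5/120 - x^7/5040, evaluated in Horner form. */
    float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
                 + x2 * (1.0f / 120.0f + x2 * (-1.0f / 5040.0f))));
}
```

Horner form keeps the polynomial to a chain of multiply-adds, which a compiler can map onto VFMA.F32 on the Cortex-M33.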
Summary
In summary, the key properties of the Arm Cortex-M33 FPU include:
- Full IEEE 754-compliant single precision FPU
- 32 dedicated 32-bit floating point registers
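- Pipelined execution, with most simple float operations issuing at one per clock cycle
- Integrated with CPU pipeline for seamless operation
- IEEE 754 exception flags in the FPSCR, optionally routed to an interrupt by the chip vendor
- Available alongside the optional DSP extension, with custom accelerators possible through the coprocessor interface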
The integrated FPU enables high performance floating point calculations needed for many embedded and IoT applications, while minimizing cost, power and chip area.
Overall, the Cortex-M33 processor strikes a good balance between capable floating point hardware and mainstream microcontroller constraints.
Floating Point Operations Per Second
The theoretical peak floating point operations per second (FLOPS) of the Cortex-M33 FPU scales with the processor’s clock frequency.
If the FPU completes one simple floating point instruction (add, subtract, or multiply) per clock cycle and each instruction counts as one operation, the peak is approximately: FLOPS ≈ Clock Frequency (in Hz) × 1 operation per cycle
For example, with a 100 MHz clock: FLOPS ≈ 100 × 10^6 × 1 = 100 million FLOPS (100 MFLOPS)
This is an upper bound. Divide and square root take many cycles each, and in practice application performance also depends on factors like:
- Instruction mix – Arithmetic vs memory operations
- Data dependencies and stalls
- Memory and bus transfer speeds
- Integer instructions for address calculation and loop control, which share the pipeline with floating point instructions
Well-optimized inner loops, such as unrolled filter kernels, can approach the peak rate, but typical application code achieves considerably less.
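One common way to reduce dependency stalls in floating point loops is to split a single accumulation chain into several independent ones. In the sketch below (dot_f32_2acc is a hypothetical name), the two multiply-adds in each iteration do not depend on each other, so the pipeline need not wait for one result before starting the next. Whether this helps on a particular Cortex-M33 device depends on its instruction latencies and memory system, so it should be measured, for example with the DWT cycle counter (DWT->CYCCNT in CMSIS).

```c
#include <stddef.h>

/* Dot product with two independent accumulators. Because the additions
   happen in a different order, the result can differ in the last bits
   from a single-accumulator loop. */
float dot_f32_2acc(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];          /* chain 0: even elements */
        acc1 += a[i + 1] * b[i + 1];  /* chain 1: odd elements */
    }
    if (i < n) {                      /* odd length: one element left over */
        acc0 += a[i] * b[i];
    }
    return acc0 + acc1;
}
```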
Comparisons to Other FPUs
Compared to other FPUs, the Cortex-M33 provides a strong balance of performance, power efficiency, and cost:
- Vs Cortex-M0, M0+, M3, and M23 (no FPU) – Adds dedicated FP hardware instead of software emulation
- Vs Cortex-M4F – Both provide IEEE 754 single precision; the Cortex-M33’s FPv5 FPU adds a few instructions, such as VMAXNM/VMINNM, VRINT, and VSEL
- Vs Cortex-M7 – Lower power and area; the Cortex-M7 can optionally be configured with a double precision FPU, while the Cortex-M33 is single precision only
- Vs Cortex-A class – Less performance than NEON SIMD but also much smaller and lower power
For embedded applications that need more floating point capability than FPU-less cores such as the Cortex-M0 or M3, but face tighter power and cost constraints than Cortex-A processors, the Cortex-M33 hits a sweet spot.
Use Cases and Applications
The Cortex-M33 FPU is designed for embedded and IoT applications requiring floating point math such as:
- Digital Signal Processing – Audio, image and video filtering algorithms
- Sensor Fusion – Combining data from multiple sensors for applications like robotics
- Machine Learning – Neural networks and inference algorithms
- Control Systems – Proportional-Integral-Derivative control loops
- Motion Estimation – Object tracking for computer vision
- Graphics – Coordinate transforms and rendering for simple 2D/3D user interfaces
The combination of high efficiency 32-bit floating point and low power consumption makes the Cortex-M33 well suited for edge computing applications.
Its single precision FPU is sufficient for small embedded ML workloads (many deployed models are also quantized to 8-bit integers, which run on the integer and DSP instructions instead), while its small size enables integration into low cost microcontrollers.
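As an example of the control loop use case, a PID controller needs only a handful of single precision multiplies and adds per sample, which suits the FPU well. The sketch below is hypothetical (pid_ctrl, pid_step, and all field names are illustration choices); it clamps the output to the actuator range and uses conditional integration as anti-windup, one of several common approaches.

```c
/* Minimal single precision PID controller sketch (hypothetical). */
typedef struct {
    float kp, ki, kd;        /* controller gains */
    float dt;                /* sample period in seconds */
    float out_min, out_max;  /* actuator limits */
    float integral;          /* running integral of the error */
    float prev_error;        /* error from the previous step */
} pid_ctrl;

float pid_step(pid_ctrl *c, float setpoint, float measurement)
{
    float error = setpoint - measurement;
    c->integral += error * c->dt;
    float derivative = (error - c->prev_error) / c->dt;
    c->prev_error = error;

    float out = c->kp * error + c->ki * c->integral + c->kd * derivative;

    /* Clamp to the actuator range; when saturated, undo this step's
       integration so the integral term does not wind up. */
    if (out > c->out_max) {
        out = c->out_max;
        c->integral -= error * c->dt;
    } else if (out < c->out_min) {
        out = c->out_min;
        c->integral -= error * c->dt;
    }
    return out;
}
```

The division by dt on every call uses the slower VDIV instruction; a variant that stores 1/dt once and multiplies by it avoids that in a tight control loop.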
Design Considerations and Tradeoffs
Some key design considerations and tradeoffs for Arm in developing the Cortex-M33 FPU architecture include:
- Precision vs Performance – Single precision offers less range and precision than double precision, but needs a smaller and faster datapath
- Area/Power vs Features – Focused on commonly used FP32 over wider FP support
- Embedded vs Server Constraints – Optimized for low power and cost over peak throughput
- Hardware vs Software – In-core FPU vs software library approach
By focusing on widely used single precision floating point, Arm could optimize the FPU for low power and reduced complexity compared to a more full-featured 64-bit FPU.
Implementing the FP unit directly in hardware also delivers far higher performance and energy efficiency than a software floating point library.
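The precision side of this trade-off is easy to see with a long accumulation. The sketch below (hypothetical function names) adds the same value a million times in three ways: naive single precision, which runs on the FPU but accumulates rounding error; Kahan compensated summation, a standard technique that stays in single precision while tracking the lost low-order bits; and double precision, which is accurate but on the Cortex-M33 is executed by a software library. Kahan summation relies on the compiler not reassociating floating point expressions, so it must not be built with -ffast-math.

```c
#include <stddef.h>

/* Naive single precision accumulation: fast on the FPU, but each
   addition rounds to 24 significant bits. */
float sum_naive_f32(float x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        s += x;
    }
    return s;
}

/* Kahan (compensated) summation in single precision. */
float sum_kahan_f32(float x, size_t n)
{
    float s = 0.0f;
    float c = 0.0f;               /* running compensation for lost bits */
    for (size_t i = 0; i < n; ++i) {
        float y = x - c;
        float t = s + y;
        c = (t - s) - y;          /* what was lost when adding y to s */
        s = t;
    }
    return s;
}

/* Double precision: accurate, but software-emulated on a Cortex-M33. */
double sum_f64(double x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x;
    }
    return s;
}
```

With strict single precision evaluation, as on the Cortex-M33, summing 0.1f one million times naively drifts from the exact 100000 by hundreds, while the Kahan and double results stay within a small fraction of 1.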
Implementation in Arm Cortex-M33 CPUs
The Cortex-M33 is used in many microcontroller families, including:
- STMicroelectronics STM32L5 and STM32U5 Series Ultra-low-power MCUs
- NXP LPC55S6x MCUs
- Nordic Semiconductor nRF5340 SoC
- Renesas RA6M4 MCUs
Because the FPU is an implementation option, check each device’s documentation: on some multi-core parts, not every Cortex-M33 core includes an FPU.
These Arm ecosystem chips integrate the Cortex-M33 CPU with the FPU and other accelerators to provide high performance on embedded application workloads. The M33 cores are often combined with additional application-specific processing engines for analytics, machine learning, signal processing, and security.
Development Tools and Compiler Support
The Arm Cortex-M33 FPU is supported by all major embedded development toolchains including:
- GCC – Arm GNU Toolchain (formerly GNU Arm Embedded Toolchain)
- Arm Compiler 6 – Arm’s LLVM-based compiler (armclang)
- IAR – IAR Embedded Workbench IDE and C/C++ compiler
- Keil MDK – Arm Keil Microcontroller Development Kit, which uses Arm Compiler
The compilers use the FPU for floating point operations in C/C++ code when the build targets it; with GCC, for example, -mcpu=cortex-m33 together with -mfpu=fpv5-sp-d16 and -mfloat-abi=hard (or softfp). Developers can also reach FPU features through CMSIS-Core intrinsics such as __get_FPSCR() and __set_FPSCR(), or through inline assembly.
Debuggers and IDEs like Keil MDK provide views of the FPU registers during debug. Arm’s CMSIS-DSP library and vendor libraries provide math routines optimized for the Cortex-M33 FPU.
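Before any floating point instruction executes, startup code has to grant access to the FPU, which the architecture exposes as coprocessors CP10 and CP11; an FP instruction executed while access is disabled causes a UsageFault. CMSIS SystemInit() implementations typically do this when __FPU_PRESENT and __FPU_USED are set. The sketch below isolates the bit manipulation in a hypothetical helper so it can be checked off target; it assumes the CPACR layout with CP10 in bits 21:20 and CP11 in bits 23:22. On a TrustZone-enabled device, Secure code must additionally allow Non-secure access through the NSACR register if Non-secure software uses the FPU.

```c
#include <stdint.h>

/* CPACR (Coprocessor Access Control Register, 0xE000ED88) field layout.
   Each coprocessor has a 2-bit field: 0b00 no access, 0b01 privileged
   only, 0b11 full access. The FPU is CP10 plus CP11. */
#define CPACR_CP10_SHIFT  20u
#define CPACR_CP11_SHIFT  22u
#define CPACR_FULL_ACCESS 3u

/* Return cpacr with full access granted to CP10 and CP11, leaving
   all other bits unchanged. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr
         | (CPACR_FULL_ACCESS << CPACR_CP10_SHIFT)
         | (CPACR_FULL_ACCESS << CPACR_CP11_SHIFT);
}
```

On the target this becomes SCB->CPACR = cpacr_with_fpu_enabled(SCB->CPACR); followed by __DSB() and __ISB() so the change takes effect before the next floating point instruction.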
Summary and Conclusion
The Arm Cortex-M33 processor can include an efficient single precision floating point unit for deeply embedded applications. With its combination of small size, low power, hardware floating point, and integrated exception flag handling, the Cortex-M33 FPU hits a sweet spot for embedded computing applications.
It brings hardware floating point capabilities to the cost sensitive microcontroller market. The Cortex-M33 FPU, as part of the complete Cortex-M33 CPU, makes for a processor well suited to analytics, machine learning, digital signal processing, and other compute intensive workloads.