What type of FPU does the Cortex-M33 have?

The Arm Cortex-M33 processor can include an optional single precision floating point unit (FPU). The FPU implements the Armv8-M Floating-point Extension (FPv5, single precision only) and follows the IEEE 754 standard for floating point arithmetic. It is designed to give embedded applications hardware floating point at low power and silicon cost.

Contents

Overview of the Cortex-M33 FPU
Floating Point Data Types
Floating Point Registers
Floating Point Instructions
FPU Architecture and Pipeline
FPU Exception and Interrupt Handling
Compliance with IEEE 754 Standard
Floating Point Extension
Summary
Floating Point Operations Per Second
Comparisons to Other FPUs
Use Cases and Applications
Design Considerations and Tradeoffs
Implementation in Arm Cortex-M33 CPUs
Development Tools and Compiler Support
Summary and Conclusion

Overview of the Cortex-M33 FPU

The key features of the Cortex-M33 FPU include:

Optional single precision floating point unit (FPv5) compliant with the IEEE 754 standard

Supports single precision data types: 32-bit floats
Floating point registers: 32 x 32-bit registers for floating point data
Floating point status and control register

Instructions for floating point arithmetic, comparison, conversion, and data movement
Pipelined design in which simple operations such as add and multiply can typically issue at one per clock cycle, while divide and square root take multiple cycles
Integrated with CPU pipeline to minimize stalls during floating point operations

Lazy stacking of the floating point context to limit interrupt latency
Support for NaN propagation and subnormal numbers, with optional default NaN and flush-to-zero modes
No dedicated hardware for trigonometric functions; these are provided by software math libraries that run on the FPU

When it is fitted, the FPU lets the Cortex-M33 execute floating point algorithms in hardware instead of calling software emulation routines, which is substantially faster for single precision math.

Floating Point Data Types

The Cortex-M33 FPU works with the following IEEE 754-compliant single precision floating point data types:

Float (32 bit) – A single precision floating point number with 1 bit for the sign, 8 bits for the exponent (stored with a bias of 127), and 23 bits for the mantissa.

Floating point arithmetic is carried out in the single precision 32-bit float format. The FPU has no double precision (64-bit) arithmetic, so operations on C double values are performed by compiler-supplied software routines, which are much slower.
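The field layout described above can be inspected directly in C. The following is a minimal sketch; the type `f32_fields` and the function `f32_decode` are hypothetical names introduced here for illustration, not part of any Arm library.

```c
#include <stdint.h>
#include <string.h>

/* The three bit fields of an IEEE 754 single precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 fraction bits; the leading 1 is implicit */
} f32_fields;

/* Hypothetical helper: split a float into sign, exponent and mantissa. */
f32_fields f32_decode(float value)
{
    uint32_t bits;
    f32_fields f;

    /* memcpy is the portable way to reinterpret the bit pattern. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, `f32_decode(1.0f)` yields a stored exponent of 127 (that is, 2 to the power 0) and a mantissa of 0.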

Floating Point Registers

The Cortex-M33 FPU contains 32 dedicated 32-bit single precision floating point registers to hold operands and results for floating point instructions. These registers are distinct from the core integer registers.

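The floating point registers are named S0 to S31. Adjacent pairs can also be addressed as sixteen 64-bit registers, D0 to D15, which is useful for load, store, and move instructions even though the FPU performs no double precision arithmetic. The floating point register file is designed for simultaneous read and write access to maximize floating point performance.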

Floating Point Instructions

The Cortex-M33 instruction set includes floating point instructions for:

Arithmetic Operations – Add, subtract, multiply, divide, square root, etc.
Comparisons – Equality, less than, greater than, etc.

Conversions – Between integer and floating point values
Data Movement – Load, store, transfer between FP and integer registers

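These instructions operate on the single precision floating point data types and registers. The FPU is pipelined, and simple operations such as add, subtract, and multiply typically complete in a single cycle. Divide and square root take multiple cycles, so consult the Cortex-M33 Technical Reference Manual for exact timings.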
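To keep arithmetic on the single precision FPU, C code has to avoid accidental promotion to double. The sketch below shows the usual pattern; `dot_f32` and `scale_f32` are hypothetical helper names, and the exact instructions generated depend on the compiler and its options.

```c
#include <stddef.h>

/* Hypothetical helper: dot product kept entirely in single precision. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;            /* the 'f' suffix keeps the constant a float */
    for (size_t i = 0; i < n; ++i) {
        /* On a hard-float build this becomes an FPU multiply and add,
           or a fused multiply-add, depending on compiler options. */
        acc += a[i] * b[i];
    }
    return acc;
}

/* Hypothetical helper: scale a value without promoting to double. */
float scale_f32(float x)
{
    /* Writing 0.5 instead of 0.5f would make this a double multiply,
       which a single precision FPU cannot execute in hardware. */
    return x * 0.5f;
}
```

The same rule applies to library calls: the `float` variants of the C math functions (those with an `f` suffix) avoid the software double precision path.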

FPU Architecture and Pipeline

Arm describes the Cortex-M33 as a three-stage, in-order pipeline, and the FPU is tightly integrated with it. Arm does not publish a detailed stage-by-stage breakdown of the FPU itself, so the steps below are a conceptual view rather than a description of the actual hardware stages.

Conceptually, a floating point instruction passes through these steps:

Pre-Fetch – Fetch the floating point instruction from memory

Decode – Decode the instruction and read register operands
Execute – Perform the floating point operation
Write Back – Write the result back to the floating point register file

Results from earlier pipeline stages are forwarded to later stages to avoid stalls. The integer and FPU pipelines are also integrated to pass data seamlessly between integer and floating point domains.

FPU Exception and Interrupt Handling

The Cortex-M33 FPU detects the exceptional conditions defined by the IEEE 754 standard and records them as cumulative (sticky) flags in the floating point status and control register (FPSCR). This includes cases like:

Overflow/Underflow

Divide by Zero
Invalid Operation
Inexact Result

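The FPU does not trap on these conditions. The instruction completes with the IEEE 754 default result, and the matching flag stays set until software clears it. The processor also exposes the flags as output signals that a chip designer may connect to an interrupt input, so whether a floating point exception can raise an interrupt is device specific.

Where such an interrupt exists, its priority is programmed through the NVIC like any other interrupt. There is no separate low, medium, or high priority setting inside the FPU.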
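Software typically checks for floating point exceptions by reading the status flags in the FPSCR. The sketch below works on a plain 32-bit value so it can be tried on any machine; on a Cortex-M33 target the value would usually come from the CMSIS-Core `__get_FPSCR()` intrinsic and be written back with `__set_FPSCR()`. The helper names `fpscr_has_serious_error` and `fpscr_clear_flags` are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Cumulative exception flag bits in the FPSCR. */
#define FPSCR_IOC (1u << 0)  /* invalid operation */
#define FPSCR_DZC (1u << 1)  /* divide by zero */
#define FPSCR_OFC (1u << 2)  /* overflow */
#define FPSCR_UFC (1u << 3)  /* underflow */
#define FPSCR_IXC (1u << 4)  /* inexact result */
#define FPSCR_IDC (1u << 7)  /* input denormal */

#define FPSCR_ALL_FLAGS (FPSCR_IOC | FPSCR_DZC | FPSCR_OFC | \
                         FPSCR_UFC | FPSCR_IXC | FPSCR_IDC)

/* Hypothetical helper: true if invalid, divide-by-zero or overflow is set.
   Inexact is ignored because almost every calculation sets it. */
bool fpscr_has_serious_error(uint32_t fpscr)
{
    return (fpscr & (FPSCR_IOC | FPSCR_DZC | FPSCR_OFC)) != 0u;
}

/* Hypothetical helper: FPSCR value with all sticky flags cleared and
   every control field (rounding mode and so on) left unchanged. */
uint32_t fpscr_clear_flags(uint32_t fpscr)
{
    return fpscr & ~FPSCR_ALL_FLAGS;
}
```

A typical pattern is to clear the flags before a block of calculations and test them once afterwards, rather than checking after every operation.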

Compliance with IEEE 754 Standard

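The Cortex-M33 FPU architecture and instruction set are designed to follow the IEEE 754 standard for binary floating point arithmetic. Standard behavior applies in the default configuration, with the flush-to-zero and default NaN modes disabled, and exceptions are reported through status flags rather than hardware traps. This includes support for: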

Normal Numbers – Standard floating point values
Subnormal Numbers – Smaller than normal numbers, with reduced precision
Infinity – Values exceeding range of normal numbers

NaN – Not a Number values for handling undefined results
Rounding Modes – Round to nearest (the default), toward plus infinity, toward minus infinity, and toward zero
Exception Flags – Invalid operation, divide by zero, overflow, underflow, and inexact result

Adherence to IEEE 754 ensures consistent, predictable floating point math behavior across different implementations and platforms.
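The rounding mode and the two non-IEEE modes are controlled by fields in the FPSCR. The following sketch computes new register values without touching hardware; the enum and function names are hypothetical, while the bit positions follow the Armv8-M FPSCR layout.

```c
#include <stdint.h>

/* FPSCR control fields. */
#define FPSCR_RMODE_SHIFT 22u
#define FPSCR_RMODE_MASK  (3u << FPSCR_RMODE_SHIFT)
#define FPSCR_FZ          (1u << 24)  /* flush-to-zero mode */
#define FPSCR_DN          (1u << 25)  /* default NaN mode */

/* Rounding mode encodings used by the RMode field. */
typedef enum {
    FP_ROUND_NEAREST   = 0,  /* round to nearest, ties to even */
    FP_ROUND_PLUS_INF  = 1,  /* round toward plus infinity */
    FP_ROUND_MINUS_INF = 2,  /* round toward minus infinity */
    FP_ROUND_ZERO      = 3   /* round toward zero */
} fp_round_mode;

/* Hypothetical helper: return fpscr with only the rounding mode changed. */
uint32_t fpscr_with_rounding(uint32_t fpscr, fp_round_mode mode)
{
    return (fpscr & ~FPSCR_RMODE_MASK) |
           ((uint32_t)mode << FPSCR_RMODE_SHIFT);
}

/* Hypothetical helper: nonzero if the register selects standard IEEE 754
   behavior (round to nearest, no flush-to-zero, no default NaN). */
int fpscr_is_ieee_default(uint32_t fpscr)
{
    return (fpscr & (FPSCR_RMODE_MASK | FPSCR_FZ | FPSCR_DN)) == 0u;
}
```

Flush-to-zero trades strict compliance for speed on subnormal values, so code that depends on gradual underflow should leave it disabled.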

Floating Point Extension

In the Armv8-M architecture, "Floating-point Extension" is the name for the optional FPU itself. A chip designer chooses at implementation time whether to include it, so not every Cortex-M33 device has an FPU.

The extension does not add hardware trigonometric or logarithmic functions. Square root is a hardware instruction, but functions like Sine, Cosine, Tangent, Arcsine, and Arccosine are computed in software.

Math libraries such as the C library or CMSIS-DSP implement these functions with polynomial or table-based approximations built from FPU multiply and add instructions. This is still far faster than running the same routines with software floating point.
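Where trigonometric functions are computed by software routines, the routine usually reduces to a short polynomial that maps well onto FPU multiply and add instructions. The sketch below is an illustration only: `sin_poly7` is a hypothetical name, it uses plain Taylor coefficients rather than the tuned coefficients a production library would use, and it assumes the argument has already been reduced to roughly the range from -pi/2 to pi/2.

```c
/* Hypothetical helper: 7th-order polynomial approximation of sin(x),
   assuming x has already been reduced to about [-pi/2, pi/2]. */
float sin_poly7(float x)
{
    const float c3 = -1.0f / 6.0f;     /* -1/3! */
    const float c5 =  1.0f / 120.0f;   /*  1/5! */
    const float c7 = -1.0f / 5040.0f;  /* -1/7! */
    float x2 = x * x;

    /* Horner form: x * (1 + x2*(c3 + x2*(c5 + x2*c7))).
       Each step is one multiply and one add on the FPU. */
    return x * (1.0f + x2 * (c3 + x2 * (c5 + x2 * c7)));
}
```

The approximation error grows toward the ends of the interval, which is why real libraries combine careful range reduction with better-fitted coefficients.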

Summary

In summary, the key properties of the Arm Cortex-M33 FPU include:

Optional IEEE 754-compliant single precision FPU (FPv5)

32 dedicated 32-bit floating point registers
Pipelined architecture capable of up to one simple float operation per clock cycle
Integrated with CPU pipeline for seamless operation

Cumulative floating point exception flags in the FPSCR
Trigonometric and other math functions provided by software libraries

The integrated FPU enables high performance floating point calculations needed for many embedded and IoT applications, while minimizing cost, power and chip area.

Overall, the Cortex-M33 processor strikes an optimal balance between high-end FPU capabilities and mainstream microcontroller constraints.

Floating Point Operations Per Second

The maximum theoretical floating point operations per second (FLOPS) of the Cortex-M33 FPU is determined by the processor’s clock frequency.

If the FPU sustains one single precision operation per clock cycle, the peak rate is approximately: FLOPS = Clock Frequency (in Hz)

For example, with a 100 MHz clock speed: FLOPS = 100,000,000 operations per second = 100 MFLOPS

So theoretically, the FPU can carry out 100 million floating point operations per second at this frequency. In practice, actual application performance will depend on factors like:

Instruction mix – Arithmetic vs memory operations

Data dependencies and stalls
Memory and bus transfer speeds
Pipelining and parallelism with integer unit

Loads, stores, and loop overhead share the pipeline with the arithmetic, so sustained throughput on real code is usually well below the peak figure, even when the code is optimized for the architecture.
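The relationship between clock frequency, instruction mix, and achieved FLOPS can be put into a small calculation. This is a back-of-the-envelope sketch: `sustained_flops` is a hypothetical helper, and the cycle counts passed to it are assumptions to be replaced with measured values, not Cortex-M33 timing data.

```c
#include <stdint.h>

/* Hypothetical helper: floating point operations per second for a loop
   that performs fp_ops_per_iter operations in cycles_per_iter cycles. */
uint64_t sustained_flops(uint32_t clock_hz,
                         uint32_t fp_ops_per_iter,
                         uint32_t cycles_per_iter)
{
    /* 64-bit intermediate avoids overflow of clock_hz * fp_ops_per_iter. */
    return (uint64_t)clock_hz * fp_ops_per_iter / cycles_per_iter;
}
```

With one operation per cycle, `sustained_flops(100000000u, 1, 1)` gives the 100 MFLOPS peak from the example above. If a loop iteration performed 2 floating point operations but took 6 cycles in total (an assumed figure covering loads and loop overhead), the same clock would deliver about 33 MFLOPS.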

Comparisons to Other FPUs

Compared to other FPUs, the Cortex-M33 provides a strong balance of performance, power efficiency, and cost:

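Vs Cortex-M0/M0+/M3 – These cores have no FPU and rely on software emulation; the Cortex-M33 adds dedicated FP hardware

Vs Cortex-M4F – Newer FPv5 single precision instruction set (the Cortex-M4 uses FPv4-SP), adding instructions such as directed rounding and IEEE 754-2008 minimum and maximum
Vs Cortex-M7 – Lower power and area, but single precision only; the Cortex-M7 FPU can optionally include double precision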
Vs Cortex-A class – Less performance than NEON SIMD but also much smaller and lower power

For embedded applications requiring more intensive floating point than Cortex-M0/M3 class but tighter constraints than Cortex-A profile, the Cortex-M33 hits a sweet spot.

Use Cases and Applications

The Cortex-M33 FPU is designed for embedded and IoT applications requiring floating point math such as:

Digital Signal Processing – Audio, image and video filtering algorithms

Sensor Fusion – Combining data from multiple sensors for applications like robotics
Machine Learning – Neural networks and inference algorithms
Control Systems – Proportional-Integral-Derivative control loops

Motion Estimation – Object tracking for computer vision
Computer Graphics – 3D rendering and shader programs

The combination of high efficiency 32-bit floating point and low power consumption makes the Cortex-M33 well suited for edge computing applications.

Its single precision FPU provides sufficient performance for many embedded ML workloads while its smaller size enables integration into low cost microcontrollers.
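As an illustration of the control-loop use case, a PID controller step needs only a handful of single precision operations. The sketch below uses hypothetical names (`pid_f32`, `pid_step`) and omits practical details such as output clamping and integral anti-windup.

```c
/* Hypothetical PID controller state, all in single precision. */
typedef struct {
    float kp, ki, kd;   /* proportional, integral, derivative gains */
    float integral;     /* accumulated error * dt */
    float prev_error;   /* error from the previous step */
} pid_f32;

/* Hypothetical helper: one controller update; dt is the step in seconds. */
float pid_step(pid_f32 *pid, float setpoint, float measured, float dt)
{
    float error = setpoint - measured;

    pid->integral += error * dt;
    float derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;

    return pid->kp * error
         + pid->ki * pid->integral
         + pid->kd * derivative;
}
```

Because division is one of the slower FPU operations, a fixed-rate loop would normally precompute `1.0f / dt` once and multiply by it instead.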

Design Considerations and Tradeoffs

Some key design considerations and tradeoffs for Arm in developing the Cortex-M33 FPU architecture include:

Precision vs Performance – Single precision gives less range and precision than double precision in exchange for a smaller, faster, lower power unit

Area/Power vs Features – Focused on commonly used FP32 over wider FP support
Embedded vs Server Constraints – Optimized for low power and cost over peak throughput
Hardware vs Software – In-core FPU vs software library approach

By focusing on widely used single precision floating point, Arm could optimize the FPU for low power and reduced complexity compared to a more full-featured 64-bit FPU.

And implementing the FP unit directly in hardware ensures high performance and efficiency versus a software library approach to floating point.

Implementation in Arm Cortex-M33 CPUs

The Cortex-M33 core is used in microcontrollers from several vendors, including:

NXP LPC5500 Series MCUs
STM32L5 and STM32U5 Series Ultra-low-power MCUs
Nordic Semiconductor nRF5340 and nRF9160 SoCs
Raspberry Pi RP2350

Because the FPU is an optional part of the core, check the device datasheet to confirm that it is present.

These Arm ecosystem chips integrate the Cortex-M33 CPU with the FPU and other accelerators to provide high performance on embedded application workloads. The M33 cores are often combined with additional application-specific processing engines for analytics, machine learning, signal processing, and security.

Development Tools and Compiler Support

The Arm Cortex-M33 FPU is supported by all major embedded development toolchains including:

GCC – GNU Arm Embedded Toolchain
Arm Compiler 6 – LLVM-based compiler used in Keil MDK and Arm Development Studio
IAR – IAR Embedded Workbench IDE and C/C++ compiler

Keil MDK – Arm Keil Microcontroller Development Kit

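When the build options select the Cortex-M33 with hardware floating point, the compilers generate FPU instructions automatically for float arithmetic in C/C++ code. Startup code must also enable the FPU before the first floating point instruction runs. Developers can access the FPU using intrinsics or custom assembly as well.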
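Two things have to be in place before compiled floating point code runs on the FPU: the build must target hardware floating point (with GCC, for example, options along the lines of `-mcpu=cortex-m33 -mfpu=fpv5-sp-d16 -mfloat-abi=hard`), and startup code must grant access to coprocessors CP10 and CP11 through the CPACR register. The sketch below shows the second step. The function names are hypothetical, and the register is passed as a pointer so the logic can be exercised off-target; vendor startup code based on CMSIS usually performs the same write in `SystemInit()`.

```c
#include <stdint.h>

/* Address of the Coprocessor Access Control Register on Armv8-M. */
#define CPACR_ADDR 0xE000ED88u

/* CP10 and CP11 fields, bits [23:20]; 0b11 in each means full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Hypothetical helper: grant full access to the FPU. */
void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
    /* On target, follow this with DSB and ISB barriers (for example
       the CMSIS __DSB() and __ISB() intrinsics) before the first
       floating point instruction executes. */
}

/* Hypothetical helper: nonzero if both CP10 and CP11 are fully enabled. */
int fpu_is_enabled(const volatile uint32_t *cpacr)
{
    return (*cpacr & CPACR_CP10_CP11_FULL) == CPACR_CP10_CP11_FULL;
}
```

On the device the call would be `fpu_enable((volatile uint32_t *)CPACR_ADDR);`. On parts that use TrustZone, Secure code may additionally need to permit Non-secure access to the FPU through the NSACR register.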

Debuggers and IDEs like Keil MDK provide register and memory views into the FPU registers during debug. Vendors also supply optimized math libraries utilizing the Cortex-M33 FPU.

Summary and Conclusion

The Arm Cortex-M33 processor offers an efficient single precision floating point unit for deeply embedded designs. With its combination of small size, low power, hardware floating point performance, and IEEE 754 exception flagging, the Cortex-M33 FPU hits a sweet spot for embedded computing applications.

It brings hardware floating point capabilities to the cost sensitive microcontroller market. The Cortex-M33 FPU, as part of the complete Cortex-M33 CPU, provides a strong processor for analytics, machine learning, digital signal processing, and other compute intensive workloads.