When deciding whether to use hardware or software floating point on Arm Cortex-M processors, there are a few key factors to consider. A hardware floating point unit (FPU) makes floating point math much faster, while software floating point runs on every core and keeps code portable. The choice depends on the application's requirements and constraints.
Introduction to Floating Point on Arm Cortex-M
Floating point numbers represent real numbers with a sign, a fraction (mantissa), and an exponent, allowing a wide range of values to be represented. In the IEEE 754 formats, single precision floats use 32 bits, with 1 sign bit, 8 exponent bits, and 23 mantissa bits. Double precision uses 64 bits, with 1 sign, 11 exponent, and 52 mantissa bits.
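To make the single precision layout concrete, here is a minimal C sketch that splits a float into the three fields described above. The type and function names (`float_fields`, `decode_float`) are made up for this illustration and do not come from any library.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, the implicit leading 1 is not stored */
} float_fields;

/* Split a float into its raw fields. memcpy is used to copy the bit
 * pattern so that no pointer aliasing rules are broken. */
float_fields decode_float(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);

    float_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, `decode_float(-2.5f)` yields sign 1, exponent 128, and mantissa 0x200000, because -2.5 is -1.25 x 2^1 and the exponent is stored as 1 + 127.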
Arm Cortex-M processors such as the Cortex-M4 and Cortex-M7 can include an optional hardware floating point unit (FPU). With an FPU, floating point operations execute directly in hardware. Without one, the compiler's runtime library performs floating point math in software using integer operations, which is much slower. On a core whose FPU is single precision only, double precision math still falls back to software.
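Whether a given build uses the FPU is decided at compile time. With GCC and Clang this is controlled by options such as `-mfloat-abi=soft`, `-mfloat-abi=softfp`, or `-mfloat-abi=hard`, together with an `-mfpu=` selection. The sketch below shows one way code can check what it was compiled for, using the `__ARM_FP` macro from the Arm C Language Extensions. The helper names (`hw_fp_mask`, `has_hw_single`, `has_hw_double`) are hypothetical.

```c
/* Returns a bit mask of the floating point widths the compiler may use
 * in hardware, taken from the ACLE macro __ARM_FP:
 *   0x2 = half precision, 0x4 = single precision, 0x8 = double precision.
 * Returns 0 when the build targets no FPU (or a non-Arm host), in which
 * case all floating point math goes through software routines. */
unsigned hw_fp_mask(void)
{
#if defined(__ARM_FP)
    return (unsigned)__ARM_FP;
#else
    return 0u;
#endif
}

/* Convenience checks for the two widths discussed in this article. */
int has_hw_single(void) { return (hw_fp_mask() & 0x4u) != 0; }
int has_hw_double(void) { return (hw_fp_mask() & 0x8u) != 0; }
```

A check like this can be useful for selecting between a float-heavy algorithm and a fixed-point fallback in code that is shared across devices.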
Benefits of Hardware Floating Point
Using the built-in FPU provides significant performance benefits for floating point intensive code:
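- Hardware floating point is typically 10-100x faster than equivalent software routines, depending on the operation and library
- The speedup applies to common operations like add, subtract, multiply, and divide
- The FPU has its own register bank, and some cores can overlap floating point and integer instructions; software routines compete for the integer pipeline
- Special values like infinities and NaNs are handled in hardware
- Hardware provides square root and fused multiply-add instructions; transcendental functions such as sine and log still come from a math library, but run much faster when built on hardware operations
- Each operation returns a correctly rounded result from a single instruction, with no multi-word intermediate values to manage
- Code size is reduced, because each operation is a compact instruction rather than a call into a library routine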
For applications doing a lot of floating point math, like digital signal processing, 3D graphics, or sensor fusion, the hardware FPU will provide a major performance boost and faster execution times. The hardware is optimized specifically for floating point.
Benefits of Software Floating Point
While hardware floating point has performance advantages, software implementations have benefits around flexibility and portability:
- Works on any Cortex-M, without requiring FPU support
- Code is portable between devices with and without an FPU
- The same code can run on lower cost chips without hardware floating point
- Precision and error handling can be customized in software
- Software routines can be modified and optimized
- Small footprint when only a few basic operations are used, since only the routines actually called are linked in
- Avoids the extra chip cost and power usage of a hardware FPU
Software floating point allows floating point code to work across any Arm Cortex-M device. This provides flexibility in product development, letting you reuse code on lower cost microcontrollers without an FPU. Software also gives more control over floating point precision and errors.
Floating Point Hardware Support in Arm Cortex-M
The level of floating point support varies across the Arm Cortex-M product line:
- Cortex-M0/M0+ – No floating point hardware
- Cortex-M3 – No floating point hardware
- Cortex-M4 – Optional single precision FPU
- Cortex-M7 – Optional single and double precision FPU
- Cortex-M23 – No floating point hardware
- Cortex-M33 – Optional single precision FPU
- Cortex-M35P – Optional single precision FPU
Hardware floating point is available only on the higher end Cortex-M cores, and even there it is an option that the chip vendor may leave out. The most capable FPU in this list is on the Cortex-M7, with optional single and double precision. Software floating point is needed as a fallback for cores without FPUs, and for double precision math on cores whose FPU is single precision only.
Software Floating Point Libraries
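To enable software floating point on Arm Cortex-M, toolchains and third parties provide libraries of optimized routines, written in C or assembly:
- Newlib-nano – size-optimized open source C library shipped with the Arm GNU Toolchain, under BSD-style licenses; it supplies the math functions, while GCC's libgcc supplies the basic arithmetic routines
- Arm Compiler 6 – proprietary runtime libraries from Arm
- LLVM compiler-rt – clang/LLVM runtime library with soft-float routines, Apache 2.0 licensed with LLVM exceptions
- Berkeley SoftFloat – BSD licensed pure software floating point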
- Cephes Math Library – transcendental functions
Between them, these provide software implementations of float add, subtract, multiply, divide, comparison operations, type conversions, and math functions like sine, cosine, log, and exponentiation. By linking in a software float library, code can perform floating point on any Cortex-M.
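To give a feel for what such a routine does internally, here is a simplified sketch of a single precision multiply built only from integer operations. It is not taken from any of the libraries above, the name `soft_fmul` is made up, and it assumes both inputs and the result are normal numbers. A real library also has to handle zeros, subnormals, infinities, NaNs, overflow, and underflow.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Multiply two floats using integer arithmetic only.
 * ASSUMPTION: inputs and result are normal numbers; special values,
 * overflow and underflow are deliberately not handled in this sketch. */
float soft_fmul(float a, float b)
{
    uint32_t ua = float_to_bits(a);
    uint32_t ub = float_to_bits(b);

    /* The sign of the product is the XOR of the input signs. */
    uint32_t sign = (ua ^ ub) & 0x80000000u;

    /* Add the biased exponents and remove one copy of the bias. */
    int32_t exp = (int32_t)((ua >> 23) & 0xFFu)
                + (int32_t)((ub >> 23) & 0xFFu) - 127;

    /* Restore the implicit leading 1 and multiply the 24 bit mantissas. */
    uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;
    uint64_t prod = ma * mb;              /* 47 or 48 significant bits */

    /* Normalize: if the product reached bit 47, shift one extra place. */
    int shift = 23;
    if (prod & (1ULL << 47)) {
        shift = 24;
        exp += 1;
    }

    /* Round to nearest, ties to even, on the bits shifted out. */
    uint64_t mant = prod >> shift;
    uint64_t rem  = prod & ((1ULL << shift) - 1u);
    uint64_t half = 1ULL << (shift - 1);
    if (rem > half || (rem == half && (mant & 1u))) {
        mant += 1;
    }
    if (mant == (1ULL << 24)) {           /* rounding carried out */
        mant >>= 1;
        exp += 1;
    }

    return bits_to_float(sign | ((uint32_t)exp << 23)
                              | ((uint32_t)mant & 0x7FFFFFu));
}
```

Even this reduced version needs a wide multiply, several shifts, and a few branches, which helps explain why a software multiply costs far more cycles than a single FPU instruction.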
Floating Point Code Size
Software floating point code takes up more space than hardware floating point. Here are some rough, order-of-magnitude instruction counts for common operations:
- Float add – 1 instruction (FPU), ~100 instructions (software)
- Float multiply – 1 instruction (FPU), ~200 instructions (software)
- Float sin – 10-20 instructions (FPU), ~300 instructions (software)
- Float exp – 10-20 instructions (FPU), ~400 instructions (software)
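Exact instruction counts depend on the implementation. Sine and exponential are library routines in both cases, since the Cortex-M FPU has no instructions for them; the FPU version is shorter because each arithmetic step inside it is a single instruction. Overall, hardware floating point requires far fewer instructions than software routines for most operations, which reduces code size.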
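The difference shows up even for a one-line function. The sketch below uses a hypothetical helper, `scale_offset`, and the comments describe what a compiler typically generates for each build. The exact output depends on the compiler, its version, and its options.

```c
/* y = x * gain + offset, a typical sensor scaling step.
 *
 * Built for a core with an FPU and the hard-float ABI, a compiler
 * typically emits one or two FPU instructions for the body
 * (vmul.f32 followed by vadd.f32, or a single fused vfma.f32).
 *
 * Built for software floating point, the same source instead calls the
 * Arm EABI runtime helpers __aeabi_fmul and __aeabi_fadd. Each helper
 * is a routine of many integer instructions that gets linked into the
 * final image. */
float scale_offset(float x, float gain, float offset)
{
    return x * gain + offset;
}
```

Inspecting the disassembly of a function like this for both builds is a quick way to see the instruction count difference on a specific toolchain.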
Software Floating Point Precision
With software floating point, precision is customizable based on application needs:
- Single precision – 32 bit floats
- Double precision – 64 bit floats
- Custom precisions – e.g. 40 bit floats
- Configurable mantissa/exponent sizes
Cortex-M FPUs implement only single precision and, on some cores, double precision in hardware. With software, custom float sizes are possible for applications needing higher or lower precision. Precision affects accuracy, performance, and memory usage.
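As an illustration of a custom format, the following sketch converts between a 32 bit float and a 16 bit value with 1 sign bit, 8 exponent bits, and 7 mantissa bits (the layout known as bfloat16). This format is chosen here only as an example of a reduced mantissa. The function names are hypothetical, and NaN inputs are not treated specially.

```c
#include <stdint.h>
#include <string.h>

/* Narrow a float to 16 bits by keeping the sign, the full 8 bit
 * exponent and the top 7 mantissa bits, rounding to nearest even.
 * ASSUMPTION: NaN inputs are not handled specially in this sketch. */
uint16_t float_to_bf16(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);

    /* Adding 0x7FFF plus the lowest kept bit rounds ties to even. */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* Widen back to a float by padding the mantissa with zero bits. */
float bf16_to_float(uint16_t packed)
{
    uint32_t bits = (uint32_t)packed << 16;
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Storing 3.14159f in this format and reading it back gives 3.140625, which shows the trade: half the memory per value in exchange for roughly two to three significant decimal digits.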
Floating Point Code Optimization
There are optimization techniques to improve software floating point performance on Arm Cortex-M:
- Build float add, subtract, and multiply on the core's hardware integer instructions
- Replace division by a constant with multiplication by its reciprocal
- Use lookup tables for trig, log, and exp instead of full calculations
- Apply loop unrolling and function inlining to reduce call overhead
- Hand-optimize critical functions in assembly
- Use the MPU to set memory attributes (for example cacheability on Cortex-M7) so execution times stay predictable
While software floating point is slower, methods like lookup tables, hand-written math routines, and assembly can narrow the gap. Hardware FPUs apply some of the same ideas internally.
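The lookup table idea can be sketched as follows. This example tabulates sine every 15 degrees over the first quadrant and interpolates linearly between entries. The table size, step, and the name `sin_lut_deg` are choices made for this illustration, and the coarse table keeps the listing short at the cost of accuracy.

```c
#include <stddef.h>

/* sin() sampled every 15 degrees from 0 to 90, rounded to 7 digits. */
static const float SIN_TABLE[] = {
    0.0f, 0.2588190f, 0.5f, 0.7071068f, 0.8660254f, 0.9659258f, 1.0f
};
#define STEP_DEG   15.0f
#define LAST_INDEX (sizeof SIN_TABLE / sizeof SIN_TABLE[0] - 1)

/* Approximate sin(deg) for 0 <= deg <= 90 with one table lookup and one
 * linear interpolation. Inputs outside that range are clamped. */
float sin_lut_deg(float deg)
{
    if (deg <= 0.0f) {
        return SIN_TABLE[0];
    }
    if (deg >= 90.0f) {
        return SIN_TABLE[LAST_INDEX];
    }

    float  pos  = deg / STEP_DEG;      /* position in table units */
    size_t idx  = (size_t)pos;         /* entry at or below the input */
    float  frac = pos - (float)idx;    /* distance to the next entry */

    return SIN_TABLE[idx] + frac * (SIN_TABLE[idx + 1] - SIN_TABLE[idx]);
}
```

With this table the worst-case error is on the order of 1%. A finer table trades flash for accuracy, and a fixed-point variant of the same idea would remove the remaining float multiply and add.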
Floating Point Error Handling
Floating point hardware and software handle errors differently:
- The FPU follows the IEEE 754 spec for exceptions
- Software can implement custom error handling
- Software lets you control precision loss behavior
- The FPU records exceptions as sticky status flags in its FPSCR register rather than trapping, so code has to check them afterwards
- Software routines can report errors directly to the calling code
The FPU will set the exception bits defined in the IEEE 754 spec on errors. Software floating point, by contrast, lets errors be detected in code immediately when they occur. This allows full control over error handling.
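A custom error-handling policy in software might look like the sketch below, where a division wrapper returns a status code instead of silently producing an infinity or NaN. The enum values and the name `checked_divf` are hypothetical, and the policy itself (reject non-finite inputs, report division by zero and overflow) is just one possible choice.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    DIV_OK,         /* result written to *out */
    DIV_BAD_INPUT,  /* an operand was infinite or NaN */
    DIV_BY_ZERO,    /* denominator was +0 or -0 */
    DIV_OVERFLOW    /* quotient too large for a float */
} div_status;

/* A float is finite unless its exponent field is all ones. */
static int is_finite_f(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return ((bits >> 23) & 0xFFu) != 0xFFu;
}

/* Divide num by den, reporting problems to the caller immediately.
 * *out is only written when the function returns DIV_OK. */
div_status checked_divf(float num, float den, float *out)
{
    if (!is_finite_f(num) || !is_finite_f(den)) {
        return DIV_BAD_INPUT;
    }
    if (den == 0.0f) {
        return DIV_BY_ZERO;
    }

    float q = num / den;
    if (!is_finite_f(q)) {
        return DIV_OVERFLOW;
    }

    *out = q;
    return DIV_OK;
}
```

The same wrapper works whether the division itself runs on an FPU or in a software library; the point is that the error policy lives in ordinary code that the application controls.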
Floating Point Benchmarks
Here are sample benchmark results for 32-bit float operations on Cortex-M7 with FPU vs. software float:
| Operation | FPU Cycles | Software Cycles |
The hardware FPU provides around 10-100x speedup across basic and transcendental operations. Exact ratios depend on the software library used.
Power and Cost
The FPU increases chip cost, complexity, and power usage. Cortex-M cores with FPUs have:
- Higher gate counts – FPU is over 20k gates
- Increased silicon area
- Some added power usage even when the FPU is not used, unless it can be powered down
- Higher cost per unit for FPU versions
For low power or size constrained applications, avoiding the FPU can reduce system power and cost overheads. The impact varies with the specific Arm-based chip being used.
Software vs Hardware Tradeoffs
Here is a summary of the key tradeoffs between hardware and software floating point:
| | Hardware FPU | Software Float |
| --- | --- | --- |
| Performance | Much faster | Slower, but optimizable |
| Precision | Fixed single, double | Configurable precision |
| Code size | Much smaller | Larger code |
| Error handling | Defined by IEEE 754 | Customizable |
| Portability | Only works with FPU | Works on any Cortex-M |
| Power/Cost | Higher | Lower without FPU |
The right choice depends on whether the benefits of hardware speed and size outweigh the need for software flexibility and portability in a given project.
Recommended Usage Guidelines
Based on the tradeoffs, here are some general guidelines on when to use hardware vs software floating point with Cortex-M:
- Use FPU for heavy floating point code to boost performance
- Use FPU if code size constraints make software impractical
- Use software float for portability across Cortex-M devices
- Use software if FPU cost or power are prohibitive
- Use software float for custom precision needs
- Use software if error handling requirements differ from IEEE 754
For performance critical applications that do a significant amount of floating point math, favor the FPU to speed up execution. In other cases where flexibility or portability are priorities, software floating point may be the better choice.
Hardware and software floating point both have benefits on Arm Cortex-M chips. Hardware FPUs provide fast floating point, while software gives portability and configurable precision. For lightweight floating point use, software may be suitable, but for intensive processing, the 10-100x speedup of hardware floating point is hard to ignore. By understanding the tradeoffs, developers can choose the best floating point approach for their particular application and constraints.