For Cortex M chips without a dedicated floating point unit (FPU), performing floating point math operations efficiently can be challenging. However, with the right software libraries and techniques, it is possible to do floating point math on Cortex M CPUs lacking an FPU. This article explores the options and tradeoffs for implementing floating point math on FPU-less Cortex M microcontrollers.
Why Floating Point Math on Cortex M?
While Cortex M chips are intended for low-cost and low-power embedded applications, increasingly these applications require some floating point math capabilities. Examples include:
- Digital signal processing algorithms involving filters, transforms, etc.
- Embedded machine learning using neural networks
- Sensor data processing from pressure, temperature, and accelerometer inputs
- Motor control applications with PI and PID loops
Rather than using a more expensive Cortex M chip with a dedicated FPU, or moving to a different architecture, developers may want to enable floating point math on their existing Cortex M design. The key reasons are to minimize cost and maximize power efficiency.
Challenges of Software Floating Point
The Cortex M0/M0+/M1 chips lack an FPU, so floating point operations need to be handled in software. This brings several challenges:
- Floating point math is complex, requiring algorithms for addition, subtraction, multiplication, division, square root, etc.
- Software libraries take up flash and RAM on the already memory constrained Cortex M chips.
- Execution speed is much slower than with a dedicated hardware FPU.
- Precision is often limited in practice: 64-bit double precision is possible in software but is markedly slower, so most designs restrict themselves to 32-bit single precision.
- Special handling is needed for overflow, underflow, denormalized numbers, and other IEEE 754 edge cases.
- Results must remain consistent and correct across toolchains, compilers, and libraries.
While performance and precision are limited, with careful coding and optimization, useful floating point math is achievable on Cortex M0/M0+/M1.
Software Floating Point Libraries
Several open source libraries are available that provide floating point math functions for Cortex M chips without FPU:
CMSIS-DSP
CMSIS-DSP is provided by Arm as part of the Cortex Microcontroller Software Interface Standard (CMSIS). It supplies a common set of math and DSP kernels for all Cortex M cores, including FPU-less ones, where the compiler's soft-float runtime handles the underlying float arithmetic. The library covers vector operations, filters, transforms, square root, trigonometric, exponential, logarithmic and other fast math functions. Key features:
- IEEE 754 32-bit single precision types, plus Q7/Q15/Q31 fixed point variants
- Portable C implementation with optional use of core-specific intrinsics
- Hand optimized algorithms, with integer (fixed point) versions of most kernels
- Choice of data type allows trading performance against precision
- Apache 2.0 open source license
CMSIS-DSP is a good choice when a standard, vendor-supported math API is desired. It is bundled with many IDEs and toolchains.
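As an illustration, here is a minimal sketch of calling two CMSIS-DSP fast math routines, arm_sin_f32 and arm_sqrt_f32, from application code; on an FPU-less core the remaining float arithmetic is compiled down to the toolchain's soft-float runtime. The function scaled_sine is only an example name.

```c
#include "arm_math.h"   /* CMSIS-DSP types and fast math prototypes */

/* Scale the sine of an angle by the square root of a gain value.
 * On an FPU-less core the float32_t arithmetic below compiles to the
 * toolchain's soft-float runtime calls. */
float32_t scaled_sine(float32_t angle_rad, float32_t gain)
{
    float32_t root;
    float32_t s = arm_sin_f32(angle_rad);        /* table-based fast sine */

    if (arm_sqrt_f32(gain, &root) != ARM_MATH_SUCCESS) {
        root = 0.0f;                             /* negative input rejected */
    }
    return s * root;
}
```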
Berkeley SoftFloat
Berkeley SoftFloat is an open source floating point library targeted at systems without an FPU. It implements 32-bit and 64-bit floating point per the IEEE 754 standard. Key features include:
- Pure C implementation with optimizing options
- Full IEEE 754 rounding modes and exception flags
- Portable across many architectures
- Separate routines per format, so only the precisions actually used are linked in
- BSD open source license
For applications requiring portability across hardware platforms, including wider 64-bit types, SoftFloat is a good choice. Because each format has its own routines, single precision code does not pay for double precision support.
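As a sketch of what the explicit call-based interface looks like (assuming SoftFloat 3's softfloat.h header and its f32_* routines), the following averages two integer readings entirely in SoftFloat's single precision type; the function name is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include "softfloat.h"   /* Berkeley SoftFloat 3 public API */

/* Average two integer sensor readings in IEEE 754 single precision using
 * SoftFloat's explicit function-call interface; no native float types appear. */
int32_t average_readings(int32_t a, int32_t b)
{
    float32_t fa  = i32_to_f32(a);                  /* int32 -> binary32 */
    float32_t fb  = i32_to_f32(b);
    float32_t sum = f32_add(fa, fb);
    float32_t avg = f32_div(sum, i32_to_f32(2));

    /* Convert back to an integer with round-to-nearest-even; the final
     * argument suppresses the inexact exception flag. */
    return (int32_t)f32_to_i32(avg, softfloat_round_near_even, false);
}
```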
Other Options
In addition to CMSIS and SoftFloat, some other floating point software libraries include:
- Cephes – Stephen Moshier's collection of C math functions
- FDLIBM – Sun's freely distributable C math library, the basis of many libm implementations
- libm – the standard C math library, e.g. newlib's implementation for embedded targets
- MPFR – GNU multiple-precision floating point library
- MuLib – microcontroller library for high precision math
Developers can evaluate their specific application requirements when selecting among these software floating point libraries.
Floating Point Code Optimizations
When using software floating point libraries, developers can employ various optimizations to improve performance on Cortex M CPUs:
Precomputation
Compute costly math results once up front, then cache and reuse them rather than recomputing, trading some RAM or flash for faster execution.
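For example, here is a sketch of a precomputed sine table: the table is filled once at startup (using sinf from the C library), and later lookups replace an expensive soft-float call with a single indexed load. The names and table size are illustrative.

```c
#include <math.h>
#include <stdint.h>

#define SINE_TABLE_SIZE 256
#define TWO_PI          6.283185307f

static float sine_table[SINE_TABLE_SIZE];

/* Fill the table once at startup; every later lookup avoids a costly
 * soft-float sinf() call. */
void sine_table_init(void)
{
    for (int i = 0; i < SINE_TABLE_SIZE; i++) {
        sine_table[i] = sinf((TWO_PI * (float)i) / (float)SINE_TABLE_SIZE);
    }
}

/* Return sin(2*pi*phase/256) with a single indexed load. */
static inline float sine_lookup(uint8_t phase)
{
    return sine_table[phase];
}
```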
Approximation
Use polynomial or linear approximations for transcendental functions like sine, cosine, and logarithms, trading accuracy for performance.
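A minimal example of the idea, using a truncated Taylor series for sine on a narrow input range (the coefficients are the standard series terms, not a tuned minimax fit):

```c
/* Approximate sinf(x) for x in roughly [-pi/2, pi/2] with a truncated
 * Taylor series: sin(x) ~= x - x^3/6 + x^5/120.  Absolute error stays
 * below about 0.005 at the interval edges -- often good enough for
 * control loops, and far cheaper than a full library sinf(). */
static inline float sin_approx(float x)
{
    float x2 = x * x;
    return x * (1.0f - x2 * (1.0f / 6.0f) + (x2 * x2) * (1.0f / 120.0f));
}
```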
Loop Unrolling
Unroll small fixed-count loops to reduce the branch and loop-counter overhead around the innermost math operations.
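A sketch of a four-way unrolled dot product; the function name and the assumption that the length is a multiple of four are illustrative only.

```c
/* Dot product of two float vectors with the inner loop unrolled four times.
 * Fewer branches and loop-counter updates per multiply-add, which matters
 * when every multiply-add is already an expensive soft-float call.
 * Assumes len is a multiple of 4 for brevity. */
float dot_product_unrolled(const float *a, const float *b, unsigned len)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;

    for (unsigned i = 0; i < len; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```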
Assembly Optimize
Hand optimize key algorithms in assembly language, making full use of the available registers and instructions for significant speedups.
Intrinsic Functions
Use compiler intrinsic functions to generate SIMD or other specialized instructions where the core supports them (for example, the DSP extension on Cortex M4), avoiding library function call overhead.
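As an illustration, the CMSIS-Core intrinsic __SMLAD (dual 16-bit signed multiply-accumulate) can replace two multiplies and two adds per iteration of a Q15 dot product on cores that have the DSP extension; the device.h include below is a placeholder for your part's CMSIS device header, and the function name and alignment assumptions are illustrative.

```c
#include <stdint.h>
#include "device.h"   /* placeholder for your part's CMSIS device header,
                         which pulls in the core intrinsics */

/* Q15 dot product using the CMSIS-Core __SMLAD intrinsic: each call performs
 * two 16-bit signed multiplies and accumulates both products in one
 * instruction.  Requires a core with the DSP extension (e.g. Cortex-M4/M7);
 * Cortex-M0/M0+ would need a plain C fallback.  Assumes the buffers are
 * 4-byte aligned and hold an even number of samples. */
int32_t dot_q15(const int16_t *a, const int16_t *b, uint32_t num_samples)
{
    const uint32_t *pa = (const uint32_t *)a;   /* two Q15 samples per word */
    const uint32_t *pb = (const uint32_t *)b;
    int32_t acc = 0;

    for (uint32_t i = 0; i < num_samples / 2U; i++) {
        acc = (int32_t)__SMLAD(pa[i], pb[i], (uint32_t)acc);
    }
    return acc;
}
```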
Reduce Precision
Use lower precision where possible: prefer single precision over double, and consider 16-bit half precision or fixed point formats, gaining performance at the cost of precision.
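One easy win is avoiding accidental promotion to double: an unsuffixed constant or a call to sin() instead of sinf() silently pulls in the 64-bit soft-float routines, as the contrast below shows (function names are illustrative).

```c
#include <math.h>

/* Accidental promotion: the unsuffixed constant and sin() force the whole
 * expression through the 64-bit double-precision soft-float routines. */
float slow_gain(float x)
{
    return 0.5 * sin(x);
}

/* Single precision throughout: the f-suffixed constant and sinf() keep the
 * work in the much cheaper 32-bit soft-float path. */
float fast_gain(float x)
{
    return 0.5f * sinf(x);
}
```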
Profiling tools can help identify optimization opportunities and quantify performance gains from various techniques.
Leveraging Fixed Point Math
While floating point is necessary for some applications, fixed point math may meet requirements in other cases. With fixed point, computations are performed on integers rather than floats. Benefits of fixed point math include:
- Higher precision for a given word length
- No special handling of edge cases like denormals, underflow, etc.
- Much faster computation than software floating point
- Deterministic behavior
The limitations are fixed point's lower dynamic range compared with floats, and the need to scale values carefully to preserve precision. Fixed point math libraries like QMath provide optimized fixed point operations.
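As a minimal sketch of the idea (the Q16.16 helpers below are illustrative, not part of QMath): values are stored as 32-bit integers with 16 fractional bits, and a multiply widens to 64 bits before shifting back down.

```c
#include <stdint.h>

/* Minimal Q16.16 fixed-point helpers: values are int32_t with 16 fractional
 * bits, so 1.0 is represented as 65536. */
typedef int32_t q16_16_t;

#define Q16_ONE (1 << 16)

static inline q16_16_t q16_from_float(float f)  { return (q16_16_t)(f * (float)Q16_ONE); }
static inline float    q16_to_float(q16_16_t q) { return (float)q / (float)Q16_ONE; }

/* Multiply in a 64-bit intermediate, then shift back down to Q16.16.
 * Only integer instructions are used -- no soft-float calls at all. */
static inline q16_16_t q16_mul(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * b) >> 16);
}

/* Example: apply a fixed gain of 1.5 to a sensor sample. */
static inline q16_16_t apply_gain(q16_16_t sample)
{
    const q16_16_t gain = (q16_16_t)(3 * Q16_ONE / 2);   /* 1.5 in Q16.16 */
    return q16_mul(sample, gain);
}
```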
Leveraging Low Power Math Accelerators
Some microcontrollers, including many Cortex M based parts, integrate math accelerators that offload intensive floating point or fixed point sequences, reducing software overhead and memory usage. Examples include:
- STM32G4 – CORDIC accelerator for trigonometric, hyperbolic, logarithm and square root functions
- STM32G4 – FMAC accelerator for 16-bit fixed point multiply-accumulate (FIR and IIR filtering)
- NXP LPC55S6x – PowerQuad coprocessor for FFT, filter and matrix math
- TI MSP430FR599x – Low Energy Accelerator (LEA) for FIR filters, FFTs and vector math
When available, math accelerators can provide 10-100x speedups on accelerated algorithms with minimal software and power overhead.
Leveraging External Math Chips
For more intensive floating point requirements, an external math coprocessor chip can be added to offload the Cortex M CPU. Options include:
- FPGA – Custom floating point logic implemented in FPGA fabric
- GPU – Graphics processing unit, efficient at parallel math operations
- DSP – Digital signal processor optimized for math algorithms
- SoC – Higher end Cortex-A/R/M cores with built-in FPU
The tradeoff is increased cost, complexity and power consumption in exchange for maximum performance on complex algorithms, so external chips are best reserved for the most demanding workloads.
Conclusion
For Cortex M microcontrollers without an FPU, efficient floating point math is achievable using software libraries, code optimizations, fixed point math, hardware accelerators, and external coprocessors. Performance and precision requirements determine the best approach.
Software libraries like CMSIS-DSP or SoftFloat provide portable floating point in C. Code optimizations and fixed point math can improve software performance. Hardware accelerators and coprocessors maximize performance for complex floating point operations.
With careful coding and the techniques outlined, useful floating point math is realizable on even low-cost FPU-less Cortex M chips.