Fixed vs Floating Point Math on Cortex-M4F

The Cortex-M4F processor has hardware support for both fixed point and floating point math. Fixed point is served by the integer pipeline and the DSP extension instructions, while floating point is handled by a single precision FPU. Fixed point math uses integer values to represent real numbers, while floating point uses a more complex format with a mantissa and exponent to represent a wider range of values. The choice between fixed and floating point depends on the application requirements and tradeoffs between precision, dynamic range, code size and performance.

Contents

Introduction to Fixed and Floating Point
Fixed Point Math on Cortex-M4F
Floating Point Math on Cortex-M4F
Precision and Range Comparison
Code Density and Performance
Programming Considerations
Use Cases and Recommendations
Conclusion

Introduction to Fixed and Floating Point

Fixed point math represents real numbers using integers, with the radix point assumed to be in a fixed position. For example, the number 1.45 could be represented in Q8.8 fixed point format as 1.45 * 256 = 371.2, which rounds to the integer 371 (about 1.4492 when converted back). The main advantage of fixed point is that it maps efficiently to integer hardware instructions, so fixed point math can be very fast. However, fixed point has a limited dynamic range and precision for a given word size. Overflow occurs if a value exceeds the range of the integer bits, and quantization error occurs when a value falls between two representable steps.
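The Q8.8 conversion described above can be sketched in a few lines of C. This is a minimal illustration, not a library API: the names q8_8_from_float and q8_8_to_float are hypothetical, and the sketch assumes the caller keeps inputs inside the Q8.8 range.

```c
#include <stdint.h>

/* Q8.8: 8 integer bits (including sign) and 8 fractional bits in an int16_t. */
#define Q8_8_FRAC_BITS 8
#define Q8_8_ONE       (1 << Q8_8_FRAC_BITS)   /* 256 represents 1.0 */

/* Convert a real value to Q8.8, rounding to nearest.
   Assumption: x lies within roughly [-128.0, 127.99]; no range check is done. */
static int16_t q8_8_from_float(float x)
{
    float scaled = x * (float)Q8_8_ONE;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a Q8.8 value back to a real value. */
static float q8_8_to_float(int16_t q)
{
    return (float)q / (float)Q8_8_ONE;
}
```

Converting 1.45 gives 371, and converting 371 back gives 1.44921875, so the quantization error in this case is a little under one thousandth.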

In contrast, floating point uses a more flexible format to represent a much wider range of values. A floating point number consists of a sign bit, an exponent and a mantissa. The mantissa stores the precision bits while the exponent allows the number to cover a large dynamic range. Floating point can represent very large and very small numbers, with 24 bits of precision in single precision format (23 stored bits plus an implicit leading bit). However, floating point math is more complex, and on processors without a hardware FPU it is much slower than fixed point.

Fixed Point Math on Cortex-M4F

The Cortex-M4F processor contains a 32-bit ALU and barrel shifter that can efficiently perform fixed point math operations. Addition, subtraction, multiplication and bitwise logic ops execute in a single cycle. The processor also supports a range of saturating arithmetic instructions like QADD and QDADD, which clamp results to the signed 32-bit range rather than wrapping around on overflow. This matches well to typical digital signal processing requirements.
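To make the saturating behavior concrete, here is a portable C model of a 32-bit saturating add. The function name sat_add32 is hypothetical and the code is only a sketch of what the instruction computes; on a real Cortex-M4 build the CMSIS __QADD intrinsic would normally be used so that the compiler emits the single QADD instruction.

```c
#include <stdint.h>

/* Portable model of a signed 32-bit saturating add.
   The sum is formed in 64 bits so it cannot wrap, then clamped. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) {
        return INT32_MAX;   /* clamp positive overflow */
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;   /* clamp negative overflow */
    }
    return (int32_t)sum;
}
```

With plain wrapping arithmetic, adding 1 to the largest positive value would flip the sign. With saturation the result stays at the largest positive value, which is usually the less harmful failure mode in a filter or control loop.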

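Being a 32-bit CPU with the DSP extension, the Cortex-M4F can perform single cycle 32×32->32-bit multiplies, as well as single cycle 32×32->64-bit multiplies and multiply-accumulates (SMULL, SMLAL). This allows efficient implementations of multiply-accumulate DSP algorithms using Q31 data, or Q15 data where each Q15 x Q15 product is a Q30 value held in a 32-bit or 64-bit accumulator. The barrel shifter also enables efficient normalization and scaling operations during accumulate chains.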
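A Q15 multiply-accumulate chain can be sketched as follows. The function name q15_dot is hypothetical, and the sketch assumes that right-shifting a negative signed value is an arithmetic shift, which holds for GCC, Clang and Arm Compiler but is implementation-defined in standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two Q15 vectors.
   Each Q15 x Q15 product is a Q30 value. Products are summed in a 64-bit
   accumulator, so the sum cannot overflow for any practical vector length.
   The final result is scaled back to Q15 and saturated. */
static int16_t q15_dot(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* Q30 product */
    }
    acc >>= 15;   /* Q30 -> Q15, assumes arithmetic shift (see lead-in) */
    if (acc > INT16_MAX) {
        return INT16_MAX;
    }
    if (acc < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)acc;
}
```

Keeping the accumulator wider than the products and scaling only once at the end avoids losing low-order bits on every iteration.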

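Integer division on the Cortex-M4 uses the hardware SDIV and UDIV instructions, which take between 2 and 12 cycles depending on the operand values. There is no remainder instruction, so a remainder is normally computed from the quotient with a multiply-subtract (MLS). The __aeabi_idivmod function is not an intrinsic but an ARM EABI runtime helper that returns quotient and remainder together; it matters mainly on cores without hardware divide, since compilers targeting the Cortex-M4 emit SDIV or UDIV directly. Alternatively, reciprocal multiplication or lookup table approximations may be used if timing is critical.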

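The Cortex-M4F also includes packed (SIMD) saturating instructions in its DSP extension. QADD8 and QSUB8 operate on four 8-bit lanes within a 32-bit register and saturate each lane to 8-bit limits, while QADD16 and QSUB16 do the same for two 16-bit lanes. These can accelerate specific multimedia and DSP algorithms.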

Floating Point Math on Cortex-M4F

The Cortex-M4F has a hardware floating point unit that implements the ARM FPv4-SP extension. This covers single precision (32-bit) operations only. Double precision (64-bit) arithmetic is not supported in hardware and is carried out by software library routines, which are far slower. The FPU is an optional component of the Cortex-M4 core; parts that include it are commonly referred to as Cortex-M4F.

When enabled, the FPU provides low latency floating point instructions: 1 cycle for add, subtract and multiply, 3 cycles for multiply-accumulate, and 14 cycles for divide and square root. A single precision load or store takes 2 cycles. All floating point instructions use 32-bit Thumb-2 encodings.

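The FPU in the Cortex-M4F has no SIMD instructions for floating point data; each instruction operates on a single float. The SIMD capability of the Cortex-M4 is integer only, processing 2x 16-bit or 4x 8-bit values per instruction through the DSP extension. Floating point DSP algorithms like FIR filters and matrix math therefore rely on the scalar multiply-accumulate instructions (VMLA, VFMA) rather than on vector operations.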

Hardware support for conversions between fixed and floating point data types is also provided via VCVT instructions. Saturation logic on conversion can emulate fixed point overflow behavior.
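The effect of a saturating float to fixed point conversion can be modeled in portable C. The sketch below converts a float to Q15; the name float_to_q15_sat is hypothetical, and the choices to truncate toward zero and to map NaN to 0 are assumptions of this sketch rather than a statement of exactly what a given compiler will emit.

```c
#include <stdint.h>

/* Convert a float in nominal range [-1.0, 1.0) to Q15 with saturation.
   Out-of-range inputs clamp to the Q15 limits instead of wrapping. */
static int16_t float_to_q15_sat(float x)
{
    float scaled = x * 32768.0f;     /* 2^15 steps per unit */
    if (scaled != scaled) {
        return 0;                    /* NaN: mapped to 0 in this sketch */
    }
    if (scaled >= 32767.0f) {
        return INT16_MAX;            /* clamp positive overflow */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;            /* clamp negative overflow */
    }
    return (int16_t)scaled;          /* cast truncates toward zero */
}
```

An input of 0.5 converts to 16384, while inputs of 1.0 or above clamp to 32767 because +1.0 itself is not representable in Q15.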

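One downside of the FPU is increased power consumption versus the integer logic alone. Floating point code can also need more flash than equivalent fixed point routines, particularly when software library routines are pulled in for double precision or math functions. The actual ratio depends on the algorithm and the compiler, so it is worth measuring for the application at hand.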

Precision and Range Comparison

For a given word size, fixed point provides better precision but a smaller dynamic range compared to floating point. For example, 16-bit fixed point has 16 bits of precision, which is better than the 11-bit significand of half precision floats. However, its integer range is limited to [-32768, 32767] vs [-65504, 65504] for half precision.

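At 32 bits the same tradeoff holds. Single precision floats have a 24-bit significand and a huge dynamic range of about +/- 3.4e38. A 32-bit fixed point value has 32 bits of precision but a much smaller range: as raw integers [-2147483648, 2147483647], or [-1, 1 - 2^-31] when interpreted as Q31. Fixed point is therefore more precise for values near full scale, while floating point keeps its 24 bits of relative precision for very small and very large values, where fixed point runs out of bits.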
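The precision limit of a 24-bit significand is easy to demonstrate: a 32-bit integer can hold every whole number up to 2^31 - 1, but a float starts skipping whole numbers above 2^24. The helper name survives_float_round_trip is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the 32-bit integer v is exactly representable as a float.
   The volatile store forces the value to be rounded to single precision,
   and the comparison is done in double, which holds both values exactly. */
static int survives_float_round_trip(int32_t v)
{
    volatile float f = (float)v;
    return (double)f == (double)v;
}
```

The value 16777216 (2^24) survives the conversion, but 16777217 does not, because above 2^24 adjacent floats are 2 apart. A 32-bit fixed point accumulator keeps every one of those steps.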

Application requirements determine whether this wide dynamic range is actually needed. The increased range of floats comes at a cost of reduced precision and higher coding overhead. For applications like DSP filters fixed point is often sufficient and more efficient.

Code Density and Performance

Fixed point math maps efficiently to the core integer pipeline of the Cortex-M4F. Operations like addition, logical shifts, multiplication and accumulation can be performed in just 1-3 cycles. This enables very high DSP performance at low power.

Floating point instructions require the FPU. Additional cycles are needed when values are moved between the FPU registers and the core registers. Single precision add, subtract and multiply take 1 cycle, multiply-accumulate takes 3 cycles, and divide and square root take 14 cycles.

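Code density is often better with fixed point. Many integer instructions have 16-bit Thumb encodings, whereas floating point instructions are always 32 bits wide, and floating point code may pull in library routines for operations the hardware does not provide. The size difference varies widely with the algorithm and compiler, so it should be measured rather than assumed.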

That said, the overhead of floating point on the Cortex-M4F is modest for the common operations. Add, subtract and multiply each issue in a single cycle, and a fused multiply-accumulate instruction (VFMA) is available. The main costs are the 14-cycle divide and square root and the 2-cycle loads and stores.

Programming Considerations

From a programming perspective, fixed point code follows the natural integer operations of C/C++. Data types like int32_t and accumulators can be used directly. Optimal Q number formats require some analysis to avoid overflow conditions.

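Floating point code maps well to the float type in C/C++. The double type also works, but on the Cortex-M4F it is emulated in software, so constants should carry the f suffix (for example 0.5f) to avoid accidental promotion to double. The programmer rarely needs to worry about dynamic range and overflow. However, floating point does have precision issues that can cause problems with numerics. Small rounding errors can accumulate over long compute sequences.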
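The accumulation of rounding error can be shown with a simple running sum. The sketch below adds a small step to a float accumulator many times; the function names are hypothetical, and volatile is used only so the compiler rounds to single precision on every iteration, as the FPU would.

```c
#include <stdint.h>

/* Add `step` to a single precision accumulator n times. */
static float accumulate_float(float step, uint32_t n)
{
    volatile float sum = 0.0f;
    for (uint32_t i = 0; i < n; i++) {
        sum = sum + step;   /* each addition rounds to 24 bits */
    }
    return sum;
}

/* Absolute difference between two floats, without needing math.h. */
static float abs_diff(float a, float b)
{
    float d = a - b;
    return d < 0.0f ? -d : d;
}
```

Adding 0.1f one million times should give 100000, but the float result is off by far more than 1. As the sum grows, each addition is rounded to a coarser step and the errors stop cancelling. An integer counter of tenths would accumulate the same sequence exactly.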

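Fixed point code needs to be well commented to document the Q formats used, while floating point is more self documenting. Debugging aids like printf() can print floats directly, although some embedded C libraries require floating point formatting support to be enabled at link time.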

Multiplication in fixed point requires diligence to avoid overflow and properly normalize results. Floating point handles this automatically at the cost of precision. Divide operations have similar considerations.
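As an illustration of that diligence, a Q31 multiply needs a 64-bit intermediate, a rounding step, a shift back to Q31 and one saturation check. The name q31_mul is hypothetical, and the sketch assumes an arithmetic right shift for negative values, which GCC, Clang and Arm Compiler provide.

```c
#include <stdint.h>

/* Multiply two Q31 values and return a Q31 result.
   The full product is Q62 and needs 64 bits. Adding 2^30 before the
   shift rounds to nearest instead of always rounding down. */
static int32_t q31_mul(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;   /* Q62 product */
    prod += (int64_t)1 << 30;                 /* rounding constant */
    prod >>= 31;                              /* Q62 -> Q31 */
    if (prod > INT32_MAX) {
        return INT32_MAX;   /* only reachable for -1.0 * -1.0 */
    }
    return (int32_t)prod;
}
```

The single overflow case is -1.0 times -1.0, whose true result +1.0 is not representable in Q31, so it is clamped to the largest positive value. In floating point the same multiplication is a single operator with no scaling decisions.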

Overall, fixed point requires a deeper understanding of the math operations and Q formats in use, but it executes very efficiently. Floating point is easier to program, but it offers fewer bits of precision for the same word size and is usually less efficient in code size.

Use Cases and Recommendations

For typical digital signal processing using 16 or 32-bit data, fixed point on the Cortex-M4F will provide the best results. Applications like digital filters, servo control, sensor fusion, Kalman filters, statistical estimators etc will see good performance and precision with Q15 or Q31. The limited dynamic range is acceptable, and overflows can be managed with saturation arithmetic.

For applications whose values span a dynamic range wider than a 32-bit fixed point format can cover, floating point becomes advantageous. Examples include integrating error terms in position estimators or matrix math for model predictive control. Single precision floats keep 24 bits of relative precision across the whole range, so small values are not quantized away as they would be in a fixed point format scaled for the largest value.

Applications that involve sums of products or dot products benefit from the large dynamic range of floating point. The automatic scaling prevents overflow errors. Floating point also helps in code portability across different hardware platforms.

In summary, fixed point is recommended for most DSP tasks where the range of a 16-bit or 32-bit Q format is adequate. It gives the best performance and code density. Use floats for applications whose values span a high dynamic range, such as long integrations or coherent accumulation.

Conclusion

In conclusion, the Cortex-M4F provides extensive support for both fixed and floating point numerical formats. Fixed point leverages the integer execution units and DSP instructions to deliver efficient DSP performance at low power consumption. Floating point uses the single precision FPU to provide a wide dynamic range and simpler programming, at a cost of lower code density and extra cycles for some operations. The choice depends on the precision, range and efficiency requirements of the application.
