The Cortex-M4 processor can include an optional single-precision floating point unit (FPU); parts that have it are often referred to as Cortex-M4F. The FPU can significantly improve performance for applications using floating point math. However, it is disabled out of reset and must be explicitly enabled before it can be used. This article provides a step-by-step guide on how to enable the FPU in Cortex-M4 based microcontrollers.
Overview of the Cortex-M4 FPU
The FPU in the Cortex-M4 implements the FPv4-SP floating point extension of the ARMv7E-M architecture. It supports single-precision (32-bit) floating point data types and operations compliant with the IEEE 754 standard. Key features of the Cortex-M4 FPU include:
- Executes most arithmetic instructions (add, subtract, multiply) in a single cycle; divide and square root take 14 cycles
- Operates on 32-bit single precision floating point values
- Provides full hardware support for converting between float and integer values
- Implements commonly used mathematical operations like add, subtract, multiply, divide, square root, etc.
- Uses registers s0-s31 for floating point values
- Shares system resources like buses, memory, and peripherals with the CPU core
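The FPU significantly boosts performance of code using single-precision floating point math compared with software emulation. Speedups in the range of 3x-10x are typical, but the actual gain depends on the mix of operations and should be measured for each application. This makes the FPU very beneficial for DSP algorithms, 3D graphics, control systems, and other applications using floating point calculations.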
Enabling the FPU in Cortex-M4 Devices
The Cortex-M4 FPU is disabled out of reset. To use it, access must be explicitly granted by setting the correct register bits. This is usually done by the boot code during system initialization. There are two main steps:
- Enable FPU access in the Coprocessor Access Control Register (CPACR)
- Confirm that lazy stacking is enabled for efficient exception handling
Only the first step is mandatory. Lazy stacking is an optimization that is already enabled out of reset; running the FPU without it is valid but increases the cost of exceptions. The following sections explain the steps in more detail.
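1. Enable FPU Access in CPACR
The Coprocessor Access Control Register (CPACR), located at address 0xE000ED88 in the System Control Block, controls access to coprocessors in Cortex-M4 processors. The FPU is exposed as coprocessors CP10 and CP11, and both access fields must be set to full access before FPU instructions can execute.
To enable the FPU, set bits 20-23 of the CPACR. Using the CMSIS register names:
// Enable full access to CP10 and CP11 (the FPU)
SCB->CPACR |= (0xFu << 20);
The write should be followed by DSB and ISB barriers if FPU instructions follow immediately. The CPACR is generally configured very early during boot, typically in the reset handler or SystemInit(), before the .data section is initialized. A compiler targeting the hardware FPU may emit floating point instructions anywhere, including startup code and static initializers, and executing one while the FPU is disabled raises a UsageFault with the NOCP flag set.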
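Granting FPU access amounts to setting the CP10 and CP11 fields (bits 20-23) of the Coprocessor Access Control Register at 0xE000ED88. The following is a minimal sketch rather than vendor startup code: the names fpu_grant_access and CPACR_CP10_CP11_FULL are made up for illustration, and the register is passed as a pointer so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Full access for coprocessors CP10 and CP11: bits 20-23 of CPACR. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Hypothetical helper: grants full access to the FPU.
 * On a Cortex-M4 target, pass (volatile uint32_t *)0xE000ED88
 * (the same register CMSIS exposes as SCB->CPACR). */
void fpu_grant_access(volatile uint32_t *cpacr)
{
    /* Read-modify-write so other CPACR bits are left unchanged. */
    *cpacr |= CPACR_CP10_CP11_FULL;

#if defined(__ARM_ARCH_7EM__)
    /* Only compiled for ARMv7E-M targets: make sure the write has
     * taken effect before any FPU instruction is fetched. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}
```

On a target this would typically be called once from the reset handler, for example fpu_grant_access((volatile uint32_t *)0xE000ED88), before any function compiled with hardware floating point is entered.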
2. Enable Lazy Stacking for Exceptions
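When the interrupted code has used the FPU, exception entry must preserve registers s0-s15 and FPSCR in addition to the integer registers, which can incur significant overhead. Lazy stacking reduces this cost: the processor reserves space for the FPU state on the stack but defers the actual save until the exception handler executes its first floating point instruction.
Lazy stacking is controlled by the Floating-Point Context Control Register (FPCCR) at address 0xE000EF34, not by the CONTROL register. Bit 31 (ASPEN) enables automatic state preservation and bit 30 (LSPEN) enables lazy stacking. Both bits are set out of reset, so lazy stacking is normally already active; setting them explicitly documents the intended configuration:
// Enable automatic state preservation with lazy stacking
FPU->FPCCR |= (1u << 31) | (1u << 30);
With this configuration, exceptions whose handlers execute only integer code incur minimal overhead. If the FPCCR is modified at all, do so after enabling FPU access in the CPACR but before any use of the FPU.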
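On the Cortex-M4, lazy stacking is governed by the ASPEN (bit 31) and LSPEN (bit 30) bits of the FPCCR register at 0xE000EF34, and both are set out of reset. The sketch below shows how those bits could be set and checked. The helper names are hypothetical, and the register is again passed as a pointer or plain value so the code runs off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCCR_LSPEN (1u << 30) /* lazy state preservation enable */
#define FPCCR_ASPEN (1u << 31) /* automatic state preservation enable */

/* Hypothetical helper: sets both bits so FPU context is preserved lazily.
 * On a target, pass (volatile uint32_t *)0xE000EF34. */
void fpu_enable_lazy_stacking(volatile uint32_t *fpccr)
{
    *fpccr |= FPCCR_ASPEN | FPCCR_LSPEN;
}

/* Returns true only when both ASPEN and LSPEN are set in an FPCCR value. */
bool fpu_lazy_stacking_enabled(uint32_t fpccr_value)
{
    const uint32_t mask = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr_value & mask) == mask;
}
```

Checking the value read back from the register is a cheap sanity test in startup code, since some RTOS ports or vendor libraries may change these bits.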
Modifying Compiler Settings
After enabling the FPU in hardware, compiler settings need to be modified to generate code using floating point instructions. This requires configuring the compiler to:
- Use hardware floating point calling convention
- Use FPU registers instead of soft-float emulation
- Perform FPU-specific optimizations
Exact compiler settings depend on which toolchain you are using. Some common examples are shown below:
GNU ARM Toolchain
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16
ARM Compiler 5
armcc --cpu=Cortex-M4.fp --fpmode=fast
IAR Embedded Workbench
--cpu_mode thumb --fpu=VFPv4_sp
Consult your compiler documentation for exact options. The key flags are enabling hardware float ABI and selecting the fpv4-sp FPU architecture.
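One way to confirm that the flags took effect is to inspect the compiler's predefined macros. The sketch below assumes a toolchain that implements the Arm C Language Extensions (ACLE) macros __ARM_FP and __ARM_PCS_VFP, such as GCC or Clang; other compilers use different macro names. The function name fpu_build_mode is hypothetical.

```c
#include <stddef.h>

/* Reports how this translation unit was built with respect to
 * floating point, based on ACLE predefined macros. */
const char *fpu_build_mode(void)
{
#if defined(__ARM_PCS_VFP)
    /* Floating point arguments are passed in FPU registers. */
    return "hardware FP, hard-float calling convention";
#elif defined(__ARM_FP)
    /* FP hardware is targeted, but not with the 32-bit Arm
     * hard-float calling convention (for example -mfloat-abi=softfp). */
    return "hardware FP, other calling convention";
#else
    /* Soft-float build, or not an Arm target at all. */
    return "no Arm hardware FP";
#endif
}
```

With the GNU ARM command line shown earlier, the first branch would be expected to be selected. Printing or logging the returned string once at startup makes a misconfigured build easy to spot.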
Changes to Source Code
Aside from compiler settings, the following source code changes should be made when using the FPU:
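- Use float instead of double for floating point values. The FPU is single precision only, so double arithmetic still falls back to software routines. Give literals an f suffix (for example 1.5f), since an unsuffixed literal has type double.
- Call the single-precision math functions such as sinf() and cosf() from math.h instead of sin() and cos(), which operate on double.
- Link against a libm.a built for the hard-float ABI so that the math functions use FPU instructions.
- Do not surround float to integer conversions with __disable_irq() and __enable_irq(). The conversion compiles to a VCVT instruction, and the processor preserves FPU state across exceptions, so no extra protection is needed to prevent corruption.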
With these changes, existing code should work correctly using the hardware FPU. The only expected behavioral difference is reduced precision and range wherever double was replaced with float.
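The difference between float and double expressions is easy to miss, because a single unsuffixed literal is enough to promote a calculation to double. The sketch below illustrates this with two hypothetical functions; it is an illustration of the C promotion rules, not a benchmark.

```c
/* The 0.5f literal keeps the whole expression in single precision,
 * so a hard-float build can use a single FPU multiply. */
float halve_single(float x)
{
    return x * 0.5f;
}

/* The unsuffixed 0.5 has type double, so x is promoted and the
 * expression x * 0.5 has type double. On a single-precision FPU the
 * compiler may have to call software double routines for this,
 * unless it can prove a float multiply gives the same result. */
float halve_promoted(float x)
{
    return (float)(x * 0.5);
}
```

Both functions return the same value for ordinary inputs; the cost difference only shows up in the generated code. With GCC, the -Wdouble-promotion warning can help locate implicit promotions like the one in halve_promoted.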
Tips for Using the FPU
Here are some additional tips for working with the Cortex-M4 FPU:
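- Avoid floating point code in interrupt handlers where possible. A handler that executes an FPU instruction triggers the deferred save of the interrupted FPU context, which removes the benefit of lazy stacking.
- Use float freely in performance critical code that would otherwise rely on soft-float emulation, since the FPU is much faster.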
- Split floating point and integer variables into separate structs/classes for better performance.
- Measure cycle counts between soft-float emulation and FPU to quantify performance gains.
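- Monitor stack usage. An exception stack frame that includes FPU context occupies 26 words (104 bytes) instead of 8 words (32 bytes), and lazy stacking still reserves that space even when the save is deferred.
- Enable the FPU before the first floating point instruction runs, including during debug sessions. Otherwise the processor raises a UsageFault with the NOCP flag set.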
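Quantifying the gain from the FPU comes down to reading a cycle counter before and after the code under test, once for a soft-float build and once for an FPU build. The sketch below covers only the arithmetic. It assumes a free-running 32-bit counter such as the DWT cycle counter available on many Cortex-M4 devices (DWT->CYCCNT in CMSIS); the function names are hypothetical.

```c
#include <stdint.h>

/* Elapsed cycles between two readings of a free-running 32-bit
 * counter. Unsigned subtraction gives the right answer even if the
 * counter wrapped around once between the readings. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Ratio of soft-float cycles to FPU cycles for the same workload.
 * Returns 0 when the FPU measurement is zero to avoid dividing by zero. */
float fpu_speedup(uint32_t soft_cycles, uint32_t fpu_cycles)
{
    if (fpu_cycles == 0u) {
        return 0.0f;
    }
    return (float)soft_cycles / (float)fpu_cycles;
}
```

On a target, start and end would be two reads of the cycle counter placed around the code segment. Measuring several runs and discarding the first helps separate the FPU effect from flash wait states and cache warm-up.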
Debugging and Profiling the FPU
It can take some effort to efficiently utilize the Cortex-M4 FPU. Here are some techniques for debugging and profiling floating point code:
- Set breakpoints on floating point instructions like VADD, VDIV, etc.
- Single step through code to verify correct registers are used.
- Print out soft-float emulation vs FPU cycle counts for code segments.
- Generate assembly listing to analyze compiler output.
- Check lazy stacking behavior during exceptions by monitoring the FPCA bit of the CONTROL register and the LSPACT bit of the FPCCR register.
- Measure FPU impact on interrupt latency and context switching.
- Use debugger to view and modify FPU register contents.
With careful debugging and profiling, the true performance benefits of the Cortex-M4 FPU can be realized.
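When inspecting FPU state in a debugger or in logging code, two bits are particularly informative: FPCA (bit 2 of CONTROL), which indicates that the current context has used the FPU, and LSPACT (bit 0 of FPCCR), which indicates that a lazy save has been reserved but not yet performed. The sketch below decodes raw register values; the function names are hypothetical, and how the values are obtained (debugger, CMSIS __get_CONTROL(), or a direct read) is left open.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_FPCA (1u << 2) /* floating point context active */
#define FPCCR_LSPACT (1u << 0) /* lazy state preservation active */

/* True when the current context has executed FPU instructions, so an
 * exception taken now would use the extended stack frame. */
bool fp_context_active(uint32_t control_value)
{
    return (control_value & CONTROL_FPCA) != 0u;
}

/* True when stack space for FPU state has been reserved but the
 * registers have not been written to it yet. */
bool lazy_save_pending(uint32_t fpccr_value)
{
    return (fpccr_value & FPCCR_LSPACT) != 0u;
}
```

Seeing lazy_save_pending() return true inside an integer-only handler and false after a handler has touched the FPU is one way to confirm that lazy stacking behaves as expected.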
Conclusion
Enabling the FPU in Cortex-M4 microcontrollers requires granting access through the CPACR register, confirming the lazy stacking configuration in the FPCCR register, and selecting the right compiler settings. This activates floating point hardware support for significant performance gains in math-heavy code. With the FPU enabled, existing code can benefit from hardware acceleration with only minor source modifications. Overall, the Cortex-M4 FPU is an extremely useful feature for applications that rely on single-precision floating point calculations.