How to Enable the FPU in Cortex-M4 Microcontrollers?

The Cortex-M4 processor can include an optional single-precision floating-point unit (FPU), found on devices often labeled Cortex-M4F, that can significantly improve performance for applications using floating-point math. However, the FPU is disabled at reset and must be explicitly enabled before it can be used. This article provides a step-by-step guide on how to enable the FPU in Cortex-M4-based microcontrollers.

Contents

Overview of the Cortex-M4 FPU
Enabling the FPU in Cortex-M4 Devices
1. Enable FPU Access in CPACR
2. Enable Lazy Stacking for Exceptions
Modifying Compiler Settings
GNU ARM Toolchain
ARM Compiler 5
IAR Embedded Workbench
Changes to Source Code
Tips for Using the FPU
Debugging and Profiling the FPU
Conclusion

Overview of the Cortex-M4 FPU

The Cortex-M4 processor implements the ARMv7E-M architecture, and its FPU implements the FPv4-SP floating-point extension. The FPU supports single-precision (32-bit) floating-point data types and operations compliant with the IEEE 754 standard. Key features of the Cortex-M4 FPU include:

Executes most single-precision add, subtract, and multiply instructions in a single cycle (divide and square root take 14 cycles)
Operates on 32-bit single-precision floating-point values
Converts in hardware between floating-point values and integer or fixed-point values
Implements commonly used operations such as add, subtract, multiply, fused multiply-accumulate, divide, and square root
Provides 32 single-precision registers, S0-S31, plus the FPSCR status and control register
Is integrated into the processor core and shares its buses and memory interfaces
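To make the conversion and precision points concrete, here is a small C sketch. The helper names `float_to_i32` and `i32_to_float` are hypothetical; the code only demonstrates standard C single-precision semantics, which a Cortex-M4F compiler implements with VCVT instructions, so it behaves the same on a host PC.

```c
#include <stdint.h>

/* Hypothetical helper: a C cast from float to int truncates toward zero.
 * On a Cortex-M4F this compiles to a VCVT.S32.F32 instruction.
 * Only valid for values that fit in int32_t (out-of-range is undefined). */
static inline int32_t float_to_i32(float x)
{
    return (int32_t)x;
}

/* Hypothetical helper: float has a 24-bit significand, so integers above
 * 2^24 are not all representable and get rounded to the nearest float. */
static inline float i32_to_float(int32_t n)
{
    volatile float f = (float)n;  /* volatile forces a real float store */
    return f;
}
```

For example, `float_to_i32(-2.7f)` yields -2, and 16777217 (2^24 + 1) converts to 16777216.0f because it cannot be represented exactly in single precision.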

The FPU significantly boosts the performance of code that uses floating-point math compared with software emulation. Speedups of 3x-10x are typical, although the actual gain depends heavily on the workload. This makes the FPU very beneficial for DSP algorithms, 3D graphics, control systems, and other applications that rely on floating-point calculations.

Enabling the FPU in Cortex-M4 Devices

The Cortex-M4 FPU is disabled out of reset, and any floating-point instruction executed before it is enabled causes a fault. To use the FPU, software must explicitly grant access to it. This is usually done by the startup code during system initialization; for example, the CMSIS SystemInit() function supplied by many vendors enables the FPU when the device has one and the project is built to use it. There are two steps:

Enable FPU access in the Coprocessor Access Control Register (CPACR)
Check, and if necessary configure, lazy stacking for efficient exception handling

Only the first step is mandatory. Lazy stacking is already enabled at reset, so most applications can keep the default setting. The following sections explain both steps in more detail.

1. Enable FPU Access in CPACR

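The Coprocessor Access Control Register (CPACR), located in the System Control Block at address 0xE000ED88, controls access to coprocessors. The FPU is accessed as coprocessors CP10 and CP11, and each has a 2-bit access field: bits 21:20 for CP10 and bits 23:22 for CP11. Both fields must be set to 0b11 (full access); until then, any floating-point instruction causes a UsageFault (escalated to a HardFault if UsageFault is not enabled).

To enable the FPU, set bits 20-23 of CPACR, then execute DSB and ISB barriers so the new setting takes effect before the next instruction. With CMSIS headers this is:

// Enable full access to CP10 and CP11 (the FPU)
SCB->CPACR |= (0xFUL << 20);
__DSB();
__ISB();

The CPACR is generally configured very early during boot, in the reset handler or SystemInit(), before any code that may contain floating-point instructions runs. With a hard-float build, the compiler can use FPU instructions and registers in library routines and ordinary C functions, not only where float variables appear in the source.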
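A minimal sketch of this step in C, assuming the standard ARMv7-M layout: CP10 is bits 21:20 and CP11 is bits 23:22 of CPACR at 0xE000ED88. The function name `fpu_enable` is hypothetical, not part of CMSIS or any vendor library. The register is passed in as a pointer so the bit logic can also be unit-tested on a host PC; the barrier instructions are only emitted when compiling for ARMv7E-M.

```c
#include <stdint.h>

/* Access-field positions in CPACR (0xE000ED88). */
#define CPACR_CP10_SHIFT  20u
#define CPACR_CP11_SHIFT  22u
#define CPACR_FULL_ACCESS 0x3u

/* Hypothetical helper: grant full access to CP10 and CP11 (the FPU).
 * On a Cortex-M4F target, call it early in startup as
 *     fpu_enable((volatile uint32_t *)0xE000ED88u);
 * before any floating-point instruction executes. */
static inline void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= (CPACR_FULL_ACCESS << CPACR_CP10_SHIFT) |
              (CPACR_FULL_ACCESS << CPACR_CP11_SHIFT);  /* sets 0x00F00000 */

#if defined(__ARM_ARCH_7EM__)
    /* Ensure the new access rights are in effect before the next
     * instruction is fetched (equivalent to CMSIS __DSB(); __ISB();). */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}
```

Using `|=` rather than a plain assignment leaves the access fields of any other coprocessors untouched.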

2. Enable Lazy Stacking for Exceptions

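When an exception interrupts code that has used the FPU (CONTROL.FPCA = 1), the processor must preserve the caller-saved floating-point registers S0-S15 and the FPSCR in addition to the usual integer registers. This grows the exception frame from 8 to 26 words and adds interrupt latency. Lazy stacking reduces this cost: the processor reserves the stack space on exception entry but defers saving the registers until the handler executes its first floating-point instruction. If the handler never uses the FPU, the save is skipped entirely.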
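The stack cost of an active floating-point context follows from the exception frame layout: a basic frame holds R0-R3, R12, LR, PC and xPSR (8 words), and the extended frame adds S0-S15, FPSCR and one reserved word. The sketch below works out the byte counts; the helper name is hypothetical, and it ignores the optional 4-byte padding word the processor may add to keep the stack 8-byte aligned.

```c
#include <stdint.h>
#include <stdbool.h>

/* Basic exception frame: R0-R3, R12, LR, PC, xPSR. */
#define BASIC_FRAME_WORDS  8u
/* Extra words for an active FP context: S0-S15, FPSCR, one reserved word. */
#define FP_EXTENSION_WORDS (16u + 1u + 1u)

/* Hypothetical helper: bytes reserved on exception entry.
 * fp_context_active mirrors CONTROL.FPCA of the interrupted code.
 * With lazy stacking the space is still reserved; only the register
 * writes are deferred. Alignment padding is not included. */
static uint32_t exception_frame_bytes(bool fp_context_active)
{
    uint32_t words = BASIC_FRAME_WORDS;
    if (fp_context_active) {
        words += FP_EXTENSION_WORDS;
    }
    return words * 4u;
}
```

This gives 32 bytes for integer-only code and 104 bytes once the FPU has been used, which is why stack budgets need to account for the extended frame.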

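Lazy stacking is controlled by the Floating-Point Context Control Register (FPCCR, address 0xE000EF34), not by the CONTROL register. Bit 31 (ASPEN) enables automatic preservation of the FPU state on exception entry, and bit 30 (LSPEN) makes that preservation lazy. Both bits are set at reset, so lazy stacking is already enabled by default. To set them explicitly with CMSIS:

// Enable automatic and lazy FPU state preservation
FPU->FPCCR |= FPU_FPCCR_ASPEN_Msk | FPU_FPCCR_LSPEN_Msk;

With lazy stacking, exceptions whose handlers do not use the FPU avoid the cost of saving the FPU registers. Any change to FPCCR should be made once during startup, before floating-point code runs. CONTROL bit 2 (FPCA) is set by hardware when a floating-point instruction executes; software does not need to modify CONTROL to enable lazy stacking.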
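A host-testable sketch of the FPCCR setting, assuming the bit positions given above (ASPEN = bit 31, LSPEN = bit 30). Both function names are hypothetical; on a target, the pointer would be `(volatile uint32_t *)0xE000EF34u`, or you would use the CMSIS `FPU->FPCCR` form directly.

```c
#include <stdint.h>
#include <stdbool.h>

/* FPCCR (0xE000EF34) control bits. */
#define FPCCR_ASPEN (1u << 31)  /* automatic FP state preservation */
#define FPCCR_LSPEN (1u << 30)  /* lazy FP state preservation      */

/* Hypothetical helper: make the reset defaults explicit by setting both
 * bits, leaving the other FPCCR bits unchanged. */
static inline void fpu_enable_lazy_stacking(volatile uint32_t *fpccr)
{
    *fpccr |= FPCCR_ASPEN | FPCCR_LSPEN;
}

/* Hypothetical helper: true only when both automatic and lazy
 * preservation are enabled in the given FPCCR value. */
static inline bool fpu_lazy_stacking_enabled(uint32_t fpccr_value)
{
    const uint32_t both = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr_value & both) == both;
}
```

Clearing only LSPEN while keeping ASPEN set selects eager stacking, where the FPU registers are saved on every exception taken with an active floating-point context.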

Modifying Compiler Settings

Enabling the FPU in hardware is only half of the work: the compiler must also be told to generate floating-point instructions. This requires configuring the compiler to:

Emit FPU instructions instead of calls to software floating-point emulation routines
Use the hardware floating-point calling convention, which passes float arguments and return values in FPU registers
Apply FPU-specific optimizations

Exact compiler settings depend on which toolchain you are using. Some common examples are shown below:

GNU ARM Toolchain

arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

ARM Compiler 5

armcc --cpu=Cortex-M4.fp --fpmode=fast

IAR Embedded Workbench

--cpu_mode thumb --fpu=VFPv4SP

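Consult your compiler documentation for the exact options. The key settings are selecting the FPv4-SP FPU and the hard-float ABI. All object files and libraries in a project must use the same float ABI; the GNU linker reports an error when hard-float and soft-float objects are mixed. GCC's -mfloat-abi=softfp is a middle ground that uses FPU instructions but keeps the soft-float calling convention. Also note that armcc's --fpmode=fast relaxes strict IEEE 754 compliance in exchange for speed.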
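One way to confirm which options actually reached the compiler is to test predefined macros from the Arm C Language Extensions (ACLE): `__arm__` (32-bit Arm target), `__ARM_FP` (bit 0x4 set when single-precision hardware floating point is available) and `__ARM_PCS_VFP` (hard-float calling convention). The sketch assumes a GCC- or Clang-compatible compiler; the helper name is hypothetical, and other toolchains may define different macros.

```c
/* Hypothetical helper: report how this translation unit was compiled,
 * based on ACLE predefined macros (GCC/Clang). */
static const char *fp_build_config(void)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4) && defined(__ARM_PCS_VFP)
    return "FPU instructions, hard-float calling convention";
#elif defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
    return "FPU instructions, soft-float calling convention (softfp)";
#else
    return "no single-precision FPU instructions";
#endif
}
```

In firmware that requires the FPU, the same conditions can be turned into an `#error` guard so that a misconfigured build fails at compile time instead of silently falling back to software emulation.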

Changes to Source Code

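Aside from compiler settings, the following source code changes are recommended when using the FPU:

Use float instead of double; the Cortex-M4 FPU supports single precision only, so double arithmetic still runs in software
Write floating-point constants with an f suffix (1.0f rather than 1.0) so expressions are not promoted to double
Use the single-precision math functions from math.h, such as sinf(), cosf() and sqrtf(), instead of sin(), cos() and sqrt(), which take double arguments
Link against a math library (libm, for example with -lm) built for the same float ABI; GCC toolchains normally select the matching library variant from the compiler flags

No interrupt masking is needed around float-to-integer conversions: each conversion is a single VCVT instruction, and the FPU state of the interrupted code is preserved automatically on exception entry as long as FPCCR.ASPEN remains set. If you use an RTOS, make sure its port saves the FPU registers on context switches (for example, FreeRTOS provides a separate ARM_CM4F port).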
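The first two points can be illustrated with a hypothetical single-precision low-pass filter in which every constant carries an `f` suffix, so nothing is promoted to double. GCC's `-Wdouble-promotion` warning can help catch accidental promotions. Math library calls are left out to keep the sketch self-contained; the type and function names are illustrative only.

```c
/* Hypothetical first-order low-pass filter kept entirely in float.
 * Without the 'f' suffix, a literal such as 0.5 would be a double and
 * force slow software double-precision arithmetic on a Cortex-M4F. */
typedef struct {
    float alpha;   /* smoothing factor, 0 < alpha <= 1 */
    float state;   /* most recent output */
} lowpass_f32;

static inline float lowpass_step(lowpass_f32 *f, float input)
{
    f->state += f->alpha * (input - f->state);
    return f->state;
}
```

With `alpha = 0.5f` and a constant input of `1.0f`, the output goes 0.5f, 0.75f, 0.875f, and so on, converging toward the input.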

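With these changes, existing code should generally behave the same with the hardware FPU, apart from precision differences where double was changed to float and any results affected by relaxed-IEEE options such as --fpmode=fast or GCC's -ffast-math.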

Tips for Using the FPU

Here are some additional tips for working with the Cortex-M4 FPU:

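Keep floating-point work out of short, frequent interrupt handlers where practical, so lazy stacking can skip saving the FPU registers.
Minimize conversions between float and integer types in inner loops; each conversion costs extra instructions.
Use float instead of software emulation in performance-critical code, but remember that divide and square root (14 cycles each) are much slower than add and multiply.

Consider keeping floating-point data in contiguous arrays separate from integer data, which can help the compiler use multi-register loads and stores.
Measure cycle counts with soft-float emulation and with the FPU to quantify the performance gain.
Monitor stack usage: every exception taken while a floating-point context is active reserves a 26-word extended frame instead of an 8-word one, even with lazy stacking.
Enable the FPU early in the startup code rather than relying on a debugger script to do it; otherwise code that runs under the debugger may fault after a normal reset.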
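For the cycle-count measurements, a common approach on Cortex-M3/M4 parts is the DWT cycle counter. The sketch assumes the standard ARMv7-M addresses (DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004); the struct and function names are hypothetical, and the registers are passed in as pointers so the logic can be tested off-target. Some devices may need extra vendor-specific steps before the counter runs, so check the reference manual for your part.

```c
#include <stdint.h>

#define DEMCR_TRCENA       (1u << 24)  /* DEMCR: enable DWT/ITM blocks  */
#define DWT_CTRL_CYCCNTENA (1u << 0)   /* DWT_CTRL: enable cycle counter */

/* Hypothetical register bundle. On a Cortex-M4 the pointers would be
 * 0xE000EDFC (DEMCR), 0xE0001000 (DWT_CTRL) and 0xE0001004 (DWT_CYCCNT). */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *dwt_ctrl;
    volatile uint32_t *dwt_cyccnt;
} cycle_counter_regs;

static inline void cycle_counter_start(const cycle_counter_regs *r)
{
    *r->demcr |= DEMCR_TRCENA;           /* power up the trace blocks  */
    *r->dwt_cyccnt = 0u;                 /* reset the count            */
    *r->dwt_ctrl |= DWT_CTRL_CYCCNTENA;  /* start counting core clocks */
}

/* Unsigned subtraction gives the correct result across one 32-bit wrap. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

To compare soft-float and FPU builds, read `*regs.dwt_cyccnt` before and after the code under test, pass both values to `cycles_elapsed()`, and build the same source once with `-mfloat-abi=soft` and once with `-mfloat-abi=hard`.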

Debugging and Profiling the FPU

Getting the most out of the Cortex-M4 FPU can take some effort. Here are some techniques for debugging and profiling floating-point code:

Set breakpoints on floating-point instructions such as VADD.F32 and VDIV.F32.
Single-step through code to verify that the correct registers are used.

Compare soft-float emulation and FPU cycle counts for the same code segments.
Generate an assembly listing (for example with arm-none-eabi-objdump -d) to confirm that the compiler emits FPU instructions rather than calls to soft-float helpers such as __aeabi_fmul.
Check lazy stacking behavior by watching CONTROL.FPCA and the LSPACT bit of FPCCR when exceptions are taken.

Measure the FPU's impact on interrupt latency and context-switch time.
Use the debugger to view and modify the FPU registers (S0-S31 and FPSCR).
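When inspecting lazy stacking, it helps to decode the relevant bits in one place. This sketch assumes the ARMv7-M bit positions: CONTROL.FPCA is bit 2, and in FPCCR, LSPACT is bit 0, LSPEN bit 30 and ASPEN bit 31. The type and function names are hypothetical; on a target the inputs could come from CMSIS `__get_CONTROL()` and `FPU->FPCCR`, or from values read in the debugger.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical decoded view of the FP-related status bits. */
typedef struct {
    bool fp_context_active;  /* CONTROL.FPCA: current context has used the FPU */
    bool lazy_save_pending;  /* FPCCR.LSPACT: frame space reserved, not yet written */
    bool lazy_enabled;       /* FPCCR.LSPEN */
    bool auto_save_enabled;  /* FPCCR.ASPEN */
} fp_state;

static fp_state decode_fp_state(uint32_t control, uint32_t fpccr)
{
    fp_state s;
    s.fp_context_active = (control >> 2) & 1u;
    s.lazy_save_pending = fpccr & 1u;
    s.lazy_enabled      = (fpccr >> 30) & 1u;
    s.auto_save_enabled = (fpccr >> 31) & 1u;
    return s;
}
```

Seeing `lazy_save_pending` set inside a handler that has not yet executed a floating-point instruction shows that the deferred save is working as intended.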

With careful debugging and profiling, the true performance benefits of the Cortex-M4 FPU can be realized.

Conclusion

Enabling the FPU in Cortex-M4 microcontrollers requires granting access in the CPACR register, optionally reviewing the lazy-stacking settings in FPCCR, and selecting the matching compiler options. This activates hardware floating-point support for significant performance gains in math-heavy code. With the FPU enabled, existing code can benefit from hardware acceleration with only minor source modifications. Overall, the Cortex-M4 FPU is an extremely useful feature for applications that rely on floating-point calculations.
