When compiling code for ARM processors using gcc, there are three key compiler flags that control the target processor and floating point options: -mcpu, -mfloat-abi, and -mfpu. Understanding the differences between these flags and how to use them properly can help optimize performance and avoid issues. This article provides an in-depth explanation of what each flag does, how they interact, and recommendations for setting them optimally.
The Basics: -mcpu and -march
-mcpu names a specific target processor. For example, -mcpu=cortex-a8 tells GCC to generate code for a Cortex-A8. GCC derives both the architecture (as if given by -march) and the tuning model (as if given by -mtune) from the CPU name, so the compiler may use every instruction that core supports and schedules code for its pipeline.
-march names a target architecture rather than a specific core. The compiler may use any instruction defined by that architecture, but nothing introduced by later ones, so the result runs on every core that implements it. For example, -march=armv7-a generates code for the ARMv7-A architecture, which covers Cortex-A8 and other ARMv7-A cores such as Cortex-A9 and Cortex-A15.
When the binary targets one known core, -mcpu is usually preferable, because it selects both the instruction set and the tuning for that core. -march trades core-specific tuning for portability: the code runs on any core implementing the named architecture. If you are writing library code or kernels intended for wide use, -march gives maximum compatibility, and -mtune can still be added to tune for the most common core without restricting where the code runs.
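One way to confirm which architecture a given -mcpu or -march setting resolved to is to read the compiler's predefined macros from inside the program. The following is a minimal sketch, assuming a compiler that provides the Arm C Language Extensions macros __ARM_ARCH and __ARM_ARCH_PROFILE (recent GCC and Clang releases do). The function names are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Returns the ARM architecture version the compiler is targeting,
 * or 0 when the target is not ARM (for example an x86 host build).
 * __ARM_ARCH is a predefined macro, not something this file defines. */
int target_arm_arch(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}

/* Returns the architecture profile letter ('A', 'R' or 'M'),
 * or '?' when the compiler does not report one. */
char target_arm_profile(void)
{
#if defined(__ARM_ARCH_PROFILE)
    return (char)__ARM_ARCH_PROFILE;
#else
    return '?';
#endif
}

/* Hypothetical helper: prints a one-line summary of the target. */
void print_target_arch(void)
{
    int arch = target_arm_arch();
    if (arch == 0)
        printf("not an ARM target\n");
    else
        printf("ARMv%d, profile %c\n", arch, target_arm_profile());
}
```

Calling print_target_arch() from main in a file built with -mcpu=cortex-a8 should report ARMv7 with profile A, since Cortex-A8 implements ARMv7-A. The same result is expected from -march=armv7-a, which illustrates that the two flags differ in tuning rather than in the instruction set they select here.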
Integer Code vs Floating Point: -mfpu
The -mfpu flag controls the floating point hardware target. This determines what floating point instructions will be generated by the compiler. For example:
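- -mfpu=neon: Allow NEON Advanced SIMD instructions in addition to VFPv3 floating point instructions
- -mfpu=vfpv3: Allow VFPv3 floating point instructions
- -mfpu=vfpv3-d16: Allow VFPv3 instructions on an FPU that has only 16 double-precision registers
GCC has no -mfpu=none setting. To keep floating point instructions out of the generated code entirely, use -mfloat-abi=soft, which is covered in the next section.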
ARM processors may contain a variety of floating point hardware options. VFP provides basic hardware floating point. NEON adds SIMD and vector operations. The compiler needs to know what hardware exists to generate compatible instructions.
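If no -mfpu option is specified, GCC 8 and later default to -mfpu=auto unless the toolchain was configured with a specific FPU. This setting does not probe any hardware; it selects the floating point and Advanced SIMD instructions from the -mcpu or -march setting, including extensions such as +fp or +simd. Older releases fall back to a default FPU chosen when the toolchain was built, so specifying -mfpu explicitly remains the safer choice when a build must behave the same across compiler versions.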
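The effect of an -mfpu setting can also be observed from inside a program. The sketch below assumes the Arm C Language Extensions macros __ARM_FP (a bitmask of supported floating point precisions) and __ARM_NEON, which GCC defines when the corresponding instructions are enabled. The enum and function names are hypothetical.

```c
#include <stdio.h>

/* Bit values of the predefined __ARM_FP macro. */
enum {
    FP_HALF   = 0x02,  /* half-precision support */
    FP_SINGLE = 0x04,  /* single-precision support */
    FP_DOUBLE = 0x08   /* double-precision support */
};

/* Returns the __ARM_FP bitmask, or 0 when the compiler was told
 * not to use floating point hardware (or the target is not ARM). */
unsigned target_fp_mask(void)
{
#if defined(__ARM_FP)
    return (unsigned)__ARM_FP;
#else
    return 0u;
#endif
}

/* Returns 1 when Advanced SIMD (NEON) instructions are enabled. */
int target_has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints which FP features the flags enabled. */
void print_target_fpu(void)
{
    unsigned fp = target_fp_mask();
    printf("half: %s, single: %s, double: %s, NEON: %s\n",
           (fp & FP_HALF)   ? "yes" : "no",
           (fp & FP_SINGLE) ? "yes" : "no",
           (fp & FP_DOUBLE) ? "yes" : "no",
           target_has_neon() ? "yes" : "no");
}
```

With -mfpu=vfpv3 and a float ABI other than soft, the single and double bits would be expected to be set and NEON reported as absent. Switching to -mfpu=neon should change only the NEON field.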
Floating Point ABI: -mfloat-abi
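The -mfloat-abi flag controls the ABI (Application Binary Interface) used for passing floating point arguments and return values between functions, and whether floating point instructions may be used at all. On 32-bit ARM systems there are three options:
- soft: No floating point instructions are generated. Floating point operations become library calls, and values are passed in integer registers. This is the only option that runs on cores without an FPU.
- softfp: Floating point instructions may be generated, but arguments and return values are still passed in integer registers, the same calling convention as soft. Requires hardware floating point at run time.
- hard: Floating point arguments are passed directly in floating point registers. Requires hardware floating point support.
The “hard” ABI is more efficient when floating point hardware exists, because values do not have to be moved between integer and floating point registers at every call. The “softfp” ABI exists for link compatibility: code built with it can call, and be called by, libraries built with “soft”. It does not make the code runnable on cores without an FPU; for that, use “soft”. Objects built for “hard” cannot be linked with objects built for “soft” or “softfp”, so the setting must match the system libraries.
If -mfloat-abi is not specified, the default depends on how the toolchain was configured. Toolchains for hard-float Linux distributions (typically those with a gnueabihf target triplet) default to “hard”, while others often default to “soft” or “softfp”. Explicitly setting this flag is recommended for clarity.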
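The float ABI that is actually in effect can be inferred from predefined macros. The sketch below assumes GCC's behavior of defining __ARM_PCS_VFP for the hard-float calling convention and __SOFTFP__ when no floating point instructions are used; the enum, the function names, and the scale function are hypothetical and only illustrate the idea.

```c
/* Effective float ABI as seen by the compiler. */
enum float_abi { ABI_NOT_ARM32, ABI_SOFT, ABI_SOFTFP, ABI_HARD };

enum float_abi target_float_abi(void)
{
#if !defined(__arm__)
    return ABI_NOT_ARM32;  /* -mfloat-abi only exists for 32-bit ARM */
#elif defined(__ARM_PCS_VFP)
    return ABI_HARD;       /* FP arguments travel in FPU registers */
#elif defined(__SOFTFP__)
    return ABI_SOFT;       /* no FP instructions, library calls instead */
#else
    return ABI_SOFTFP;     /* FP instructions, integer-register arguments */
#endif
}

const char *float_abi_name(enum float_abi abi)
{
    switch (abi) {
    case ABI_SOFT:   return "soft";
    case ABI_SOFTFP: return "softfp";
    case ABI_HARD:   return "hard";
    default:         return "not a 32-bit ARM target";
    }
}

/* The source of this function is identical under every ABI, but the
 * generated code differs: with hard, x and factor arrive in FPU
 * registers; with soft and softfp they arrive in integer registers,
 * and with soft the multiply becomes a runtime library call. */
float scale(float x, float factor)
{
    return x * factor;
}
```

Compiling this file with -S under each -mfloat-abi value and comparing the assembly for scale is a quick way to see the calling convention change. Note that the classification reports the effective ABI: if softfp is requested for a target with no FPU, the compiler is expected to behave as soft.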
Recommendations for Compiler Flags
Based on the information above, here are some recommendations for selecting optimal compiler flags for ARM development:
- When the target core is known, specify it with -mcpu rather than naming only an architecture with -march. This lets GCC use the full instruction set of that core and tune for it.
- Tell the compiler about the hardware FPU if one is present. With older toolchains, set -mfpu explicitly; with -mfpu=auto, confirm that the -mcpu or -march setting implies the FPU you expect.
- Use -mfloat-abi=hard when the target has hardware floating point and the system libraries use the hard-float ABI. Use -mfloat-abi=softfp when the hardware has an FPU but you must link against soft-float libraries, and -mfloat-abi=soft for code that may run on cores without floating point hardware.
- Start with the simplest flags for your target, then add more optimizations once the code is running properly. Aggressive optimizations can sometimes cause unexpected issues.
- Double check the compiler documentation for details on each architecture and FPU option supported.
As an example, compiling with: gcc -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=hard
would generate code targeted and tuned specifically for Cortex-A9 CPUs, with NEON instructions available and floating point values passed in FPU registers. When cross-compiling, the driver usually has a prefixed name such as arm-linux-gnueabihf-gcc rather than plain gcc.
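To make the effect of enabling NEON concrete, the following sketch shows a dot product that uses NEON intrinsics from arm_neon.h when the compiler flags enable them and falls back to plain C otherwise. It is an illustration under the assumption that __ARM_NEON is defined whenever NEON is available; the function name dot is hypothetical.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Dot product of two float arrays of length n. */
float dot(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;

#if defined(__ARM_NEON)
    /* Process four elements per iteration in a 128-bit register. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));

    /* Reduce the four partial sums to one scalar. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif

    /* Scalar loop: handles the tail, or everything without NEON. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Built without NEON, the same source still compiles and runs through the scalar loop. The two paths add the products in a different order, so for general inputs the results may differ in the last bits.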
Checking Compiler Settings
It can be useful to verify the actual architecture and options enabled by the compiler after setting the flags. GCC provides a few ways to do this:
- -v: Print the commands the driver runs, including the full option list passed to each compilation stage.
- -Q --help=target: Print every target specific option together with the value currently in effect, including the CPU, FPU, and float ABI settings.
- --target-help: Print descriptions of the options available for the target.
For example, adding -Q --help=target to an existing command line reports the values GCC settled on after combining the given flags with its defaults: $ gcc -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=hard -Q --help=target The output is a list of target specific options, one per line with its value, and should include entries for -mcpu=, -mfpu=, and -mfloat-abi=.
Reviewing this output after setting compiler flags helps validate the intended options have been properly enabled.
Interactions Between Flags
It’s important to understand how the -mcpu, -mfpu, and -mfloat-abi flags interact when used in combination:
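- -mcpu implies both -march and -mtune. If -march or -mtune is also given, that explicit option takes precedence over the corresponding part of -mcpu, and GCC warns when -mcpu and -march conflict.
- -mfpu selects which floating point and SIMD instructions are available on top of the architecture chosen via -mcpu/-march.
- -mfloat-abi controls how floating point values are passed between functions. The soft setting additionally stops the compiler from generating any floating point instructions, whereas softfp and hard both allow them.
- Inconsistent combinations are not always diagnosed at compile time. Requesting -mfloat-abi=hard for a target without an FPU is rejected by the compiler, but mixing objects built for different float ABIs typically fails only at link time.
As an example, this command line: gcc -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp
is a valid combination and compiles without a warning. It produces code that uses NEON and VFP instructions inside functions but passes floating point arguments in integer registers, so it can be linked with soft-float libraries but not with hard-float ones. Explicitly set options override the toolchain defaults.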
Compiler Optimization and Tuning
Beyond just specifying the target architecture, the GCC compiler provides many additional flags to control and optimize code generation. These include settings like:
- -O3: Enable level 3 optimizations.
- -ffast-math: Enable aggressive but unsafe floating point optimizations.
- -mtune=cortex-a8: Tune without changing instruction set for a specific microarchitecture.
- -mstructure-size-boundary=n: Round the size of every structure and union up to a multiple of n bits. This changes structure layout, so code built with different values may be incompatible.
The full set of optimization options is extremely diverse. Determining the best set of flags for a particular application may require substantial benchmarking and performance measurements across multiple flag combinations. Enabling more aggressive optimizations does not always improve performance; finding the right balance is key.
In general, targeting the exact processor with -mcpu and enabling standard optimizations such as -O2 or -O3 provides the best starting point. Profile-guided optimization can further enhance performance in many cases.
Summary
Specifying -mcpu, -mfpu, and -mfloat-abi properly when compiling for ARM with gcc helps generate efficient code by taking full advantage of the target's capabilities. -mcpu targets a specific CPU, -mfpu selects the available floating point and SIMD instruction set, and -mfloat-abi controls the application binary interface for passing floating point values between functions. Matching these settings to the target platform and its libraries, then adding optimization flags, can maximize performance.
Understanding the ARM gcc compiler flags is key for developers working with the ARM ecosystem to build high performance applications. Proper flag configuration avoids compatibility issues, ensures proper utilization of CPU features, and sets the foundation for further optimization.