GNU-ARM Compiler Performance for Cortex-M0/M1

Andrew Irwin
Last updated: October 5, 2023 9:58 am

The GNU Arm Embedded Toolchain is a complete open source toolchain for the Arm Cortex-M family of processors. Its compiler, GCC, offers several optimization levels that can significantly improve the performance and code size of applications running on Cortex-M0/M1-based devices.

Contents

  • Overview of Cortex-M0/M1 Processors
  • GNU Compiler Optimization Options
  • Benchmarking Setup
  • Benchmark Results
  • Matrix Multiplication
  • FIR Filter
  • FFT
  • JPEG Encoding
  • Key Takeaways

Overview of Cortex-M0/M1 Processors

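The Cortex-M0 and Cortex-M1 are among Arm’s smallest processor cores, designed for basic, low power embedded applications. The Cortex-M0 is aimed at microcontroller silicon, while the Cortex-M1 is intended for implementation in FPGAs. Both implement the Armv6-M architecture. Key features include:

  • 32-bit Arm Cortex-M processor core
  • Clock frequencies of up to around 50 MHz in typical low power parts, depending on the silicon or FPGA implementation
  • No Memory Protection Unit on the Cortex-M0 or Cortex-M1 (an MPU is an option only on the Cortex-M0+)
  • Fast interrupt handling
  • Thumb instruction set with a small number of 32-bit Thumb-2 instructions, for good code density
  • Built-in sleep modes for ultra low power operation

Devices built around these cores are used in simple IoT edge nodes, wearables, sensors, actuators and other space-constrained embedded systems where energy efficiency and a small code footprint are critical.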

GNU Compiler Optimization Options

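The key GCC flags that affect performance on Cortex-M0/M1 are:

  • -O1 – Enables basic optimizations such as dead code elimination and constant propagation, with little impact on compile time.
  • -O2 – Enables nearly all optimizations that do not involve a space-speed trade-off, including instruction scheduling. Auto-vectorization is not part of -O2 in GCC 8, and these cores have no SIMD unit to vectorize for.
  • -O3 – Adds further speed-oriented transformations, such as more aggressive function inlining and loop transformations, often at the cost of larger code.
  • -mfpu – Selects the floating point hardware to generate code for. The Cortex-M0, M0+ and M1 have no FPU, so floating point operations are performed by software library routines.
  • -mthumb – Generates Thumb instructions instead of Arm (A32) instructions. Cortex-M cores execute only Thumb code.
  • -mcpu – Selects the target processor, which determines both the available instructions and the tuning.

Higher optimization levels generally produce faster code, at the expense of increased compilation time and, at -O3, often larger code. The -Os level optimizes for code size instead. The best optimization level depends on the application requirements.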
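When several builds of the same benchmark are compared, it helps if each binary can report how it was compiled. The following is a minimal sketch of one way to do that, relying on the predefined macros `__OPTIMIZE__`, `__OPTIMIZE_SIZE__` and `__ARM_ARCH_6M__` that GCC provides. The function names and the file name `bench.c` are hypothetical and do not come from the benchmark code discussed in this article.

```c
#include <assert.h>
#include <stdio.h>

/* Example cross-compile command (bench.c is a hypothetical file name):
 *   arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -O2 -c bench.c -o bench.o
 */

/* Label for the optimization mode this file was compiled with.
 * GCC defines __OPTIMIZE__ for every level above -O0 and additionally
 * __OPTIMIZE_SIZE__ for -Os, so -O1, -O2 and -O3 cannot be told apart. */
const char *build_optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}

/* Returns 1 when compiled for an Armv6-M target (Cortex-M0/M0+/M1). */
int built_for_armv6m(void)
{
#if defined(__ARM_ARCH_6M__)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line build description into buf, snprintf-style. */
int describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "opt=%s armv6m=%d",
                    build_optimization_mode(), built_for_armv6m());
}
```

Logging this string next to each timing result makes it harder to mix up measurements from different builds. Because the macros do not distinguish -O1 from -O3, the exact level would still have to be passed in separately, for example through a `-D` definition on the command line.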

Benchmarking Setup

To measure the impact of GCC optimizations, a set of benchmarks was compiled targeting the Arm Cortex-M0+ processor on the STM32L152 Discovery Kit (the STM32L152 itself is a Cortex-M3 device, which can also run code built for the Cortex-M0+). The benchmarks measured execution time and code size for algorithms such as matrix multiplication, an FIR filter, an FFT and JPEG encoding.

The benchmarks were compiled with GCC 8.3.1 using the following optimization levels:

  • -O0 – no optimization (baseline)
  • -O1
  • -O2
  • -O3
  • -Ofast – -O3 plus optimizations that disregard strict standards compliance, such as -ffast-math

The following additional options were used:

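  • -mfpu=fpv4-sp-d16 (requests a single precision FPU; the Cortex-M0+ has no FPU, so floating point code still runs in software)
  • -mthumb (use the Thumb instruction set)
  • -mcpu=cortex-m0plus (generate code for the Cortex-M0+)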

Execution times were measured in processor cycles using the SysTick timer, which gives consistent timing. Each benchmark was run multiple times and the results were averaged to minimize noise.

Benchmark Results

Here are the key highlights from the benchmark results:

  • Higher optimization levels consistently produce faster code. -O3 was on average 15% faster than -O0.
  • -Ofast yielded another 5-7% speedup over -O3 by relaxing standards compliance.
  • The performance gain from optimizations is more significant for complex workloads. For simple workloads, the speedup was only a few percent.
  • Code size generally decreased once optimization was enabled. The size-oriented -Os level generated the smallest code, around 30% smaller than -O0.
  • Compilation time increased significantly for higher optimization levels, up to 4x longer for -O3 compared to -O0.
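Statements such as "15% faster" can be read in two ways, and the article does not say which definition it uses. The sketch below, with hypothetical function names, computes both from a baseline time and an optimized time so that the difference is explicit.

```c
#include <assert.h>

/* Percentage reduction in execution time:
 *   100 * (t_base - t_opt) / t_base */
double time_reduction_percent(double t_base, double t_opt)
{
    assert(t_base > 0.0);
    return 100.0 * (t_base - t_opt) / t_base;
}

/* Percentage increase in throughput (work per unit time):
 *   100 * (t_base / t_opt - 1) */
double throughput_gain_percent(double t_base, double t_opt)
{
    assert(t_opt > 0.0);
    return 100.0 * (t_base / t_opt - 1.0);
}
```

For a run that drops from 100 time units to 80, the first definition gives 20% and the second gives 25%. When comparing numbers from different sources, it is worth checking which of the two is meant.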

The following sections summarize the benchmark results for key algorithms.

Matrix Multiplication

  • 2048 x 2048 single precision floating point matrix multiplication
  • -O3 was 20% faster than -O0
  • -Ofast was 9% faster than -O3
  • Code size did not change much across optimizations
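The benchmark source is not shown in the article, so the following is only a sketch of what a straightforward single precision kernel of this kind might look like. The function name and the row-major layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* c = a * b for n x n row-major matrices; c must not alias a or b.
 * The inner loop performs n multiply-adds, n^3 in total. On a core
 * without an FPU, each float multiply and add becomes a call into the
 * compiler's soft-float runtime. */
void matmul_f32(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A kernel this small leaves the compiler little code to shrink, and on an FPU-less target much of the run time is spent inside library routines that are compiled separately from the application.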

FIR Filter

  • 400 tap FIR filter operating on audio samples
  • -O3 was 11% faster than -O0
  • -Os code size was 40% smaller than -O0
  • Performance scaled linearly with number of FIR taps
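The linear scaling follows from the structure of a direct-form FIR filter, which performs one multiply-accumulate per tap for every output sample. The sketch below illustrates this with 16-bit Q15 samples and coefficients. The article does not state the data format of its benchmark, so the Q15 choice and the function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], k = 0..taps-1.
 * Samples and coefficients are Q15 fixed point. Output is produced only
 * where a full history is available, so y receives n_in - taps + 1 values.
 * Assumes the coefficients are scaled so that each result fits in 16 bits,
 * and that >> on a negative value is an arithmetic shift (as with GCC). */
void fir_q15(const int16_t *x, size_t n_in,
             const int16_t *h, size_t taps, int16_t *y)
{
    assert(x != NULL && h != NULL && y != NULL);
    assert(taps > 0 && n_in >= taps);
    for (size_t n = taps - 1; n < n_in; n++) {
        int64_t acc = 0;  /* wide accumulator: 400 Q15 products overflow 32 bits */
        for (size_t k = 0; k < taps; k++) {
            acc += (int32_t)h[k] * x[n - k];
        }
        y[n - (taps - 1)] = (int16_t)(acc >> 15);
    }
}
```

The inner loop runs `taps` times per output sample, so doubling the number of taps roughly doubles the cycle count at any optimization level.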

FFT

  • 1024 point complex FFT using floating point
  • -O3 was 18% faster than -O0
  • -Ofast did not improve performance over -O3 due to hardware FPU limits
  • -Os code was about 28% smaller than -O0

JPEG Encoding

  • Encoding 1280×720 image using fixed point arithmetic
  • -O3 was 25% faster than -O0
  • -Os code size was 20% smaller than -O0
  • Higher optimization levels had a large impact because of the long encoding loops
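As an illustration of the kind of fixed point loop found in a JPEG encoder, the sketch below converts RGB pixels to luma using 8-bit fractional weights. It is an assumption that the benchmarked encoder contains a stage like this, and the function names are hypothetical. The weights 77, 150 and 29 approximate 0.299, 0.587 and 0.114 scaled by 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Luma from 8-bit RGB in Q8 fixed point. The weights sum to 256 (1.0 in
 * Q8), and adding 128 before the shift rounds to nearest. */
uint8_t rgb_to_luma_q8(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t acc = 77u * r + 150u * g + 29u * b + 128u;
    return (uint8_t)(acc >> 8);
}

/* Converts a row of interleaved RGB pixels to one luma byte per pixel. */
void rgb_row_to_luma(const uint8_t *rgb, uint8_t *y, size_t pixels)
{
    assert(rgb != NULL && y != NULL);
    for (size_t i = 0; i < pixels; i++) {
        y[i] = rgb_to_luma_q8(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```

Loops of this shape use only integer multiplies, adds and shifts, all of which are native instructions on the Cortex-M0+, so the compiler can optimize the entire loop body rather than calls into a floating point library.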

Key Takeaways

Based on these benchmarks, the following recommendations can be made for GCC optimization flags when compiling for Cortex-M0/M1:

  • Always use at least -O1 or -O2 for a meaningful performance gain and code size reduction
  • Use -O3 for the most compute-intensive workloads to maximize performance
  • Use -Ofast instead of -O3 only if the application does not depend on strict IEEE 754 or ISO C floating point behavior
  • Use -Os to optimize for code size instead of speed
  • Profile the application before and after optimizing to confirm the gains
  • Increase optimization levels iteratively to control compile time
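The caution around -Ofast comes from the fact that it enables -ffast-math, which lets the compiler treat floating point addition as associative. The sketch below, with hypothetical function names, shows inputs for which the grouping changes the result under IEEE 754 single precision arithmetic.

```c
#include <assert.h>

/* Sums three floats, grouping the first two. */
float sum_left(float a, float b, float c)
{
    return (a + b) + c;
}

/* Sums three floats, grouping the last two. */
float sum_right(float a, float b, float c)
{
    return a + (b + c);
}

/* Self-check of the example: with strict IEEE 754 semantics the two
 * groupings disagree, because 1.0f is lost when added to -1e20f. A build
 * with -ffast-math may reorder the additions and break these checks. */
void check_grouping_example(void)
{
    assert(sum_left(1e20f, -1e20f, 1.0f) == 1.0f);
    assert(sum_right(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

Code whose correctness depends on a particular evaluation order, such as compensated summation, should therefore stay on -O3 or below, or at least be compiled separately without -ffast-math.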

Overall, the GNU compiler can generate significantly faster and smaller code for Cortex-M0/M1 through the use of optimizations. Selecting the right optimization flags requires benchmarking with the specific application workloads.
