ARM is one of the most popular CPU architectures used in embedded systems and IoT devices. Low power consumption, good performance, and the availability of free development tools make ARM processors an attractive option for many applications. This tutorial provides an introduction to programming ARM Cortex-M and Cortex-A series processors using C and assembly language.
Introduction to ARM Architecture
ARM processors are based on the ARM architecture, which refers to the core CPU design and instruction set. The ARM architecture is a reduced instruction set (RISC) design, which means it uses a simpler set of instructions compared to complex instruction set (CISC) architectures like x86. The key features of ARM architecture include:
- 32-bit instruction set
- Load/store architecture with a fixed number of general purpose registers
- Short instruction lengths to reduce memory footprint
- Conditional execution of most instructions to maximize efficiency
- Thumb instruction set for improved code density
- SIMD instructions for multimedia and DSP workloads
The ARM architecture is licensed to many companies, such as Qualcomm, Samsung and Apple, which design their own CPUs or integrate Arm-designed cores into their chips. ARM CPUs are divided into families optimized for different applications:
- Cortex-M – Microcontroller chips for embedded and IoT applications
- Cortex-R – Real-time applications like automotive, industrial control
- Cortex-A – Application processors for mobile devices, tablets, TVs etc.
This tutorial focuses on the Cortex-M and Cortex-A series used in microcontrollers and application processors.
Getting Started with ARM Cortex-M Programming
The Cortex-M series are 32-bit RISC ARM processor cores designed for microcontroller and embedded applications. They have a streamlined architecture optimized for low-power operation with features like sleep modes, low interrupt latency and memory protection units. Let’s look at how to program the Cortex-M cores.
Toolchain Setup
To develop for Cortex-M processors, you need:
- Compiler – Converts C/C++ code to ARM instructions
- Assembler – For writing ARM assembly code
- Linker – Links compiled code with libraries to generate an executable
- Debugger – For debugging code running on the processor
These tools are collectively called a toolchain. Some popular ARM Cortex-M toolchains include:
- GNU Arm Embedded Toolchain (open source)
- Arm Compiler (commercial)
- IAR Embedded Workbench (commercial)
This tutorial uses the GNU toolchain. Download the pre-built package for your OS from the toolchain's download page, extract the archive, and add its bin folder to your PATH environment variable.
Bare Metal Programming
Bare metal programming refers to directly accessing hardware resources without the help of an operating system. For Cortex-M devices, the common steps are:
- Initialize processor – Setup clock, memory, peripherals etc.
- Main loop – Run user code indefinitely
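Let’s create a simple “blinky” example for an STM32F103 MCU. The code toggles pin PC13, which drives the on-board LED on many STM32F103 boards such as the “Blue Pill”. On the Nucleo-F103RB the user LED (LD2) is wired to PA5 instead, so adjust the port and pin accordingly for that board.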
Create a main.c file with the following code:

    #include "stm32f1xx.h"

    int main(void)
    {
        RCC->APB2ENR |= RCC_APB2ENR_IOPCEN;   // Enable clock for GPIOC

        // Configure only PC13 as a 2 MHz push-pull output (MODE13 = 10, CNF13 = 00)
        GPIOC->CRH = (GPIOC->CRH & ~(GPIO_CRH_MODE13 | GPIO_CRH_CNF13))
                   | GPIO_CRH_MODE13_1;

        while (1) {
            GPIOC->BSRR = GPIO_BSRR_BS13;                // Set PC13 pin high
            for (volatile int i = 0; i < 300000; i++);   // Crude delay; volatile keeps the loop from being optimized away
            GPIOC->BRR = GPIO_BRR_BR13;                  // Set PC13 pin low
            for (volatile int i = 0; i < 300000; i++);   // Crude delay
        }
    }
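This uses the register definitions from the STM32F1xx CMSIS device header to configure and toggle GPIO pin PC13 directly. To build it:

    $ arm-none-eabi-gcc -c main.c -mcpu=cortex-m3 -mthumb -DSTM32F103xB
    $ arm-none-eabi-ld -T memory.ld main.o -o main.elf
    $ arm-none-eabi-objcopy -O ihex main.elf main.hex

This compiles the C code to Thumb instructions for the Cortex-M3, links it into an executable and converts it to Intel HEX format for programming the MCU flash memory. The -DSTM32F103xB define selects the device variant inside stm32f1xx.h, and you will also need -I options pointing at your copy of the CMSIS headers. Here memory.ld is a linker script that you supply to describe the flash and RAM layout. A complete project also links startup code containing the vector table and reset handler.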
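The pin configuration step can be tried out on a PC before touching hardware. The following is a minimal host-side sketch, assuming the STM32F1 layout in which each pin owns a 4-bit field in CRL/CRH (MODE in the low two bits, CNF in the high two). The function name crh_set_output_pp and the plain uint32_t standing in for the register are illustrative and not part of any vendor header.

```c
#include <stdint.h>

/* Hypothetical helper: returns the new CRH value with `pin` (8..15)
 * configured as a 2 MHz push-pull output (MODE = 0b10, CNF = 0b00).
 * All other pins keep their current configuration. */
uint32_t crh_set_output_pp(uint32_t crh, unsigned pin)
{
    unsigned shift = (pin - 8u) * 4u;      /* 4 configuration bits per pin */
    crh &= ~(UINT32_C(0xF) << shift);      /* clear MODE and CNF for this pin */
    crh |= UINT32_C(0x2) << shift;         /* MODE = 10 (output, 2 MHz), CNF = 00 */
    return crh;
}
```

Starting from the reset value 0x44444444 (all pins floating inputs), crh_set_output_pp(0x44444444, 13) yields 0x44244444, so only the PC13 field changes.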
Using a HAL and BSP
For complex projects, it is better to use a hardware abstraction layer (HAL) and board support package (BSP) provided by the MCU vendor instead of directly accessing registers. The HAL and BSP handle low-level initialization, peripheral access and board-specific settings. For STM32 MCUs, STMicroelectronics provides the STM32Cube firmware package with HAL drivers and example code for its boards. Using STM32Cube, the blinky example becomes:

    #include "stm32f1xx_hal.h"

    int main(void)
    {
        HAL_Init();                              // Initialize the HAL and its SysTick time base
        __HAL_RCC_GPIOC_CLK_ENABLE();            // Enable clock for GPIOC

        GPIO_InitTypeDef led_init = {0};         // Zero all fields before filling them in
        led_init.Pin   = GPIO_PIN_13;
        led_init.Mode  = GPIO_MODE_OUTPUT_PP;    // Push-pull output
        led_init.Pull  = GPIO_NOPULL;
        led_init.Speed = GPIO_SPEED_FREQ_LOW;
        HAL_GPIO_Init(GPIOC, &led_init);

        while (1) {
            HAL_GPIO_TogglePin(GPIOC, GPIO_PIN_13);
            HAL_Delay(1000);                     // 1000 ms; requires SysTick_Handler to call HAL_IncTick()
        }
    }
HAL_Init() prepares the HAL and its SysTick-based time base, and the HAL provides simple APIs for configuring and controlling GPIO pins, generating delays etc. This is easier than manually configuring registers.
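Whether the pin is driven through BSRR/BRR writes or through HAL_GPIO_TogglePin(), the state changes via the port's bit set/reset mechanism: writing 1 to a bit in the low half of BSRR sets that pin, writing 1 to a bit in the high half resets it, and zeros leave pins untouched. Below is a host-side sketch of that behavior, with odr as a plain variable standing in for the output data register. The function names are made up for illustration, and this is not the HAL source.

```c
#include <stdint.h>

/* Model of a write to BSRR: bits 0..15 set pins, bits 16..31 reset pins.
 * If both are requested for the same pin, set wins. */
uint16_t bsrr_write(uint16_t odr, uint32_t bsrr)
{
    uint16_t set_mask   = (uint16_t)(bsrr & 0xFFFFu);
    uint16_t reset_mask = (uint16_t)(bsrr >> 16);
    return (uint16_t)((odr & (uint16_t)~reset_mask) | set_mask);
}

/* Toggle one or more pins with a single BSRR value: pins that are
 * currently high go in the reset half, pins that are low in the set half. */
uint16_t toggle_pins(uint16_t odr, uint16_t pins)
{
    uint32_t bsrr = ((uint32_t)(odr & pins) << 16) | (uint32_t)(~odr & pins);
    return bsrr_write(odr, bsrr);
}
```

Because a single register write carries both the set and reset requests, no read-modify-write of the output register itself is needed, which is why this mechanism is preferred when interrupts may touch the same port.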
ARM Assembly Programming
While C is commonly used for ARM Cortex-M programming, assembly language is still useful to learn. Reasons to use assembly include:
- Tight time-critical code segments
- Hardware specific operations not accessible from C
- Reusing legacy assembly code
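Let’s look at some ARM assembly basics with examples for Cortex-M processors. The snippets use ; to start a comment, as in Arm's own assembler. The GNU assembler uses @ for comments on ARM targets instead.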
Registers
Cortex-M processors have 13 general purpose 32-bit registers, R0-R12, plus three registers with specific names and functions:
- R13 – Stack pointer
- R14 – Link register for function calls
- R15 – Program counter
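The program status register (xPSR) contains the condition flags (N, Z, C, V), the number of the exception currently being handled and execution state bits. Interrupt masking is controlled by separate special registers such as PRIMASK.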
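The condition flags occupy the top four bits of the status register (N in bit 31, Z in bit 30, C in bit 29, V in bit 28). As an illustration, a value read from the register could be unpacked as below. The struct and function names are hypothetical, and the sketch runs on any host because it only does bit arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative container for the four condition flags. */
typedef struct {
    bool n;  /* Negative */
    bool z;  /* Zero */
    bool c;  /* Carry / no borrow */
    bool v;  /* Signed overflow */
} cond_flags;

/* Unpack N, Z, C, V from bits 31..28 of a status register value. */
cond_flags decode_flags(uint32_t psr)
{
    cond_flags f;
    f.n = (psr >> 31) & 1u;
    f.z = (psr >> 30) & 1u;
    f.c = (psr >> 29) & 1u;
    f.v = (psr >> 28) & 1u;
    return f;
}
```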
Data Instructions
    MOV R0, #5            ; Load immediate value 5 to R0
    LDR R1, =0x20001000   ; Load memory address to R1
    STR R2, [R1]          ; Store R2 to memory pointed by R1
ARM has post-indexed addressing modes, which update the base register after the access, for efficient sequential access:

    LDR R1, [R2], #4      ; Load from [R2], then increment R2 by 4
    LDR R3, [R4], #-8     ; Load from [R4], then decrement R4 by 8
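In C, a post-indexed load corresponds closely to the *p++ idiom, and a compiler targeting ARM may (but is not guaranteed to) emit a post-indexed LDR for it. A small sketch with an illustrative function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Sum `n` words. Each iteration loads from the pointer and then advances
 * it by one element (4 bytes), the same pattern as LDR Rd, [Rn], #4. */
uint32_t sum_words(const uint32_t *p, size_t n)
{
    uint32_t total = 0;
    while (n--) {
        total += *p++;   /* load, then post-increment the address */
    }
    return total;
}
```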
Arithmetic Instructions
    ADD R0, R1, R2        ; R0 = R1 + R2
    SUB R3, R4, #1        ; R3 = R4 - 1
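Multiplication uses the MUL instruction, or MLA for multiply-accumulate. Division uses UDIV for unsigned and SDIV for signed operands. The divide instructions are available on Cortex-M3 and later cores, but not on Cortex-M0/M0+.

    MUL R5, R3, R4        ; R5 = R3 * R4
    UDIV R6, R7, R8       ; R6 = R7 / R8 (unsigned)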
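For reference, the C expressions below compute what these instructions compute. One detail worth noting: on ARMv7-M, UDIV with a zero divisor returns 0 unless divide-by-zero trapping is enabled, whereas division by zero is undefined behavior in C, so the sketch checks for it explicitly. The function names are illustrative.

```c
#include <stdint.h>

/* MLA Rd, Rn, Rm, Ra : Rd = Rn * Rm + Ra (low 32 bits of the result) */
uint32_t mla32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return rn * rm + ra;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* UDIV Rd, Rn, Rm : unsigned division; models the non-trapping case
 * where a zero divisor gives a zero result. */
uint32_t udiv32(uint32_t rn, uint32_t rm)
{
    return (rm == 0u) ? 0u : rn / rm;
}
```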
Logical Instructions
    AND R1, R2, R3        ; Bitwise AND
    ORR R4, R5, #1        ; Bitwise OR
    EOR R8, R9, R10       ; Bitwise XOR
Shift and rotate instructions are also available.
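C has shift operators but no rotate operator, so a rotate such as ROR is usually written as two shifts, a pattern compilers commonly recognize. A sketch, with a function name of our own choosing:

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is taken modulo 32).
 * The masking avoids an undefined shift by 32 when n is 0. */
uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```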
Control Flow
Branches are done using the B instruction:

    LOOP:
        ….
        B LOOP            ; Unconditional branch
    DONE:
        ….
        B DONE            ; Branch to label
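Conditional branching uses Bxx instructions like BEQ, BGT etc., which need no IT instruction. For other instructions, the IT instruction sets up conditional execution for up to 4 subsequent instructions.

    CMP R0, R1        ; Compare: set flags from R0 - R1
    BGT DONE          ; Branch to DONE if R0 > R1 (signed)
    IT EQ             ; If equal, execute the next instruction
    MOVEQ R2, #0      ; R2 = 0 only when R0 == R1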
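To see what a condition like GT actually tests, the following host-side sketch models how CMP derives the N, Z, C and V flags from R0 - R1 and how GT is evaluated from them (Z clear and N equal to V). It is a model for illustration only, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv;

/* Model of CMP a, b: compute a - b and keep only the flags. */
nzcv cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    nzcv f;
    f.n = (r >> 31) & 1u;                    /* result negative as signed */
    f.z = (r == 0u);                         /* operands equal */
    f.c = (a >= b);                          /* no borrow (unsigned a >= b) */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* GT: signed greater than. */
bool cond_gt(nzcv f)
{
    return !f.z && (f.n == f.v);
}
```

The overflow term matters for operands of opposite sign: comparing 0x80000000 (the most negative value) with 1 leaves N clear, yet GT is false because V is set.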
Function Calls
BL is used for function calls:

        BL func           ; Call function
        …
    func:
        …
        BX LR             ; Return from function
Cortex-A Series and AArch64 Programming
The Cortex-A series processors are high-performance application processor cores. Current Cortex-A cores implement the 64-bit ARMv8-A architecture or later, while older ones such as the Cortex-A9 are 32-bit ARMv7-A designs. Unlike Cortex-M, they run full-featured OSes like Linux and Android. Let’s go through some key points of programming Cortex-A series processors.
AArch64 Instruction Set
AArch64 is the 64-bit execution state of the ARMv8 architecture used by Cortex-A cores. The key changes from the 32-bit ARM instruction set are:
- 31 general purpose 64-bit registers X0-X30 instead of R0-R12
- All instructions are 32 bits long
- Addressing modes simplified to support 64-bit addresses
- Optional SIMD instructions for parallelism
User and Privileged Modes
AArch64 defines four exception levels (ELs) with increasing privilege:
- EL0 – User applications, least privilege
- EL1 – OS kernel
- EL2 – Hypervisor for virtualization
- EL3 – Secure monitor for the security extensions
The current exception level determines which instructions can be executed. For example, some system registers are only accessible from higher exception levels.
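Software running at EL1 or above can find out its exception level by reading the CurrentEL system register, which reports the level in bits [3:2]. Reading the register requires AArch64 code running at EL1 or higher, so the sketch below only shows the decoding step on a value that has already been obtained. The name decode_current_el is illustrative.

```c
#include <stdint.h>

/* Extract the exception level (0..3) from a CurrentEL register value.
 * The level is held in bits [3:2]; the other bits are reserved. */
unsigned decode_current_el(uint64_t current_el)
{
    return (unsigned)((current_el >> 2) & 0x3u);
}
```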
Boot and Kernel Initialization
On reset, a Cortex-A processor starts executing from an implementation-defined reset address, typically in boot ROM, at the highest implemented exception level (EL3 when it is present). The firmware initializes the CPU and devices and loads the kernel image into RAM. The kernel starts in EL2 or EL1 and sets up isolation and virtualization. C programs mainly run in EL0 user mode under the OS.
Neon SIMD Instructions
The NEON SIMD unit in Cortex-A provides instructions for DSP and media applications. NEON operates on 64-bit and 128-bit vectors, applying the same operation to every lane in parallel. Common instructions include:
- Arithmetic – ADD, MUL, SUB on vectors
- Logical – AND, ORR, EOR, shifts on vectors
- Load/Store – LD1, ST1 transfer vector to/from memory
- Table lookup – TBL for vector mappings
NEON intrinsics are available in GCC through the arm_neon.h header for easy access from C code.
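As a brief sketch of what intrinsics look like, the function below adds two float arrays four lanes at a time using vld1q_f32, vaddq_f32 and vst1q_f32 from arm_neon.h. The __ARM_NEON guard and the scalar fallback are our additions so that the same file also builds on machines without NEON, and the function name is illustrative.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n) */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration in a 128-bit vector. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when NEON is absent. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Keeping a scalar tail loop means the array length does not have to be a multiple of the vector width.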
Summary
This tutorial covered the key concepts of programming ARM Cortex-M and Cortex-A series processors using C and assembly language. We looked at the ARM architecture, toolchain setup, bare metal programming, using HAL and BSP, assembly instructions and AArch64 features. With this knowledge, you can get started on application development using the wide range of ARM CPU cores available.