Assembly language is a low-level programming language whose instructions correspond almost one-to-one to a processor’s machine code. Unlike high-level languages such as C and C++, assembly language consists of mnemonics that operate directly on a microprocessor’s resources, such as its registers and arithmetic logic unit. Learning assembly language lets programmers write optimized code by controlling exactly which instructions the CPU executes.
ARM Cortex M is a family of 32-bit RISC ARM processor cores licensed by Arm Holdings. The Cortex-M series targets microcontroller applications and is widely used in IoT devices, wearables, robotics, and other embedded systems. This tutorial provides a beginner’s guide to programming ARM Cortex M cores using assembly language.
Prerequisites
To follow this ARM assembly programming tutorial, you should have a basic understanding of:
- Digital logic and microprocessor architectures
- Hexadecimal number system
- Basic C programming
We’ll be using the Thumb-2 instruction set, which is supported by Cortex-M3, Cortex-M4, and later mainline cores (Cortex-M0 and Cortex-M0+ implement only a subset of it). Make sure you have access to a Cortex-M development board, or to a toolchain with a simulator and debugger such as Keil MDK or IAR EWARM.
Development Environment Setup
You need a development environment with an ARM toolchain to assemble, link, and debug ARM assembly code. We’ll use the Keil uVision IDE, which is available in a free MDK-Lite edition for Cortex-M processors. Here are the steps to set it up:
- Download and install Keil uVision IDE with MDK-Lite from the Arm website.
- Add a device pack like STM32F4xx to support your target device.
- Create a new uVision project, select your target device and add a source file with .s extension.
- Open the options for file and set the assembler to ARM Macro Assembler.
- Now you are ready to write ARM assembly code which can be assembled and downloaded to the target device.
ARM Assembly Basics
Registers
Like any microprocessor, Cortex-M cores contain registers, which are small storage locations inside the CPU that instructions can use without a memory access. ARM is a Reduced Instruction Set Computer (RISC) architecture: compared with Complex Instruction Set Computers (CISC), it has fewer and simpler instructions, and only load and store instructions access memory. Here are the key 32-bit core registers in Cortex-M:
- R0-R12 – General purpose registers for data operations
- SP (R13) – Stack pointer, which holds the address of the top of the stack
- LR (R14) – Link register that holds return addresses
- PC (R15) – Program counter, which tracks the instruction being executed (reading it returns that instruction’s address plus 4)
Cores with the optional floating-point unit, such as Cortex-M4F, add the single-precision registers S0-S31. There are also special registers such as APSR, PRIMASK, and CONTROL. We’ll focus on the general purpose registers for now.
Data Instructions
ARM assembly provides various instructions to load data into registers or store register contents into memory. For example:
    LDR R1, =0x20001000    ; Load the 32-bit value 0x20001000 into R1
    STR R2, [R3]           ; Store the value in R2 to the memory address held in R3
Other data processing instructions perform addition, subtraction, and logical operations such as AND, ORR, and EOR between registers or between a register and an immediate value.
Branching
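A conditional branch such as BEQ or BNE tests the condition flags in the APSR, which are usually set by a preceding compare (CMP) or by a flag-setting instruction such as SUBS:
    CMP R0, #0         ; Compare R0 with 0 and update the flags
    BEQ loop           ; Branch to the loop label if equal
    …
    loop               ; Destination label
Thumb-2 also provides the IT (If-Then) instruction, which makes up to four following instructions conditional. It is not needed before a conditional branch such as BEQ.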
This branches to loop if the Z (zero) flag is set, which means the preceding comparison found its operands equal. Other conditional branch instructions include BGT, BLT, BCS, etc.
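To make the role of the condition flags concrete, here is a minimal host-side C model (not target code) of how a compare such as CMP sets the N, Z, C, and V flags and how conditions like EQ, GT, and LT read them. The names Flags, cmp_flags, and cond_* are illustrative assumptions, not part of any ARM library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four APSR condition flags. */
typedef struct {
    bool n; /* Negative: bit 31 of the result */
    bool z; /* Zero: the result is 0 */
    bool c; /* Carry: for a subtraction, set when no borrow occurred */
    bool v; /* Overflow: the signed result does not fit in 32 bits */
} Flags;

/* Model of CMP Rn, Op2: compute Rn - Op2 and keep only the flags. */
Flags cmp_flags(uint32_t rn, uint32_t op2)
{
    uint32_t result = rn - op2; /* unsigned arithmetic wraps modulo 2^32 */
    Flags f;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (rn >= op2);
    f.v = (((rn ^ op2) & (rn ^ result)) >> 31) != 0;
    return f;
}

/* Conditions as tested by BEQ, BNE, BCS, BGT and BLT. */
bool cond_eq(Flags f) { return f.z; }
bool cond_ne(Flags f) { return !f.z; }
bool cond_cs(Flags f) { return f.c; }
bool cond_gt(Flags f) { return !f.z && (f.n == f.v); }
bool cond_lt(Flags f) { return f.n != f.v; }
```

In this model, cmp_flags(3, 5) leaves N set and V clear, so cond_lt holds and a BLT placed after the compare would be taken.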
Linking and Functions
The BL (Branch with Link) and BX (Branch and Exchange) instructions are used to call and return from functions in ARM assembly. BL branches to the target label and saves the return address in the Link Register (LR). BX branches to the address held in a register, usually LR, to return to the caller:
    BL func            ; Call function
    …
    func               ; Function label
    …
        BX LR          ; Return to caller
ARM Programming Model
The Cortex-M programming model presents a single 4 GB address space that holds code, data, and peripherals. Flash memory typically stores code and constants while SRAM stores variables, and the same load/store instructions reach both.
Code in flash is normally read-only during execution, whereas SRAM can be read and written freely. Internally, Cortex-M3 and Cortex-M4 use a Harvard bus arrangement with separate instruction and data buses, while Cortex-M0 and Cortex-M0+ use a single Von Neumann bus. Both arrangements map onto the same unified address space, so we’ll focus on that unified memory model.
Memory Segments
The Cortex-M memory map consists of multiple regions such as code (flash), SRAM, and peripherals, each with a pre-defined base address. Each region is an array of bytes numbered sequentially starting from its base address.
For example, SRAM may start at address 0x20000000. 4 bytes from 0x20000004 to 0x20000007 can store a 32-bit variable var_1. The processor uses load/store instructions to access variables located at specific addresses.
Endianness
Cortex-M devices normally use little-endian format, where the least significant byte of a multi-byte value is stored at the lowest address. For the 32-bit value 0x11223344 stored at 0x20000004, memory will contain:
- 0x20000004 – 0x44
- 0x20000005 – 0x33
- 0x20000006 – 0x22
- 0x20000007 – 0x11
The processor takes care of proper endianness handling during load/store. Programmers just need to know the addressing details.
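The byte layout described above can be sketched on a host machine with a small C model. This is only an illustration under stated assumptions: the sram array, its 16-byte size, and the functions store_word_le, load_word_le, and load_byte are hypothetical names that simulate a word store and load, not real device accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SRAM_BASE 0x20000000u /* example base address from the text */
#define SRAM_SIZE 16u         /* tiny simulated region (assumption) */

static uint8_t sram[SRAM_SIZE]; /* sram[i] models address SRAM_BASE + i */

/* Model of a word store: least significant byte goes to the lowest address. */
void store_word_le(uint32_t address, uint32_t value)
{
    assert(address >= SRAM_BASE && address - SRAM_BASE <= SRAM_SIZE - 4u);
    size_t offset = address - SRAM_BASE;
    for (size_t i = 0; i < 4; i++) {
        sram[offset + i] = (uint8_t)(value >> (8u * i));
    }
}

/* Model of a word load: reassemble the value from four consecutive bytes. */
uint32_t load_word_le(uint32_t address)
{
    assert(address >= SRAM_BASE && address - SRAM_BASE <= SRAM_SIZE - 4u);
    size_t offset = address - SRAM_BASE;
    uint32_t value = 0;
    for (size_t i = 0; i < 4; i++) {
        value |= (uint32_t)sram[offset + i] << (8u * i);
    }
    return value;
}

/* Read a single simulated byte, to inspect the layout. */
uint8_t load_byte(uint32_t address)
{
    assert(address >= SRAM_BASE && address - SRAM_BASE < SRAM_SIZE);
    return sram[address - SRAM_BASE];
}
```

After store_word_le(0x20000004u, 0x11223344u), load_byte(0x20000004u) returns 0x44 and load_byte(0x20000007u) returns 0x11, matching the layout listed above.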
Writing ARM Assembly Code
Now that we have covered the key concepts, let’s look at a simple example of ARM Thumb assembly code for Cortex-M processors:
            AREA program, CODE, READONLY
            ENTRY

            EXPORT Start
    Start

            MOVS R0, #10
            MOVS R1, #20

            ADD R2, R0, R1

    Stop    B Stop
            ALIGN
            END
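Here is what each non-blank line does, in order:
- AREA: declare a read-only code section named “program”
- ENTRY: mark the entry point for the toolchain
- EXPORT Start: make the Start symbol visible to the linker
- Start: label for the first instruction
- MOVS R0, #10: load 10 into R0
- MOVS R1, #20: load 20 into R1
- ADD R2, R0, R1: add R0 and R1, storing the result in R2
- Stop B Stop: branch to itself forever, which halts the program in place
- ALIGN: align the end of the code to a word boundary
- END: mark the end of the source file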
After assembling, this simple program can be run on a Cortex-M target. We can observe the register values like R0=10, R1=20, R2=30 in the debugger after stepping through each instruction.
This demonstrates the basic ARM assembly syntax. We can build more complex applications using loops, functions, variables, etc.
Advanced ARM Assembly Coding
Here are some more advanced ARM Thumb-2 assembly programming topics useful for functions, real-world projects and optimizations:
Stack and Subroutines
The stack stores temporary data, saved registers, and arguments during function calls. The Cortex-M stack is full-descending: it grows from high to low addresses, and the stack pointer SP (R13) points to the last item pushed. The PUSH and POP instructions move registers to and from the stack and update SP automatically.
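Under the ARM procedure call standard (AAPCS), the first four arguments are passed in registers R0-R3 and any further arguments are pushed onto the stack; the return value comes back in R0. BL saves the return address in LR. A subroutine uses PUSH/POP to preserve the callee-saved registers (R4-R11) that it modifies, and LR if it calls another function.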
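As a rough illustration of how a downward-growing stack behaves, the following host-side C sketch models single-word push and pop operations. It is a simplification under stated assumptions: the names stack_mem, sp, push_word, pop_word, and stack_depth are hypothetical, sp is a word index rather than a byte address, and multi-register PUSH/POP lists are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8u /* simulated stack size (assumption) */

static uint32_t stack_mem[STACK_WORDS];
static uint32_t sp = STACK_WORDS; /* one past the highest word: stack is empty */

/* Push: decrement SP first, then store, so SP always points at the last item. */
void push_word(uint32_t value)
{
    assert(sp > 0u); /* guard against overflowing the simulated stack */
    sp--;
    stack_mem[sp] = value;
}

/* Pop: load the item SP points at, then increment SP. */
uint32_t pop_word(void)
{
    assert(sp < STACK_WORDS); /* guard against popping an empty stack */
    uint32_t value = stack_mem[sp];
    sp++;
    return value;
}

/* Number of words currently on the simulated stack. */
uint32_t stack_depth(void)
{
    return STACK_WORDS - sp;
}
```

Pushing two values and popping them returns them in reverse order, which is why a subroutine that pushes registers on entry restores them with a matching pop on exit.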
Inline and Embedded Assembly
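For Cortex-M projects, most code is written in C using an ARM compiler toolchain. Time-critical functions and optimizations can use inline or embedded assembly written within C code.
Inline assembly places assembly instructions inside a C function body and can refer to C variables as operands; the exact syntax depends on the compiler. Embedded assembly, as supported by Arm Compiler 5 (armcc), defines an entire function in assembly inside a C source file by using the __asm qualifier. Larger routines can also live in separate .s files that are linked with the C code.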
Intrinsic Functions
Compiler intrinsic functions like __disable_irq() let C code emit specific instructions (in this case CPSID i, which masks interrupts) without hand-written assembly. This gives control of hardware features such as interrupt masking and the DSP extensions directly from C.
SIMD and DSP
The DSP extension in cores such as Cortex-M4 and Cortex-M7 adds Single Instruction Multiple Data (SIMD) instructions that operate on two 16-bit or four 8-bit values packed into a 32-bit general purpose register. This accelerates signal processing algorithms.
These instructions are available from C through DSP intrinsics, and hand-written assembly can use them directly in performance-critical loops.
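To show what a packed SIMD operation computes, here is a small host-side C model of SADD16, the DSP-extension instruction that adds two pairs of 16-bit halfwords in one step. The function name sadd16_model is a hypothetical stand-in; the model covers only the result value and does not reproduce the GE flags that the real instruction also updates.

```c
#include <stdint.h>

/*
 * Model of SADD16: add the low halfwords and the high halfwords
 * independently. Each 16-bit sum wraps on its own, so a carry out of
 * the low lane never reaches the high lane.
 */
uint32_t sadd16_model(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;                 /* low lane only */
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu; /* high lane only */
    return (hi << 16) | lo;
}
```

For example, sadd16_model(0x0001FFFFu, 0x00010001u) yields 0x00020000, whereas an ordinary 32-bit ADD of the same operands would give 0x00030000 because the carry from the low halfword would spill into the high one.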
Timers and Interrupts
Assembly language gives full control over microcontroller peripherals like timers, GPIO, communication buses etc. This allows optimizing interrupt service routines and peripheral initialization code.
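Bit banding, an optional feature of Cortex-M3 and Cortex-M4, maps each bit in part of the SRAM and peripheral regions to its own word in an alias region. A single store to the alias word can then set or clear one bit atomically, without a read-modify-write sequence.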
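On Cortex-M3 and Cortex-M4 devices that implement bit banding, every bit in the first 1 MB of the SRAM and peripheral regions has a corresponding word in an alias region, and the alias address follows a fixed formula. The sketch below computes that address on a host machine; the function name bitband_alias and the macro names are illustrative assumptions, and whether a particular device implements bit banding should be checked in its reference manual.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE      0x20000000u /* start of the SRAM bit-band region */
#define SRAM_ALIAS_BASE   0x22000000u /* start of the SRAM alias region */
#define PERIPH_BB_BASE    0x40000000u /* start of the peripheral bit-band region */
#define PERIPH_ALIAS_BASE 0x42000000u /* start of the peripheral alias region */
#define BB_REGION_SIZE    0x00100000u /* each bit-band region is 1 MB */

/*
 * alias = alias_base + (byte_offset * 32) + (bit * 4)
 * With a word-aligned address, bit may range from 0 to 31.
 */
uint32_t bitband_alias(uint32_t address, uint32_t bit)
{
    uint32_t region_base;
    uint32_t alias_base;

    assert(bit < 32u);
    /* Unsigned subtraction wraps, so addresses below a base fail the test. */
    if (address - SRAM_BB_BASE < BB_REGION_SIZE) {
        region_base = SRAM_BB_BASE;
        alias_base = SRAM_ALIAS_BASE;
    } else {
        assert(address - PERIPH_BB_BASE < BB_REGION_SIZE);
        region_base = PERIPH_BB_BASE;
        alias_base = PERIPH_ALIAS_BASE;
    }
    return alias_base + (address - region_base) * 32u + bit * 4u;
}
```

For instance, bit 7 of the byte at 0x20000004 maps to the alias word at 0x2200009C. On a target, writing 1 or 0 to that word sets or clears just that bit.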
Code Optimization
Assembly programming enables code size and performance optimization using processor-specific features. Loop unrolling, instruction scheduling, and reducing pipeline stalls are some common optimizations.
Inline assembly and intrinsic functions can be used to optimize hotspots and critical code segments without rewriting the full application.
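Loop unrolling, one of the optimizations mentioned above, can be illustrated in C before committing to hand-written assembly. The following sketch compares a plain summation loop with a version unrolled by four; the function names sum_simple and sum_unrolled4 are hypothetical, and whether unrolling actually helps depends on the core, the compiler, and the memory system, so it should be measured on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one addition and one loop test per element. */
uint32_t sum_simple(const uint32_t *data, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += data[i];
    }
    return sum;
}

/* Unrolled by four: one loop test per four elements, four accumulators. */
uint32_t sum_unrolled4(const uint32_t *data, size_t count)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= count; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    /* Handle the remaining 0 to 3 elements. */
    for (; i < count; i++) {
        s0 += data[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for any input because unsigned addition wraps consistently; the unrolled version simply trades code size for fewer branch and compare instructions per element.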
Conclusion
ARM Cortex-M assembly language enables writing optimized code for microcontrollers used in IoT, embedded, robotics and other applications. This tutorial covered the Thumb-2 assembly basics needed to get started: syntax, registers, data processing, branching, and functions.
Advanced techniques like stack usage, SIMD instructions, peripheral control, and code optimization help build real-world projects. With both high-level languages and low-level assembly available, the ARM Cortex-M family provides a flexible platform for embedded systems.