The ARM Cortex-M0 is a 32-bit RISC processor optimized for low-power embedded applications. It implements the ARMv6-M architecture, whose instruction set is a subset of Thumb: most instructions are 16 bits wide, and a few (BL, MRS, MSR, DSB, DMB, ISB) use 32-bit encodings. The set covers basic computation and data transfer. More complex operations are built from sequences of these instructions, which keeps the core small while most data processing instructions still execute in a single cycle.
Data Processing Instructions
These perform basic arithmetic and logical operations on 32-bit register contents. Most are the flag-setting forms (written ADDS, SUBS, and so on in unified syntax) and operate on the low registers R0-R7; immediate operands are available only where noted:
- ADD, SUB – Add or subtract two registers, or a register and a small immediate (3-bit or 8-bit)
- RSB – Reverse subtract from zero only (RSBS Rd, Rn, #0), which negates a register
- ADC, SBC – Add/subtract with carry between two registers
- RSC – Reverse subtract with carry; not part of the Thumb instruction set, so not available on the Cortex-M0
- AND, ORR, EOR – Bitwise AND, OR, XOR between two registers
- BIC – Bit clear: AND a register with the NOT of a second register
- MOV, MVN – Copy a register or load an 8-bit immediate (MOV); write the bitwise NOT of a register (MVN)
- CMP, CMN – Compare a register with a register or 8-bit immediate (CMP), or with the negative of a register (CMN), setting status flags only
- TST – AND two registers and set status flags, discarding the result
These basic operations cover the math, logic, and comparison needs of most code. The flag-setting forms update the N, Z, C, and V flags from the result. The Cortex-M0 has no general conditional execution; the flags are consumed by conditional branches (B<cond>) and by ADC/SBC.
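To make the flag behavior concrete, the following is a minimal host-side sketch in C of how a flag-setting add derives N, Z, C, and V from its operands. The names `flags_t` and `adds_model` are hypothetical and exist only for this illustration; the code models the arithmetic, not the processor's implementation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the four condition flags. */
typedef struct {
    bool n; /* negative: bit 31 of the result */
    bool z; /* zero: result is 0 */
    bool c; /* carry: unsigned overflow out of bit 31 */
    bool v; /* overflow: signed overflow */
} flags_t;

/* Model of a flag-setting add (ADDS Rd, Rn, Rm).
 * Returns the 32-bit sum and fills *f with the resulting flags. */
static uint32_t adds_model(uint32_t rn, uint32_t rm, flags_t *f)
{
    uint32_t result = rn + rm;            /* wraps modulo 2^32 */
    f->n = (result >> 31) != 0;
    f->z = (result == 0);
    f->c = result < rn;                   /* a wrapped sum means carry out */
    /* Signed overflow: operands share a sign that the result does not. */
    f->v = ((~(rn ^ rm) & (rn ^ result)) >> 31) != 0;
    return result;
}
```

Adding 1 to 0x7FFFFFFF sets N and V but not C, while adding 1 to 0xFFFFFFFF sets Z and C but not V, which is why signed and unsigned comparisons test different flags.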
Load/Store Instructions
Data transfer to/from memory uses these instructions:
- LDR – Load a word from memory into a register; LDRH and LDRB load a halfword or byte with zero extension, LDRSH and LDRSB with sign extension
- STR – Store a register to a memory word; STRH and STRB store the low halfword or byte
- LDM – Load multiple registers from consecutive memory words
- STM – Store multiple registers into consecutive memory words
Addresses are formed from a base register plus either a small immediate offset or a second register. Load and store multiple allow efficient block transfers. Registers always hold 32 bits, so narrower loads extend the value to 32 bits and narrower stores write only the low 8 or 16 bits.
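The size conversion on narrow loads and stores can be sketched in C as follows. This is an illustrative host-side model; the function names (`ldrb_model`, `ldrsb_model`, `ldrsh_model`, `strb_model`) are made up for the example.

```c
#include <stdint.h>

/* LDRB: zero-extend an 8-bit memory value to 32 bits. */
static uint32_t ldrb_model(const uint8_t *addr)
{
    return (uint32_t)*addr;
}

/* LDRSB: sign-extend an 8-bit memory value to 32 bits.
 * The XOR/subtract idiom avoids implementation-defined casts. */
static uint32_t ldrsb_model(const uint8_t *addr)
{
    uint32_t v = *addr;
    return (v ^ 0x80u) - 0x80u;
}

/* LDRSH: sign-extend a 16-bit memory value to 32 bits. */
static uint32_t ldrsh_model(const uint16_t *addr)
{
    uint32_t v = *addr;
    return (v ^ 0x8000u) - 0x8000u;
}

/* STRB: store only the low 8 bits of the register. */
static void strb_model(uint8_t *addr, uint32_t rt)
{
    *addr = (uint8_t)(rt & 0xFFu);
}
```

The choice between the zero-extending and sign-extending load depends on whether the value in memory is treated as unsigned or signed.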
Branch Instructions
These instructions alter sequential program flow:
- B – Unconditional branch to PC-relative offset
- BL – Branch with link to save return address to LR
- BX – Branch indirectly through register
- BLX – Branch and link indirectly through register
Only B has conditional variants on the Cortex-M0: B<cond> (for example BEQ, BNE, BGE) branches to the target address if the flags meet the specified condition code, enabling loops, if-then-else, etc. BL, BX, and BLX are always unconditional. Helper instructions BKPT, SVC, and ISB provide debugging, supervisor calls, and synchronization.
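How a condition code is evaluated against the flags can be sketched in C. The types `apsr_t` and `cond_t` and the function `branch_taken` are hypothetical names for this illustration; the flag tests themselves follow the standard ARM condition code definitions.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the condition flags. */
typedef struct { bool n, z, c, v; } apsr_t;

/* Condition codes in their encoding order (0 to 13). */
typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS,
    COND_VC, COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE
} cond_t;

/* Returns true if a conditional branch with this code would be taken. */
static bool branch_taken(cond_t cond, apsr_t f)
{
    switch (cond) {
    case COND_EQ: return f.z;                  /* equal */
    case COND_NE: return !f.z;                 /* not equal */
    case COND_CS: return f.c;                  /* unsigned higher or same */
    case COND_CC: return !f.c;                 /* unsigned lower */
    case COND_MI: return f.n;                  /* negative */
    case COND_PL: return !f.n;                 /* positive or zero */
    case COND_VS: return f.v;                  /* overflow */
    case COND_VC: return !f.v;                 /* no overflow */
    case COND_HI: return f.c && !f.z;          /* unsigned higher */
    case COND_LS: return !f.c || f.z;          /* unsigned lower or same */
    case COND_GE: return f.n == f.v;           /* signed >= */
    case COND_LT: return f.n != f.v;           /* signed < */
    case COND_GT: return !f.z && f.n == f.v;   /* signed > */
    case COND_LE: return f.z || f.n != f.v;    /* signed <= */
    }
    return false;
}
```

For example, comparing 1 with 2 leaves N set and Z, C, V clear, so the LT, NE, and LS branches would be taken while GE and HI would fall through.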
Status Register Access
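Special instructions read or write the program status register (xPSR, which combines APSR, IPSR, and EPSR) and the other special registers such as PRIMASK, CONTROL, MSP, and PSP:
- MRS – Move special register contents to a general purpose register
- MSR – Move general purpose register contents to a special register
This allows examination of the status flags and the current exception number, masking of interrupts through PRIMASK, and stack pointer setup for context switching. Not every field is writable: MSR can change the APSR flags, but writes to IPSR and EPSR are ignored.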
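As an illustration of what can be read out of a status register value, the sketch below decodes the condition flags and the exception number from a 32-bit xPSR word, such as the one stacked on exception entry. The macro and function names are hypothetical; the bit positions (N, Z, C, V in bits 31 to 28, exception number in bits 5 to 0) are those of the ARMv6-M xPSR layout.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flag bit masks within the xPSR word. */
#define XPSR_N (1u << 31)
#define XPSR_Z (1u << 30)
#define XPSR_C (1u << 29)
#define XPSR_V (1u << 28)

/* Returns true if the flag selected by mask is set. */
static bool xpsr_flag(uint32_t xpsr, uint32_t mask)
{
    return (xpsr & mask) != 0;
}

/* Exception number field: 0 in Thread mode, nonzero inside a handler. */
static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x3Fu;
}
```

For the value 0x6100000F, the Z and C flags are set and the exception number is 15, which is the SysTick exception.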
Memory Access
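The Cortex-M0 has no dedicated I/O instructions; peripherals are memory mapped and accessed with ordinary loads and stores:
- LDR/STR (and the byte and halfword forms) – Read or write peripheral registers at their mapped addresses
- LDREX/STREX – Exclusive access for synchronizing shared resources; not available on the Cortex-M0
- LDM/STM – Transfer several words at a time; they have no unprivileged variants
- LDRT/STRT – Unprivileged loads and stores; not available on the Cortex-M0
Code accesses peripheral registers and shared data structures with plain loads and stores, and protects multi-step updates by masking interrupts. The bus interface inserts wait states automatically when a memory or peripheral is not ready.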
Interrupts
The NVIC unit controls interrupt handling, supported by these instructions:
- CPSIE/CPSID – Enable or disable interrupts globally by clearing or setting PRIMASK (CPSIE i, CPSID i)
- SEV – Send event signal to other processors
- WFE/WFI – Wait for event or interrupt in a low power state
This allows multi-core synchronization and power saving. On an exception, the hardware automatically pushes R0-R3, R12, LR, PC, and xPSR onto the stack, fetches the handler address from the vector table, and branches to the ISR; on return it pops the same registers and resumes the interrupted code.
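The context that the hardware saves on exception entry is eight words: R0-R3, R12, LR, PC, and xPSR, in ascending address order from the stack pointer. A C view of that frame might look like the sketch below; the struct and function names are hypothetical and chosen for this example only.

```c
#include <stdint.h>
#include <stddef.h>

/* Layout of the frame pushed by hardware on exception entry,
 * lowest address first (the stack pointer points at r0). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* address to resume at after the handler returns */
    uint32_t xpsr;  /* status register of the interrupted code */
} exception_frame_t;

/* Returns the address at which the interrupted code will resume. */
static uint32_t stacked_return_address(const exception_frame_t *frame)
{
    return frame->pc;
}
```

A fault handler can use such a view to report where the fault occurred, since the stacked PC sits at byte offset 24 of the 32-byte frame.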
Exclusive Access
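LDREX/STREX instructions allow atomic read-modify-write sequences. They belong to ARMv7-M cores such as the Cortex-M3 and are not implemented on the Cortex-M0:
- LDREX – Load exclusive; marks the memory location in the exclusive monitor
- STREX – Store exclusive; succeeds only if the monitor still holds the mark
- CLREX – Clear exclusive monitor
If the exclusive monitor loses its mark between LDREX and STREX, for example because an exception or another bus master intervened, the store fails and the sequence is retried. This enables synchronization without disabling interrupts. On the Cortex-M0, atomic sequences are instead protected by masking interrupts with CPSID i and CPSIE i.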
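On a core without exclusive-access instructions, such as the Cortex-M0, a common substitute is a short critical section that masks interrupts around the read-modify-write. The sketch below models that pattern on a host machine: `primask_model` is a plain variable standing in for the PRIMASK register, and all function names are hypothetical. On real hardware the save, disable, and restore steps would typically map to MRS/CPSID/MSR, for instance through the CMSIS intrinsics `__get_PRIMASK()`, `__disable_irq()`, and `__set_PRIMASK()`.

```c
#include <stdint.h>

/* Stand-in for PRIMASK: 1 means interrupts are masked. */
static uint32_t primask_model;

/* Models "MRS r0, PRIMASK" followed by "CPSID i". */
static uint32_t irq_save_and_disable(void)
{
    uint32_t old = primask_model;
    primask_model = 1u;
    return old;
}

/* Models "MSR PRIMASK, r0": restores the previous mask state. */
static void irq_restore(uint32_t old)
{
    primask_model = old;
}

/* Read-modify-write that an interrupt cannot split on a single core. */
static uint32_t atomic_add_model(volatile uint32_t *addr, uint32_t delta)
{
    uint32_t saved = irq_save_and_disable();
    uint32_t value = *addr + delta;
    *addr = value;
    irq_restore(saved);
    return value;
}
```

Saving and restoring the previous mask, rather than unconditionally re-enabling interrupts, keeps the helper safe to call from code that already runs with interrupts masked.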
Overflow/Saturation
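Saturating instructions clamp a result to a fixed range instead of letting it wrap. They are not part of the Cortex-M0 instruction set; SSAT and USAT appear in ARMv7-M (Cortex-M3), and the Q-prefixed instructions in the DSP extension found on cores such as the Cortex-M4 and Cortex-M7:
- SSAT – Signed saturate to N-bit value
- USAT – Unsigned saturate to N-bit value
- QADD/QDADD – Saturating add
- QSUB/QDSUB – Saturating subtract
Saturation makes better use of dynamic range in DSP algorithms and avoids unexpected wrap-around on overflow. On the Cortex-M0, the flag-setting ADDS/SUBS instructions set the overflow and carry flags, so software can detect overflow and clamp the result itself.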
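Where SSAT and USAT are not available in hardware, the clamping they perform can be written in a few lines of C. The following is a sketch with hypothetical function names; it assumes `bits` is in the range 1 to 32 for the signed case and 0 to 31 for the unsigned case.

```c
#include <stdint.h>

/* Clamp value to the signed range of a bits-wide integer (bits: 1..32). */
static int32_t ssat_model(int32_t value, unsigned bits)
{
    int32_t max = (int32_t)((1u << (bits - 1u)) - 1u);
    int32_t min = -max - 1;
    if (value > max) return max;
    if (value < min) return min;
    return value;
}

/* Clamp value to the unsigned range 0 .. 2^bits - 1 (bits: 0..31). */
static uint32_t usat_model(int32_t value, unsigned bits)
{
    uint32_t max = (1u << bits) - 1u;
    if (value < 0) return 0u;
    if ((uint32_t)value > max) return max;
    return (uint32_t)value;
}
```

Clamping 200 to 8 signed bits gives 127, and clamping 300 to 8 unsigned bits gives 255.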
Shift and Rotate
These instructions shift or rotate register contents:
- LSL/LSR/ASR – Logical shift left, logical shift right, or arithmetic shift right, by an immediate or by a register
- ROR – Rotate right by a register amount
- RRX – Rotate right extended by carry flag; not available on the Cortex-M0
Shifts allow scaling of data values, extracting bitfields, and inexpensive multiplies and divides by powers of two. Rotates are useful in CRC calculation, channel interleaving, etc.
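The two ideas above, rotation and multiplication by shifting, can be sketched in C as follows. The function names are hypothetical; `ror_model` reduces the rotate amount modulo 32, and `times10` shows a constant multiply built from two shifts and an add.

```c
#include <stdint.h>

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
static uint32_t ror_model(uint32_t value, uint32_t amount)
{
    amount &= 31u;                 /* rotation is periodic in 32 */
    if (amount == 0u) {
        return value;              /* avoid an undefined shift by 32 */
    }
    return (value >> amount) | (value << (32u - amount));
}

/* Multiply by 10 using shifts: 10x = 8x + 2x. */
static uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Rotating 0x12345678 right by 8 bits moves the low byte to the top, giving 0x78123456.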
Reverse Bytes
Byte order reversal instructions:
- REV – Reverse bytes in 32-bit register
- REV16 – Reverse bytes in each 16-bit halfword
- REVSH – Reverse bytes in bottom 16-bit halfword and sign-extend the result to 32 bits
Useful for converting between big- and little-endian data formats used in codecs, network protocols, file formats, etc. Each instruction replaces a sequence of shifts, masks, and ORs in software byte reversal.
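For comparison, the software equivalents that these instructions replace can be sketched in C. The function names below are hypothetical models of REV, REV16, and REVSH, written for clarity rather than speed.

```c
#include <stdint.h>

/* REV: reverse the four bytes of a 32-bit word. */
static uint32_t rev_model(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* REV16: swap the two bytes within each 16-bit halfword. */
static uint32_t rev16_model(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

/* REVSH: swap the bytes of the low halfword, then sign-extend to 32 bits. */
static uint32_t revsh_model(uint32_t x)
{
    uint32_t h = ((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu);
    return (h ^ 0x8000u) - 0x8000u;
}
```

Applied to 0x12345678, REV yields 0x78563412 and REV16 yields 0x34127856.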
Count Leading Zeros
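Counts leading zero bits in a register. CLZ is an ARMv7-M instruction (for example on the Cortex-M3) and is not available on the Cortex-M0, which must compute the count in software:
- CLZ – Count leading zeros in 32-bit value
Quickly finds the position of the first 1 bit in a value, useful in normalization, serialization, etc. Where available, it saves software from checking each bit individually.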
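Where CLZ is not available, as on ARMv6-M cores like the Cortex-M0, the count can be computed with a binary search over the word instead of a bit-by-bit loop. The following is a sketch of that approach; the function name `clz_model` is made up for the example, and it returns 32 for an input of zero, matching the result the CLZ instruction defines for that case.

```c
#include <stdint.h>

/* Count leading zero bits by halving the search window each step. */
static unsigned clz_model(uint32_t x)
{
    unsigned n = 0;
    if (x == 0u) {
        return 32u;
    }
    if ((x & 0xFFFF0000u) == 0u) { n += 16u; x <<= 16; }
    if ((x & 0xFF000000u) == 0u) { n += 8u;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0u) { n += 4u;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0u) { n += 2u;  x <<= 2;  }
    if ((x & 0x80000000u) == 0u) { n += 1u; }
    return n;
}
```

This takes five comparisons regardless of the input, compared with up to 32 iterations for a loop that tests one bit at a time.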
Saturating Addition and Subtraction
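Saturated math operations prevent wrap-around on overflow. They belong to the DSP extension found on cores such as the Cortex-M4 and are not available on the Cortex-M0:
- QADD – Saturating ADD
- QDADD – Saturating double and ADD (doubles one operand with saturation, then adds with saturation)
- QSUB – Saturating SUB
- QDSUB – Saturating double and SUB
Results clamp at the maximum positive or minimum negative value instead of rolling over. Useful for digital signal processing to avoid large wrap-arounds distorting outputs.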
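On cores without these instructions, a saturating add or subtract can be emulated in software. The sketch below uses a 64-bit intermediate for clarity; the function names are hypothetical, and a hand-tuned Cortex-M0 routine would more likely test the overflow flag after ADDS or SUBS than widen to 64 bits.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate result to the signed 32-bit range. */
static int32_t clamp_to_int32(int64_t wide)
{
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}

/* Saturating signed add, in the spirit of QADD. */
static int32_t qadd_model(int32_t a, int32_t b)
{
    return clamp_to_int32((int64_t)a + (int64_t)b);
}

/* Saturating signed subtract, in the spirit of QSUB. */
static int32_t qsub_model(int32_t a, int32_t b)
{
    return clamp_to_int32((int64_t)a - (int64_t)b);
}
```

Adding 1 to the largest positive value returns that same value rather than wrapping to the most negative one.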
Summary
The Cortex-M0 assembly language provides a straightforward set of instructions for data movement, arithmetic/logic, branching, and status register access. Optimization for embedded control applications results in good performance despite small silicon area. Hardware interrupt handling and byte-reversal instructions support real-time processing and hardware control tasks, while exclusive access, saturation, and count-leading-zeros instructions are left to larger Cortex-M cores and must be emulated in software on the Cortex-M0.