Atomic operations on the ARM architecture provide the synchronization needed to ensure thread safety and avoid race conditions when accessing shared resources. They are built on the exclusive access instructions that ARM processors provide. These instructions let a thread perform a read-modify-write operation on a memory location that takes effect only if no other observer modified the location in the meantime, so the operation appears atomic.
Load-Linked and Store-Conditional Instructions
ARM implements atomics with a Load-Linked (LL) and Store-Conditional (SC) pair, which ARM documentation calls load-exclusive and store-exclusive. LL loads a value from memory and marks the location as “linked”. SC stores a value only if no other observer has written to the linked location since the LL. Together they provide atomic read-modify-write semantics.
For example, an atomic increment in LL/SC pseudocode may look like this:

    retry:
        LL   R1, [R2]        // load linked: R1 = value at address R2
        ADD  R1, R1, #1      // modify the value
        SC   R0, R1, [R2]    // store conditional: R0 = 0 on success
        if R0 != 0 goto retry

If no other thread wrote to [R2] between the LL and the SC, the store succeeds and the whole sequence appears atomic. The SC result indicates whether the store succeeded. If it failed, the code simply retries the LL-modify-SC sequence until it succeeds.
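The retry pattern above can be sketched in portable C11, where a weak compare-and-exchange plays a role similar to the SC step (it may fail and must be retried). This is a minimal sketch, not the way any particular library does it; the helper name `saturating_increment` and its `limit` parameter are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: add 1 to *p unless it has already reached limit.
 * Returns true if the increment was applied. */
bool saturating_increment(atomic_int *p, int limit)
{
    assert(p != NULL);
    int seen = atomic_load(p);              /* analogous to the LL step */
    while (seen < limit) {
        /* Analogous to the SC step: stores seen + 1 only if *p still
         * equals seen. On failure, seen is refreshed with the current
         * value and the loop retries. The weak form is also allowed to
         * fail spuriously, much like a real store-conditional. */
        if (atomic_compare_exchange_weak(p, &seen, seen + 1))
            return true;
    }
    return false;                           /* limit already reached */
}
```

A plain fetch-and-add could not express the "unless at the limit" condition, which is why the load, check, and conditional store are written as an explicit retry loop.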
ARM Atomic Instructions
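The ARMv6 architecture introduced LDREX and STREX, which are ARM's LL and SC instructions; the byte and halfword variants followed in ARMv6K. They do not remove the need for a retry loop: code still wraps them in an LL/SC loop. AArch64 provides the same mechanism as LDXR/STXR, and ARMv8.1-A adds single-instruction atomic read-modify-write operations (such as LDADD, SWP, and CAS) that need no loop.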
Key instructions include:
- LDREX/STREX – Load/Store exclusive register
- LDREXB/STREXB – Load/Store exclusive byte
- LDREXH/STREXH – Load/Store exclusive halfword
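For example, LDREX loads a word from memory into a register and tags the address for exclusive access; STREX stores the value only if no intervening store took place. This enables simple atomic RMW sequences like:

    retry:
        LDREX   R1, [R2]        @ R1 = current value at address R2
        ADD     R1, R1, #1      @ modify
        STREX   R0, R1, [R2]    @ try to store; R0 = 0 on success, 1 on failure
        CMP     R0, #0
        BNE     retry           @ retry on failure

STREX writes its status to R0: 0 if the store succeeded, 1 if it failed. On failure, retry the whole sequence.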
Implementing Common Atomic Operations
Using exclusive load/store sequences, various common atomic primitives can be implemented:
Atomic Exchange
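    LOOP:
        LDREX   R1, [R2]        @ R1 = current value, tagged for exclusive access
        MOV     R0, R1          @ keep the old value as the result
        STREX   R1, R3, [R2]    @ try to store the new value; R1 = status
        CMP     R1, #0
        BNE     LOOP            @ store failed, retry

R3 holds the value we want to exchange with the current value at [R2]. R1 is a scratch register. The sequence loads the current value from [R2] into R1 exclusively, copies it into R0, stores the new value R3 to [R2], and repeats if the store fails. CMP/BNE is used instead of CBNZ because in AArch32 CBNZ exists only in Thumb and can only branch forward.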
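For comparison, the same exchange can be written in C11 and left to the compiler to lower to an exclusive load/store loop or an equivalent instruction. A minimal sketch follows; the helper name `swap_in` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: store new_value into *slot and return the value
 * that was there before, as one atomic step. */
int swap_in(atomic_int *slot, int new_value)
{
    assert(slot != NULL);
    return atomic_exchange(slot, new_value);
}
```

The return value corresponds to R0 in the assembly version: the old contents of the shared location.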
Atomic Compare And Swap
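    LOOP:
        LDREX   R0, [R2]        @ R0 = current value
        CMP     R0, R3          @ does it match the expected value?
        BNE     DONE            @ no: give up without storing
        STREX   R1, R4, [R2]    @ yes: try to store the new value; R1 = status
        CMP     R1, #0
        BNE     LOOP            @ store failed, retry
    DONE:
        CLREX                   @ clear the exclusive tag (ARMv6K and later)

R3 is the expected value and R4 is the new value. The sequence atomically compares the current value at [R2] with R3 and stores R4 only if they match. On exit R0 holds the value that was observed, so the caller can compare R0 with R3 to tell whether the swap happened.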
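In C11 the same compare-and-swap is available as `atomic_compare_exchange_strong`. The sketch below uses it to claim ownership of a slot; the helper name `claim_slot` and the convention that 0 means "unowned" are assumptions made for this example only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: claim *owner for my_id if nobody owns it yet.
 * In this sketch the value 0 means "unowned". */
bool claim_slot(atomic_int *owner, int my_id)
{
    assert(owner != NULL);
    int expected = 0;
    /* Stores my_id only if *owner still equals expected (0).
     * On failure, expected is overwritten with the current owner. */
    return atomic_compare_exchange_strong(owner, &expected, my_id);
}
```

Only the first caller succeeds; later callers see a nonzero owner and leave the slot unchanged.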
Atomic Add/Increment
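    LOOP:
        LDREX   R1, [R2]        @ R1 = current value
        ADD     R1, R1, #1      @ increment
        STREX   R3, R1, [R2]    @ try to store; R3 = status
        CMP     R3, #0
        BNE     LOOP            @ store failed, retry

Increments the word at [R2] atomically using an LDREX/STREX loop. R1 holds the new value and R3 receives the store status; the status register must differ from the address register.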
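A C11 counterpart of the increment is `atomic_fetch_add`, which returns the value held before the addition. As a small sketch, it can hand out unique ticket numbers; the helper name `next_ticket` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: return the current ticket number and advance the
 * counter by one in a single atomic step. */
unsigned next_ticket(atomic_uint *counter)
{
    assert(counter != NULL);
    return atomic_fetch_add(counter, 1u);   /* returns the old value */
}
```

Because the read and the add happen as one operation, two concurrent callers can never receive the same ticket.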
Atomic Flags
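    @ R1 = address of the flag byte, which is initially 0
    LOOP:
        LDREXB  R2, [R1]        @ read the flag
        CMP     R2, #0
        BNE     LOOP            @ spin while it is set
        MOV     R2, #1
        STREXB  R3, R2, [R1]    @ try to set it; R3 = status
        CMP     R3, #0
        BNE     LOOP            @ retry if the store failed
        DMB     ISH             @ order later accesses after the flag is set

Spins until the byte at [R1] is 0, then atomically sets it to 1. The trailing DMB (available from ARMv7) matters when the flag guards other data, because it keeps later memory accesses from being observed before the flag is set.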
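C11 offers `atomic_flag` for this test-and-set pattern. The sketch below shows a non-spinning variant that reports whether the caller was the one to set the flag; the helper names `try_set_flag` and `clear_flag` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: set the flag and report whether this call was the
 * one that changed it from clear to set. */
bool try_set_flag(atomic_flag *flag)
{
    assert(flag != NULL);
    /* test_and_set returns the previous state: false means it was clear. */
    return !atomic_flag_test_and_set_explicit(flag, memory_order_acquire);
}

/* Hypothetical helper: clear the flag so that it can be set again. */
void clear_flag(atomic_flag *flag)
{
    assert(flag != NULL);
    atomic_flag_clear_explicit(flag, memory_order_release);
}
```

Wrapping `try_set_flag` in a loop gives the spinning behavior of the assembly version.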
Memory Barriers
In addition to atomic instructions, memory barriers are needed to control the order in which memory accesses become visible to other cores, because ARM has a weakly ordered memory model. Key barriers include:
- DMB – Data memory barrier
- DSB – Data synchronization barrier
- ISB – Instruction synchronization barrier
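DMB ensures that memory accesses before the barrier are observed before memory accesses after it. DSB is stronger: it stalls execution until all memory accesses before it have completed. ISB flushes the pipeline so that the instructions after it are fetched again.
For example, the strongest combination may be coded as:

    DSB SY      @ wait until all earlier memory accesses have completed
    ISB         @ flush the pipeline

This waits for memory to synchronize and then flushes the pipeline, which is typically needed only after changes that affect instruction fetch or system state. A DMB in front of the DSB would be redundant, since DSB already provides its ordering guarantees. For ordinary data shared between cores, a DMB ISH on its own is usually sufficient and cheaper.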
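In C11 the data-ordering role of DMB is expressed with `atomic_thread_fence`, which ARM compilers typically lower to a DMB variant. The sketch below shows the usual publish/consume pairing under the assumption of one writer and one reader; the names `payload`, `ready`, `publish`, and `try_consume` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* ordinary, non-atomic data (hypothetical) */
static atomic_bool ready;    /* static storage, so it starts out false */

/* Writer: make the payload visible before the ready flag. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* earlier writes first */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out once the payload has been published. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);   /* later reads after flag */
    *out = payload;
    return true;
}
```

Without the two fences, a weakly ordered core could let the reader see `ready` set while still observing a stale `payload`.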
Locks Using Exclusives
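Exclusive instructions can also implement locks and mutexes. A simple spinlock can be:

    LOCKED: .byte 0             @ lock byte: 0 = free, 1 = held

    lock:                       @ R1 = address of LOCKED
        LDREXB  R2, [R1]        @ read the lock byte
        CMP     R2, #0
        BNE     lock            @ spin while the lock is held
        MOV     R2, #1
        STREXB  R3, R2, [R1]    @ try to take it; R3 = status
        CMP     R3, #0
        BNE     lock            @ retry if the store failed
        DMB     ISH             @ critical section starts after this point
        BX      LR

    unlock:                     @ R1 = address of LOCKED
        DMB     ISH             @ critical section ends before this point
        MOV     R2, #0
        STRB    R2, [R1]        @ plain store releases the lock
        BX      LR

lock spins until the lock byte is 0, then atomically sets it to 1 with an LDREXB/STREXB loop that retries if the store fails. unlock sets it back to 0 with a plain STRB: a STREXB without a preceding LDREXB would never succeed, and only the lock holder writes the byte at this point. The DMB instructions keep accesses to the protected data inside the critical section.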
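The same lock can be sketched in C11 with acquire ordering on lock and release ordering on unlock, which take the place of the explicit barriers. This is an illustrative sketch rather than a production lock; the type `demo_spinlock` and the three function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spinlock type: false = free, true = held. */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

/* Try once to take the lock; returns true on success. */
bool demo_trylock(demo_spinlock *l)
{
    assert(l != NULL);
    /* exchange returns the previous state: false means the lock was free. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Spin until the lock is taken. */
void demo_lock(demo_spinlock *l)
{
    assert(l != NULL);
    while (!demo_trylock(l)) {
        /* Wait with plain loads until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
        }
    }
}

/* Release the lock with an ordinary atomic store. */
void demo_unlock(demo_spinlock *l)
{
    assert(l != NULL);
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

As in the assembly version, only acquiring the lock needs a read-modify-write operation; releasing it is a single store.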
Compiler Atomic Builtins
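Instead of hand-coded assembly, the C11 atomics in <stdatomic.h> (built on the compiler's atomic builtins) provide a simpler way to use the architectural atomic support in C code. The object must be declared with an atomic type:

    #include <stdatomic.h>

    atomic_int x = 0;                    // atomic type, not a plain int
    atomic_store(&x, 10);                // atomic store
    int y = atomic_load(&x);             // atomic load
    int z = atomic_fetch_add(&x, 2);     // atomic RMW add, returns the old value

The compiler generates the instruction sequences and retry loops. On ARM, read-modify-write operations such as atomic_fetch_add compile to LDREX/STREX loops or equivalent instructions, while atomic loads and stores of naturally aligned words compile to ordinary loads and stores plus the required barriers.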
Summary
ARM architectures provide exclusive access instructions such as LDREX/STREX to enable atomic read-modify-write operations. These are used to implement common atomic primitives such as exchange, compare-and-swap, and increment. Memory barriers order the surrounding memory accesses as seen by other cores. Atomics are essential building blocks for lock-free concurrent algorithms and data structures on ARM platforms.