What is an atomic memory operation?

Eileen David
Last updated: September 12, 2023 2:04 pm

An atomic memory operation is a memory access or update that happens indivisibly: other threads and cores observe the memory either before the operation or after it, never in a half-finished state. This prevents race conditions and ensures data consistency, especially in multithreaded and multicore environments.

Contents

  • Why Atomicity Matters
  • Implementing Atomicity
  • Locks and Mutexes
  • Atomic Data Types and Operations
  • Transactional Memory
  • Uses of Atomic Operations
  • Lock-free Data Structures
  • Reference Counting
  • State Synchronization
  • Consensus Algorithms
  • Database Transactions
  • Parallel Computing
  • Atomicity on ARM
  • Summary

Atomic operations are critical for parallel programming and high-performance computing. They allow multiple threads or processes to safely read and write shared data without interference between each other. The atomicity property helps avoid synchronization issues like dirty reads or lost updates.

Why Atomicity Matters

Updating a value in memory usually takes several steps at the hardware level. For example, incrementing a variable may involve:

  1. Reading the current value stored at a memory address
  2. Computing the new value
  3. Writing the new value back to the same address

Now imagine two concurrent threads trying to increment a shared counter variable. Thread A reads the current counter value as 10, increments it to 11, but has not finished writing it back yet. In the meantime, Thread B also reads the same counter as 10, increments it to 11, and writes it back. Thread A now finishes writing 11 back to the counter. The correct final value should have been 12, but instead it incorrectly ends up as 11 due to the intermediate interference.
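The interleaving described above can be replayed deterministically on a single thread. The sketch below is only an illustration: the thread_ctx type and the step_* helpers are hypothetical names that split a non-atomic increment into its separate steps, so the order from the text can be reproduced exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Private state of one simulated thread (hypothetical helper type). */
typedef struct {
    int local; /* the thread's private copy of the counter */
} thread_ctx;

/* The three steps of a non-atomic increment, made explicit. */
static void step_read(thread_ctx *t, const int *counter) {
    assert(t != NULL && counter != NULL);
    t->local = *counter;
}

static void step_increment(thread_ctx *t) {
    t->local += 1;
}

static void step_write(const thread_ctx *t, int *counter) {
    *counter = t->local;
}

/* Replays the interleaving from the text and returns the final value. */
int replay_lost_update(int start) {
    int counter = start;
    thread_ctx a, b;

    step_read(&a, &counter);   /* A reads the counter */
    step_increment(&a);        /* A increments its copy, not yet written */
    step_read(&b, &counter);   /* B reads the same, now stale, value */
    step_increment(&b);        /* B increments its copy */
    step_write(&b, &counter);  /* B writes its result */
    step_write(&a, &counter);  /* A overwrites it with the same result */
    return counter;
}
```

Calling replay_lost_update(10) returns 11 rather than 12: one of the two increments is lost.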

Atomic operations prevent such errors by ensuring indivisibility of a memory access. An atomic read-increment-write on the counter would proceed uninterrupted, avoiding the lost update problem. The counter transitions safely from 10 to 11 to 12 when atomicity is guaranteed.
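As a minimal sketch of the atomic version, assuming a C11 compiler with <stdatomic.h>, the whole read-increment-write can be expressed as one atomic_fetch_add call. The function name counter_increment is made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically adds 1 to the counter and returns the value it held
   immediately before the addition. */
int counter_increment(atomic_int *counter) {
    assert(counter != NULL);
    return atomic_fetch_add(counter, 1);
}
```

If two threads each call counter_increment() on a counter holding 10, one call returns 10 and the other returns 11, and the counter ends at 12 regardless of how the calls interleave.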

Implementing Atomicity

At the hardware level, atomicity is implemented by the processor architecture and instruction set. Modern CPU designs provide special atomic instructions like compare-and-swap (CAS) or load-link/store-conditional (LL/SC) to enable atomic reads, writes or read-modify-writes. These special instructions are guaranteed by the CPU to be indivisible.
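Compare-and-swap is typically used in a retry loop. The following sketch, assuming C11 atomics, builds an "atomic maximum" out of atomic_compare_exchange_weak; the function name atomic_store_max is hypothetical and not part of any standard library.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Raises *target to at least `candidate`.
   Returns true if this call changed the stored value. */
bool atomic_store_max(atomic_int *target, int candidate) {
    assert(target != NULL);
    int observed = atomic_load(target);
    while (observed < candidate) {
        /* The swap succeeds only if *target still equals `observed`.
           On failure, `observed` is refreshed with the current value
           and the loop condition is checked again. */
        if (atomic_compare_exchange_weak(target, &observed, candidate)) {
            return true;
        }
    }
    return false;
}
```

The weak variant may fail spuriously on LL/SC architectures, which is why it sits inside a loop.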

At the software level, atomicity is implemented using libraries, language constructs and APIs exposing the underlying hardware atomic instructions. Different programming languages and libraries have their own conventions like atomic data types, mutexes, semaphores etc. to allow atomic operations in multi-threaded code.

Locks and Mutexes

The simplest way to make an operation atomic in software is to associate a mutex (mutual exclusion lock) with it. The mutex ensures only one thread can execute the critical section of code at a time. Other threads trying to enter the same section are blocked until the first thread finishes executing. This provides atomicity by serializing access.

However, mutexes have limitations. They can become bottlenecks when many threads contend for the same shared data, and deadlocks can arise when code acquires multiple mutexes. Locks therefore work best when critical sections are short and contention is low.
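A minimal sketch of a mutex-protected critical section, assuming POSIX threads are available. The account type and its functions are hypothetical; the point is that the balance check and the update happen under the same lock, so no other thread can slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    long balance;
} account;

void account_init(account *a, long initial) {
    assert(a != NULL);
    pthread_mutex_init(&a->lock, NULL);
    a->balance = initial;
}

/* Withdraws `amount` if funds allow. Returns 1 on success, 0 otherwise.
   The check and the subtraction form one critical section. */
int account_withdraw(account *a, long amount) {
    int ok = 0;
    pthread_mutex_lock(&a->lock);
    if (a->balance >= amount) {
        a->balance -= amount;
        ok = 1;
    }
    pthread_mutex_unlock(&a->lock);
    return ok;
}

/* Reads the balance under the same lock. */
long account_balance(account *a) {
    pthread_mutex_lock(&a->lock);
    long value = a->balance;
    pthread_mutex_unlock(&a->lock);
    return value;
}
```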

Atomic Data Types and Operations

At a higher level, languages like Java, C11 and C++11 provide atomic data types and wrapper classes built on hardware atomic instructions (C and C++ implementations may fall back to locks for types the hardware cannot handle directly). These provide atomic variants of simple operations like:

  • Atomic integers – getAndIncrement(), getAndAdd(), compareAndSet() etc. on Java's AtomicInteger
  • Atomic references – getAndSet(), compareAndExchange() etc. on Java's AtomicReference
  • Atomic flags and booleans – test_and_set(), clear() on C++'s std::atomic_flag

The advantage is programmers don’t have to deal with locks explicitly. Regular code can call atomic methods on special data types for atomicity. The implementations ensure thread-safety using efficient hardware instructions.
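As one illustration, C11 exposes the test-and-set flag as atomic_flag. The sketch below wraps it in two hypothetical helpers, try_claim and release_claim, that let exactly one caller at a time hold a claim without any explicit lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Tries to claim the flag. Returns true if this caller set it,
   false if it was already set by someone else. */
bool try_claim(atomic_flag *flag) {
    assert(flag != NULL);
    /* test_and_set returns the previous state of the flag. */
    return !atomic_flag_test_and_set(flag);
}

/* Clears the flag so that another caller can claim it. */
void release_claim(atomic_flag *flag) {
    assert(flag != NULL);
    atomic_flag_clear(flag);
}
```

A flag would be declared as atomic_flag busy = ATOMIC_FLAG_INIT; and passed by address to both helpers.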

Transactional Memory

Transactional memory (TM) is a modern technique to make sections of code atomic while avoiding locks. Transactions resemble database transactions – a series of reads and writes that either fully complete or fail atomically. No intermediate state is visible.

TM is implemented via versioning of shared memory locations and optimistic concurrency control. Multiple transactions can run concurrently, performing tentative, speculative reads and writes. When a transaction ends, its accesses are validated and committed atomically if validation succeeds. Otherwise the transaction is aborted and retried.

TM provides atomicity without explicit locks for larger code blocks than atomic data types can cover. It can be implemented purely in software, but efficient implementations rely on hardware support that is not ubiquitous. Some processors, such as IBM POWER8 and Intel processors with TSX, have provided TM instructions to accelerate concurrent algorithms.
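The version-then-validate idea can be sketched in software for a single memory cell. This is a toy, not a real transactional memory system: it assumes one 64-bit atomic word packing a 32-bit version and a 32-bit value, and all names (versioned_cell, run_transaction, add_ten) are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Version in the high 32 bits, value in the low 32 bits. */
typedef _Atomic uint64_t versioned_cell;

static uint64_t pack(uint32_t version, uint32_t value) {
    return ((uint64_t)version << 32) | value;
}
static uint32_t cell_value(uint64_t word)   { return (uint32_t)word; }
static uint32_t cell_version(uint64_t word) { return (uint32_t)(word >> 32); }

/* A transaction body computes a new value from the current one. */
typedef uint32_t (*txn_body)(uint32_t current);

/* Example body used for illustration. */
uint32_t add_ten(uint32_t current) { return current + 10; }

/* Runs `body` optimistically and returns how many times it aborted
   before it managed to commit. */
unsigned run_transaction(versioned_cell *cell, txn_body body) {
    assert(cell != NULL && body != NULL);
    unsigned aborts = 0;
    for (;;) {
        uint64_t snapshot = atomic_load(cell);            /* begin */
        uint32_t tentative = body(cell_value(snapshot));  /* speculative work */
        uint64_t next = pack(cell_version(snapshot) + 1, tentative);
        /* Validate and commit in one step: succeeds only if no other
           transaction committed since the snapshot was taken. */
        if (atomic_compare_exchange_strong(cell, &snapshot, next)) {
            return aborts;
        }
        aborts++; /* conflict detected: abort and retry */
    }
}
```

Real TM systems track many locations per transaction; the single-word case only shows the shape of the optimistic read, validate and commit cycle.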

Uses of Atomic Operations

Some common use cases and applications of atomic memory operations are:

Lock-free Data Structures

Concurrent data structures like queues, maps, trees etc. need atomic primitives to safely coordinate access between threads. E.g. a lock-free queue needs an atomic compare-and-swap operation to enqueue/dequeue nodes safely.
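A full lock-free queue is too long to show here, so the sketch below uses a simpler structure instead: a lock-free stack with a CAS-based push and a "take everything" operation built on atomic exchange. Taking the whole list at once sidesteps the ABA problem that a single-node pop would have to handle. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

typedef struct {
    _Atomic(node *) head;
} lockfree_stack;

void stack_init(lockfree_stack *s) {
    assert(s != NULL);
    atomic_init(&s->head, (node *)NULL);
}

/* Pushes a caller-owned node onto the stack. */
void stack_push(lockfree_stack *s, node *n) {
    assert(s != NULL && n != NULL);
    n->next = atomic_load(&s->head);
    /* Retry until the head is still n->next at the moment of the swap.
       On failure, the CAS writes the current head into n->next. */
    while (!atomic_compare_exchange_weak(&s->head, &n->next, n)) {
        /* empty: the CAS already refreshed n->next */
    }
}

/* Detaches the entire list in one atomic exchange and returns it,
   most recently pushed node first. Returns NULL if the stack is empty. */
node *stack_take_all(lockfree_stack *s) {
    assert(s != NULL);
    return atomic_exchange(&s->head, (node *)NULL);
}
```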

Reference Counting

Reference counting for memory management requires atomic read-increment-write on the reference counter to handle concurrent adjustments correctly.
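A minimal sketch of an atomic reference count using C11 atomics. The shared_obj type and the obj_* functions are hypothetical; a real object would carry a payload and be freed by the caller that drops the last reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_uint refs;
    /* payload fields would follow here */
} shared_obj;

/* A new object starts with one reference, held by its creator. */
void obj_init(shared_obj *o) {
    assert(o != NULL);
    atomic_init(&o->refs, 1u);
}

/* Takes an additional reference. */
void obj_retain(shared_obj *o) {
    assert(o != NULL);
    atomic_fetch_add(&o->refs, 1u);
}

/* Drops one reference. Returns true if this call released the last
   one, in which case the caller is responsible for cleanup. */
bool obj_release(shared_obj *o) {
    assert(o != NULL);
    /* fetch_sub returns the count before the decrement. */
    return atomic_fetch_sub(&o->refs, 1u) == 1u;
}
```

Because the decrement and the "was it the last one?" check are a single atomic step, exactly one thread observes the transition to zero.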

State Synchronization

Threads may need to atomically read-and-update shared state variables like counters, sequence numbers, flags etc. to synchronize their actions.
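Two small sketches of this pattern, assuming C11 atomics: a state transition that only one thread can win, and a sequence-number generator. The state constants and function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { STATE_IDLE = 0, STATE_RUNNING = 1, STATE_DONE = 2 };

/* Moves *state from `from` to `to` only if it currently equals `from`.
   Returns true for the single caller that performs the transition. */
bool try_transition(atomic_int *state, int from, int to) {
    assert(state != NULL);
    int expected = from;
    return atomic_compare_exchange_strong(state, &expected, to);
}

/* Hands out consecutive sequence numbers; each caller gets a distinct one. */
unsigned next_sequence(atomic_uint *seq) {
    assert(seq != NULL);
    return atomic_fetch_add(seq, 1u);
}
```

If several threads race to call try_transition(&state, STATE_IDLE, STATE_RUNNING), only one of them sees true and proceeds to do the work.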

Consensus Algorithms

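Consensus among threads in shared memory relies on atomic primitives such as compare-and-swap. Implementations of distributed protocols like Paxos and Raft also depend on each node updating its local state atomically, for example the current term and vote, when electing leaders and agreeing on values across nodes.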

Database Transactions

Databases use locking, MVCC and other techniques to make transactions atomic, consistent, isolated and durable (ACID).

Parallel Computing

Numerical algorithms running on multicore CPUs require atomic operations to parallelize safely while accumulating results, synchronizing steps etc.
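A common pattern is for each worker to sum its own slice privately and then publish the partial result with a single atomic add. The sketch below shows only the per-worker function, under the assumption that the caller splits the data and starts the threads; accumulate_chunk is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Sums data[0..n) into a private variable, then adds the partial sum
   to the shared total with one atomic operation. */
void accumulate_chunk(const int *data, size_t n, atomic_long *total) {
    assert(data != NULL && total != NULL);
    long partial = 0;
    for (size_t i = 0; i < n; i++) {
        partial += data[i];
    }
    atomic_fetch_add(total, partial);
}
```

Touching the shared total once per chunk, rather than once per element, keeps contention on the atomic variable low.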

Atomicity on ARM

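ARM processors provide hardware support for atomic instructions. The 64-bit ARMv8 architecture exposes them through the A64 instruction set, and many 32-bit cores, including the Cortex-M3 and Cortex-M4, offer the equivalent LDREX/STREX exclusive-access pair.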

Key atomic primitives available are:

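  • LDXR/STXR – Load Exclusive and Store Exclusive instructions, used in a retry loop to implement atomic read-modify-write operations like compare-and-swap.
  • LDAR/STLR – Load-Acquire and Store-Release instructions that enforce ordering between memory accesses. ARMv8.3 adds LDAPR, a load-acquire with weaker ordering.
  • SWP, CAS, LDADD – Single-instruction atomics added to A64 by the ARMv8.1 Large System Extensions (LSE). The older SWP instruction from 32-bit ARM is deprecated.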
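To show how the exclusive pair forms a retry loop, here is a sketch of an atomic add written with GCC/Clang inline assembly for AArch64. It is illustrative only: the function name add_exclusive is hypothetical, the loop imposes no memory ordering, and on other targets the code falls back to the compiler's __atomic_add_fetch builtin. In practice the C11 atomics are preferable, since the compiler picks the instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Atomically adds `delta` to *addr and returns the new value. */
int add_exclusive(int *addr, int delta) {
    assert(addr != NULL);
#if defined(__aarch64__)
    int result;
    unsigned int status;
    __asm__ volatile(
        "1: ldxr  %w0, [%2]\n"      /* load and mark address exclusive */
        "   add   %w0, %w0, %w3\n"  /* compute the new value */
        "   stxr  %w1, %w0, [%2]\n" /* store only if still exclusive */
        "   cbnz  %w1, 1b\n"        /* non-zero status: retry */
        : "=&r"(result), "=&r"(status)
        : "r"(addr), "r"(delta)
        : "memory");
    return result;
#else
    /* Assumption: a GCC-compatible compiler on a non-AArch64 target. */
    return __atomic_add_fetch(addr, delta, __ATOMIC_RELAXED);
#endif
}
```

STXR writes 0 to its status register on success, so the loop repeats whenever another observer touched the location between the load and the store.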

The ARMv8 architecture also defines a formal memory model with precise rules regarding the ordering and visibility of atomics across threads. This memory model contract enables portable reasoning about concurrency in ARM multicore systems.

In addition, ARM CPUs support cache coherency mechanisms like snooping to ensure atomic values are properly synchronized across local caches of different cores. This prevents scenarios where cores end up with stale cached values.

At the software level, ARM C/C++ compilers provide builtins and language extensions that generate code using the underlying hardware atomic instructions. GCC also ships a runtime library called libatomic that implements atomic operations the target cannot perform inline.

GCC for ARM supports the C11 _Atomic keyword and, through <stdatomic.h>, typedef names like atomic_int for declaring atomic variables. Operations like atomic_load(), atomic_store() and atomic_exchange() map to the corresponding hardware instructions for implementing atomicity in concurrent code targeting ARM platforms.
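A short sketch of these three calls in use, assuming a C11 compiler. The "mailbox" idea and the function names are hypothetical: one side posts an integer, the other can peek at it or take it and leave 0 behind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Publishes a value into the slot. */
void mailbox_post(atomic_int *slot, int value) {
    assert(slot != NULL);
    atomic_store(slot, value);
}

/* Reads the current value without changing it. */
int mailbox_peek(atomic_int *slot) {
    assert(slot != NULL);
    return atomic_load(slot);
}

/* Takes the current value and leaves 0 in the slot, as one atomic step. */
int mailbox_take(atomic_int *slot) {
    assert(slot != NULL);
    return atomic_exchange(slot, 0);
}
```

Because mailbox_take() uses an exchange, two threads taking at the same time cannot both receive the same posted value.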

Summary

Atomic memory operations are indispensable for correct and efficient parallel programming today. Atomicity provides indivisible access to shared data without intermediate states. Hardware and software techniques implement atomicity using special instructions, data types, and synchronization constructs.

ARM processors include native support for atomic instructions and the required coherency mechanisms. This enables building high-performance concurrent data structures and applications on ARM-based platforms.
