Strongly ordered memory is one of the three memory types (Normal, Device, and Strongly-ordered) defined by the ARMv6 and ARMv7 architectures. Accesses to a region marked Strongly-ordered are performed in program order and are not cached, merged, or performed speculatively, so the type provides stricter guarantees than weakly ordered Normal memory. This prevents hazards caused by reordered reads and writes, at some cost in performance.
Introduction to Memory Ordering
In a multicore system with multiple CPUs accessing the same memory concurrently, the order in which memory operations from different CPUs reach memory can be unpredictable. This can lead to unexpected results and program errors. Memory ordering provides rules to restrict the order in which memory operations can be performed to prevent such errors.
Some key terms related to memory ordering:
- Memory barrier – An instruction that orders memory operations: operations issued before the barrier are observed (or, for a stronger barrier such as DSB, completed) before operations issued after it.
- Atomic operation – An indivisible operation that executes completely without interruption. Used for synchronization.
- Acquire and Release Semantics – Acquire ensures that memory operations after an atomic read are not reordered before it. Release ensures that memory operations before an atomic write are not reordered after it.
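The acquire and release terms above can be illustrated with a small message-passing sketch in C11 atomics. This is a minimal example under stated assumptions, not a definitive implementation: the names payload, ready, publish, and try_consume are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, for illustration only. */
static int payload;               /* plain, non-atomic data          */
static atomic_bool ready = false; /* flag that publishes the payload */

/* Producer: write the data, then publish it with release semantics,
 * so the payload write cannot be reordered after the flag write. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read from being
 * reordered before the flag read. Returns false if nothing has
 * been published yet. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

A consumer thread would typically call try_consume in a loop until it returns true. On ARMv7 a compiler normally implements these orderings with DMB barriers around the flag accesses.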
Memory Ordering Models
Some common memory ordering models are:
- Strong ordering – All memory operations are sequentially consistent and appear to execute in program order. Strictest model.
- Weak ordering – Memory operations can be reordered for better performance. More prone to errors in multi-threaded programs.
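- Release consistency – Ordering is enforced only at synchronization points: accesses before a release cannot move after it, and accesses after an acquire cannot move before it.
- Relaxed ordering – Weakest model. Individual operations remain atomic, but there are minimal guarantees about the order in which accesses to different locations become visible.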
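The two ends of this list can be sketched with C11 atomics, where each operation names the ordering it needs. The counter and function names below are hypothetical; the example only assumes a C11 compiler with stdatomic.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical event counter; static storage starts at zero. */
static atomic_uint hits;

/* Relaxed ordering: the increment is atomic, but it imposes no order
 * on surrounding loads and stores. Suitable for statistics counters. */
unsigned count_hit(void)
{
    unsigned previous =
        atomic_fetch_add_explicit(&hits, 1u, memory_order_relaxed);
    assert(previous != UINT_MAX); /* the counter must not wrap */
    return previous + 1u;
}

/* Sequentially consistent ordering: the strictest option, and the
 * default when no ordering argument is given. */
unsigned read_hits(void)
{
    return atomic_load_explicit(&hits, memory_order_seq_cst);
}
```

The relaxed increment is usually the cheaper choice on a weakly ordered CPU because the compiler does not have to emit barriers around it.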
Need for Strongly Ordered Memory
Weakly ordered memory improves performance by allowing extensive reordering of memory operations. However, it requires careful use of memory barriers and atomic operations in code to prevent errors. This imposes a burden on programmers.
Strongly ordered memory simplifies programming by providing stricter guarantees. The hardware preserves program order for accesses to the affected regions, so programmers have fewer reordering cases to reason about. This improves programming productivity. It also prevents a class of errors related to improperly synchronized memory accesses in concurrent code.
Strongly Ordered Memory in ARMv6 and ARMv7
ARMv6 and ARMv7 both use a weakly ordered model for Normal memory, so code that shares data needs explicit memory barriers and, where caches are involved, cache maintenance operations. Strong ordering is not a global mode: it is a memory type assigned per region through the translation table attributes (the TEX, C, and B bits) or the MPU region attributes. The SCTLR.A bit controls alignment fault checking and does not affect memory ordering.
Accesses to a region marked Strongly-ordered follow these rules:
- All explicit accesses to the region occur in program order.
- Writes to the region reach memory in program order.
- No read is reordered ahead of a previous write to the region.
- A DSB barrier completes all previous memory accesses before execution continues.
This preserves program order within those regions without barriers between the accesses. However, ordering relative to Normal memory accesses still requires barriers or acquire/release synchronization in ARMv7. The LDREX/STREX instructions used for lock-free programming are intended for Normal memory; whether exclusive accesses to Strongly-ordered memory are supported is implementation defined.
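As a rough illustration of how the LDREX/STREX instructions mentioned above are normally reached from C, the sketch below builds a small spinlock from a C11 compare-and-exchange. On ARMv7, compilers typically lower this operation to an LDREX/STREX retry loop, but that lowering is a compiler detail and is assumed here. The demo_spinlock type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} demo_spinlock;

/* Try once to take the lock. Acquire ordering on success keeps the
 * critical section from being reordered before the lock is taken. */
bool demo_try_lock(demo_spinlock *lock)
{
    assert(lock != NULL);
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is obtained. */
void demo_lock(demo_spinlock *lock)
{
    while (!demo_try_lock(lock)) {
        /* busy-wait; a real implementation might yield or use WFE */
    }
}

/* Release ordering keeps the critical section from being reordered
 * after the lock is dropped. */
void demo_unlock(demo_spinlock *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(&lock->state, 0, memory_order_release);
}
```

The lock variable should live in Normal memory, since support for exclusive accesses to other memory types is not guaranteed.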
Performance Impact
Strongly ordered memory rules out many optimizations and reduces performance. Accesses to a Strongly-ordered region bypass the caches and cannot be merged, buffered, or issued speculatively, so each access pays the full memory latency and later accesses may stall until it completes.
However, the impact is usually small when only a few regions carry the attribute. A figure of less than 2% performance degradation on ARMv7 has been cited, but the cost depends heavily on the workload and should be confirmed by measurement. Benefits such as easier programming can outweigh this cost.
Enabling Strong Ordering in ARMv6 and ARMv7
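To mark a region as Strongly-ordered in ARMv6 and ARMv7:
- Set the region's memory attributes to Strongly-ordered in its translation table entry (TEX = 0b000, C = 0, B = 0 in the short-descriptor format with TEX remap disabled) or in the MPU region attributes.
- Clean and invalidate any cached lines for the region, invalidate the stale TLB entries, then issue DSB and ISB so that the change takes effect before the region is accessed.
- Do not rely on the SCTLR.A bit (bit[1]) or, in ARMv6, the SCTLR.U bit (bit[22]) for ordering. They control alignment checking and unaligned access behavior.
For example, after updating the translation table entry on ARMv7:
    DSB                        @ make the table update visible
    MCR p15, 0, r0, c8, c7, 0  @ TLBIALL: invalidate the entire unified TLB
    DSB                        @ wait for the invalidation to complete
    ISB                        @ resynchronize the instruction stream
Accesses to the region are strongly ordered from that point onwards in the program.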
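In ARMv7 the memory type of a region is selected by the attribute bits in its translation table entry. The sketch below builds a first-level section descriptor (1 MB) in the short-descriptor format, assuming TEX remap is disabled, domain 0 is used, and full read/write access is acceptable. The enum and function names are hypothetical, and the surrounding MMU setup (table base register, domain access control, TLB maintenance) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Memory types selected through TEX[2:0], C and B (TEX remap off). */
enum mem_type {
    MEM_STRONGLY_ORDERED, /* TEX=000, C=0, B=0 */
    MEM_SHARED_DEVICE,    /* TEX=000, C=0, B=1 */
    MEM_NORMAL_WBWA       /* TEX=001, C=1, B=1: write-back, write-allocate */
};

/* Build a short-descriptor section entry for a 1 MB region. */
uint32_t make_section_entry(uint32_t phys_base, enum mem_type type)
{
    /* A section base address must be 1 MB aligned. */
    assert((phys_base & 0x000FFFFFu) == 0u);

    uint32_t entry = phys_base | 0x2u; /* bits[1:0] = 0b10: section   */
    entry |= 0x3u << 10;               /* AP[1:0] = 0b11: full access */

    switch (type) {
    case MEM_STRONGLY_ORDERED:
        break;                         /* all attribute bits stay 0   */
    case MEM_SHARED_DEVICE:
        entry |= 1u << 2;              /* B = 1                       */
        break;
    case MEM_NORMAL_WBWA:
        entry |= (1u << 12) | (1u << 3) | (1u << 2); /* TEX[0], C, B  */
        break;
    }
    return entry;
}
```

For instance, make_section_entry(0x40000000u, MEM_STRONGLY_ORDERED) yields 0x40000C02, which would then be written into the first-level table slot for that address.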
Using Strongly Ordered Memory Effectively
Here are some tips for using strongly ordered memory effectively in ARMv6/7 programs:
- Configure Strongly-ordered regions early in initialization code, before the regions are first accessed.
- Use DMB/DSB barriers to order critical memory operations as needed.
- Group shared data close together to improve cache locality and reduce aborts.
- Use LDREX/STREX-based atomic operations on Normal memory instead of heavier locks where lock-free synchronization is sufficient.
- Measure performance impact before and after enabling strong ordering.
- Apply strong ordering only to the regions that need it, such as shared memory accessed across threads.
Strongly ordered memory simplifies writing concurrent software and prevents many memory ordering issues. ARMv6 and ARMv7 provide flexible options to either favor easier programming or maximum performance as needed.
Conclusion
Strongly ordered memory provides stricter ordering guarantees at the cost of some performance. The ARMv6 and ARMv7 architectures support both weakly ordered Normal memory and Strongly-ordered regions. Marking a region Strongly-ordered through its memory attributes trades some efficiency for easier concurrent programming and prevention of memory ordering issues. When applied selectively, it can improve programming productivity without a major impact on performance in most cases.