LDR (Load Register) and STR (Store Register) are the two basic ARM memory-access instructions. The main difference between them is that LDR loads data from memory into a register, while STR stores data from a register into memory. Understanding how each one works, and which addressing modes it supports, helps you write correct and efficient code.
LDR – Load Register
The LDR instruction loads a word (32-bit) value from memory into an ARM register. Its basic syntax is: LDR Rd, [Rn, #offset]
Where:
- Rd = Destination register where the loaded value will be stored
- Rn = Base register containing the memory address to load from
- Offset = Optional offset from the base register value (can be positive/negative)
For example: LDR R1, [R2, #8]
This will load the 32-bit value from the memory address in R2 + 8 and store it in R1. The offset is optional and defaults to 0 if not specified.
Some key properties of LDR:
- Loads data from memory into a register
- Destination must be a register, cannot store directly to memory
- Source address can be a register + offset or a PC-relative literal value
- Commonly used to load data values needed for computation
STR – Store Register
The STR instruction stores a register value into memory. Its syntax is: STR Rd, [Rn, #offset]
Where:
- Rd = Source register containing the value to store
- Rn = Base register containing the target memory address
- Offset = Optional byte offset from base register
For example: STR R1, [R2, #-4]
This will store the value in R1 to the memory address in R2 – 4.
Key properties of STR:
- Stores a register value into memory
- Cannot directly load data from memory into register
- Destination is a memory address calculated from base + offset
- Used to save results of computations to memory
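The two instructions can be modeled in a few lines of C. This is only a sketch of the semantics described above, not how any real core is implemented: the `mem` array, the `reg` array, and the names `model_ldr` and `model_str` are hypothetical, and there is no bounds or alignment checking.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a small byte-addressed memory and 16 core registers. */
static uint8_t  mem[64];
static uint32_t reg[16];

/* LDR Rd, [Rn, #offset]: read 32 bits at reg[rn] + offset into reg[rd]. */
static void model_ldr(int rd, int rn, int32_t offset) {
    uint32_t addr = reg[rn] + (uint32_t)offset;
    memcpy(&reg[rd], &mem[addr], sizeof(uint32_t));
}

/* STR Rd, [Rn, #offset]: write reg[rd] to the 32 bits at reg[rn] + offset. */
static void model_str(int rd, int rn, int32_t offset) {
    uint32_t addr = reg[rn] + (uint32_t)offset;
    memcpy(&mem[addr], &reg[rd], sizeof(uint32_t));
}
```

In this model, STR R1, [R2, #-4] corresponds to model_str(1, 2, -4), and the data always moves in the direction the instruction name suggests: memory to register for LDR, register to memory for STR.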
Key Differences
The main differences between LDR and STR are:
- Data Direction – LDR loads memory into register, STR stores register into memory
- Operands – LDR destination is a register, STR source is a register
- Use Cases – LDR used to load data to operate on, STR used to store results
Examples
Here are some examples to illustrate LDR and STR usage:
    // Load 2 values from memory into R1 and R2
    LDR R1, [R8, #16]
    LDR R2, [R8, #20]
    // Add R1 and R2, store result in R3
    ADD R3, R1, R2
    // Store R3 back to memory
    STR R3, [R8, #24]
In this code:
- LDR instructions load data from memory at offsets 16 and 20 from the base address in R8 into registers R1 and R2
- R1 and R2 are added, result stored in R3
- STR instruction stores the result (R3) back to memory
Some key points:
- LDR used to load source data values into registers
- Computation performed on registers
- STR used to save the final result back to memory
    // Get address of a variable in R1
    LDR R1, =var
    // Load variable value from memory into R2
    LDR R2, [R1]
    // Increment value
    ADD R2, R2, #1
    // Store updated value back to memory
    STR R2, [R1]
Here LDR and STR are used to load, modify, and update a variable in memory.
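The first LDR uses the assembler's LDR Rd, =label pseudo-instruction to get the address of var into R1, typically by loading it from a literal pool. The second LDR loads the value itself. The value is incremented, and STR updates memory with the new value.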
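The same load, modify, store sequence can be written in C. This is a sketch only: the variable `var` and the function `increment_var` are hypothetical stand-ins, and `volatile` is used so that the compiler performs one real load and one real store per call rather than optimizing them away.

```c
#include <stdint.h>

/* Hypothetical variable standing in for `var` in the assembly above. */
static volatile uint32_t var = 41;

/* Each statement mirrors one instruction of the assembly sequence. */
static uint32_t increment_var(void) {
    volatile uint32_t *addr = &var;  /* LDR R1, =var   */
    uint32_t value = *addr;          /* LDR R2, [R1]   */
    value = value + 1;               /* ADD R2, R2, #1 */
    *addr = value;                   /* STR R2, [R1]   */
    return value;
}
```

Note that this sequence is not atomic: another core or an interrupt handler could modify var between the load and the store.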
Memory Access Types
LDR and STR support several different addressing modes for flexible memory access:
- Offset – Rn + positive/negative offset, eg. [R5, #8]
- Pre-indexed – Rn is updated by the offset first, and the access uses the updated address, eg. [R6, #16]!
- Post-indexed – the access uses the original Rn, and Rn is updated by the offset afterwards, eg. [R7], #20
- Literal – PC-relative access, eg. [PC, #24]
Pre and post-indexing automatically update the base register enabling sequential access. Literals are useful for constants.
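The difference between the offset, pre-indexed, and post-indexed modes comes down to when, and whether, the base register changes. The C sketch below models that for a load; the `mem` array and the function names are hypothetical, and the base register is passed by pointer so that the update is visible to the caller.

```c
#include <stdint.h>
#include <string.h>

static uint8_t mem[64];  /* hypothetical byte-addressed memory */

static uint32_t read_word(uint32_t addr) {
    uint32_t value;
    memcpy(&value, &mem[addr], sizeof value);
    return value;
}

/* Offset: LDR Rd, [Rn, #off]; the base register is left unchanged. */
static uint32_t ldr_offset(const uint32_t *rn, int32_t off) {
    return read_word(*rn + (uint32_t)off);
}

/* Pre-indexed: LDR Rd, [Rn, #off]!; update the base, then access. */
static uint32_t ldr_pre(uint32_t *rn, int32_t off) {
    *rn += (uint32_t)off;
    return read_word(*rn);
}

/* Post-indexed: LDR Rd, [Rn], #off; access first, then update the base. */
static uint32_t ldr_post(uint32_t *rn, int32_t off) {
    uint32_t value = read_word(*rn);
    *rn += (uint32_t)off;
    return value;
}
```

Calling ldr_post repeatedly with an offset of 4 walks through consecutive words, which is the sequential access pattern mentioned above.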
Memory Ordering and Synchronization
LDR and STR accesses to Normal memory are weakly ordered by default. This means loads and stores to different addresses can be reordered by the CPU and memory system for performance. To enforce ordering:
- Use the appropriate memory barriers (DMB, DSB)
- Mark memory as Device or Strongly-ordered
- Set memory types in MMU configuration
This prevents reordering of memory accesses across barrier boundaries. Useful for MMIO registers and shared memory synchronization.
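In C, the usual way to express this kind of ordering is with C11 fences, which compilers targeting ARMv7 typically implement with DMB. The sketch below shows the common "write data, then set a flag" pattern; `payload`, `ready`, `publish`, and `try_consume` are hypothetical names, and a real program would call the two functions from different threads or cores.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t   payload;  /* ordinary shared data */
static atomic_int ready;    /* flag that publishes the data */

/* Producer: write the data, then the flag. The release fence keeps the
   payload store from being reordered after the flag store. */
static void publish(uint32_t value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out once the flag is observed. The acquire
   fence keeps the payload load from being reordered before the flag load. */
static int try_consume(uint32_t *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Without the two fences, a weakly ordered CPU would be free to make the flag visible before the payload, and the consumer could read stale data.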
Atomicity
LDREX and STREX together implement an atomic read-modify-write. They do not prevent interruption. Instead, STREX fails if exclusive access to the location was lost, and the whole sequence is retried. For example:
retry
    LDREX R1, [R2]      ; Load exclusive
    ADD R1, R1, #1
    STREX R3, R1, [R2]  ; Store exclusive, R3 = 0 on success
    CMP R3, #0
    BNE retry           ; Retry if store failed
This increments a value in memory atomically. STREX succeeds only if no other write to the location occurred since the LDREX. The same pattern is the building block for locks that protect critical sections.
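The closest portable C equivalent of this retry loop is a C11 compare-and-exchange loop, which compilers for ARMv7 typically lower to an LDREX/STREX sequence. The sketch below assumes a hypothetical helper name, `atomic_increment`, and uses relaxed ordering because it only demonstrates atomicity, not ordering against other data.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add 1 to *counter and return the new value. */
static uint32_t atomic_increment(_Atomic uint32_t *counter) {
    uint32_t old = atomic_load_explicit(counter, memory_order_relaxed);
    /* The weak form may fail spuriously, much like STREX, so it sits in a
       loop. On failure, `old` is refreshed with the current value. */
    while (!atomic_compare_exchange_weak_explicit(
               counter, &old, old + 1,
               memory_order_relaxed, memory_order_relaxed)) {
        /* retry */
    }
    return old + 1;
}
```

In practice atomic_fetch_add does the same job in one call; the explicit loop is shown only because it mirrors the structure of the assembly.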
Load/Store Multiples
LDM and STM transfer multiple registers to/from memory at once. Up to 16 registers can be transferred with a single instruction.
    STM R1!, {R2-R8} ; Store R2-R8 to memory at R1, update R1
    LDM R5, {R6-R12} ; Load R6-R12 from memory at R5
This enables efficient batch saving/restoring of context during function calls.
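A small C model makes the register-to-address mapping and the effect of the ! writeback explicit. It is a sketch under simplifying assumptions: it covers only the default increment-after form, takes a contiguous register range rather than an arbitrary list, assumes the base register is not in the list, and uses hypothetical `mem`, `reg`, `stm_ia`, and `ldm_ia` names.

```c
#include <stdint.h>
#include <string.h>

static uint8_t  mem[128];  /* hypothetical byte-addressed memory */
static uint32_t reg[16];   /* hypothetical register file R0-R15 */

/* STM Rn{!}, {Rfirst-Rlast}: lowest register goes to the lowest address. */
static void stm_ia(int rn, int first, int last, int writeback) {
    uint32_t addr = reg[rn];
    for (int r = first; r <= last; r++, addr += 4)
        memcpy(&mem[addr], &reg[r], 4);
    if (writeback)
        reg[rn] = addr;  /* base now points just past the last word */
}

/* LDM Rn{!}, {Rfirst-Rlast}: the mirror image of stm_ia. */
static void ldm_ia(int rn, int first, int last, int writeback) {
    uint32_t addr = reg[rn];
    for (int r = first; r <= last; r++, addr += 4)
        memcpy(&reg[r], &mem[addr], 4);
    if (writeback)
        reg[rn] = addr;
}
```

With this model, STM R1!, {R2-R8} is stm_ia(1, 2, 8, 1) and advances R1 by 28 bytes (7 registers of 4 bytes each), while LDM R5, {R6-R12} is ldm_ia(5, 6, 12, 0) and leaves R5 unchanged.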
Load/Store Exclusives
LDREX and STREX operate as a pair to read, modify, and conditionally write memory if no intervening write occurred. Useful for synchronization.
    LDREX R1, [R2]
    ADD R1, R1, #1
    STREX R3, R1, [R2]
STREX succeeds only if no other store to [R2] occurred between LDREX and STREX. It writes 0 to R3 on success and 1 on failure, so the code should check R3 and retry if needed. This allows an atomic read-modify-write.
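The success and failure rule can be illustrated with a toy model of the exclusive monitor. This is a deliberately simplified sketch, not the architectural definition: it tracks a single address, treats any store through `other_store` as coming from another observer, and all names (`monitor_addr`, `model_ldrex`, `model_strex`, `other_store`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-address exclusive monitor; NULL means "not armed". */
static const uint32_t *monitor_addr;

/* LDREX Rt, [Rn]: load the value and arm the monitor for this address. */
static uint32_t model_ldrex(const uint32_t *addr) {
    monitor_addr = addr;
    return *addr;
}

/* A store by another observer clears a monitor armed for that address. */
static void other_store(uint32_t *addr, uint32_t value) {
    *addr = value;
    if (monitor_addr == addr)
        monitor_addr = NULL;
}

/* STREX Rd, Rt, [Rn]: returns 0 and stores on success, 1 on failure. */
static int model_strex(uint32_t value, uint32_t *addr) {
    if (monitor_addr != addr)
        return 1;        /* exclusivity was lost: memory is not written */
    *addr = value;
    monitor_addr = NULL; /* the monitor is consumed by a successful store */
    return 0;
}
```

If other_store runs between model_ldrex and model_strex, the store-exclusive returns 1 and leaves memory untouched, which is why real code wraps the sequence in a retry loop.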
Scaled Addressing
Scaled register offset addressing is useful for indexing arrays whose element size is a power of 2. The index register is shifted left before it is added to the base:
    LDR R1, [R2, R3, LSL #3] ; Address = R2 + R3 * 2^3 = R2 + R3 * 8
The shift is applied as part of the address calculation, which saves instructions compared with an explicit shift and add.
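The address arithmetic is the same one a C compiler performs for array indexing. The sketch below shows both views: `scaled_address` computes the effective address directly, and `load_key` is the equivalent C access into an array of 8-byte elements. The `entry_t` type and both function names are hypothetical.

```c
#include <stdint.h>

/* Effective address of [Rn, Rm, LSL #shift]. */
static uint32_t scaled_address(uint32_t rn, uint32_t rm, unsigned shift) {
    return rn + (rm << shift);
}

/* An 8-byte element, so index * sizeof(entry_t) equals index << 3. */
typedef struct {
    uint32_t key;
    uint32_t value;
} entry_t;

/* Loads the 32-bit word at table + index * 8, like LDR with LSL #3. */
static uint32_t load_key(const entry_t *table, uint32_t index) {
    return table[index].key;
}
```

For example, with a base of 0x1000 and an index of 3, the access goes to 0x1000 + 3 * 8 = 0x1018.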
PC-Relative Addressing
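LDR supports PC-relative addressing, which is useful for position-independent code because globals and constants are accessed relative to the PC. In ARM state the PC reads as the address of the current instruction plus 8, so the offset is applied to that value:
    LDR R1, [PC, #16] ; Get address of global
    LDR R2, [PC, #0]  ; Load constant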
Useful for lookup tables, jump tables, and accessing literals.
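When reading a disassembly it helps to work out which address a PC-relative load actually touches. In ARM (not Thumb) state, the PC reads as the address of the current instruction plus 8, so the arithmetic is as sketched below. The function name `arm_literal_address` is hypothetical, and the sketch does not cover Thumb state, where the PC reads 4 bytes ahead instead.

```c
#include <stdint.h>

/* Address accessed by LDR Rd, [PC, #offset] in ARM state.
   The PC reads as the instruction's own address plus 8. */
static uint32_t arm_literal_address(uint32_t instr_addr, int32_t offset) {
    return instr_addr + 8u + (uint32_t)offset;
}
```

So an LDR R1, [PC, #16] located at 0x8000 reads the word at 0x8018, and LDR R2, [PC, #0] at the same location would read 0x8008.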
Architectural Differences
There are some differences in LDR/STR between ARM architecture versions:
- PC-relative addressing is available in all versions, but the PC reads as the current instruction address + 8 in ARM state and + 4 in Thumb state
- Atomic LDREX/STREX introduced in ARMv6
- 64-bit (AArch64) registers require different encodings
- Scaled addressing modes vary across architectures
So code may need refactoring for compatibility across architectures.
Performance Optimization
Some tips for optimizing LDR and STR usage:
- Minimize redundant loads by keeping frequently used values in registers
- Use DMA engines instead of manual loads/stores for large bulk transfers
- Optimize layout/stride for memory access patterns
- Configure caching appropriately (MMU page table memory attributes, etc)
- Take alignment into account
- Pick the access size that matches the data (byte, halfword, word, doubleword)
Properly utilizing caching and memory configuration is key for optimal performance.
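The first tip, avoiding redundant loads and stores, is easy to show in C. In the sketch below, both functions compute the same result, but the first one reads and writes through pointers inside the loop, so the compiler may have to emit an LDR and an STR on every iteration because the pointers could alias. The second keeps the working values in locals, which normally live in registers. The function names are hypothetical, and the actual code generated depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads *scale and updates *total through memory on every iteration. */
static void scale_sum_naive(const uint32_t *data, size_t n,
                            const uint32_t *scale, uint32_t *total) {
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += data[i] * *scale;
}

/* Loads the scale once, accumulates in a local, and stores once at the end. */
static void scale_sum_hoisted(const uint32_t *data, size_t n,
                              const uint32_t *scale, uint32_t *total) {
    uint32_t s = *scale;
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i] * s;
    *total = sum;
}
```

Both versions also walk data sequentially, which is the cache-friendly access pattern the layout/stride tip refers to.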
Multiprocessing and SMP
In SMP systems, additional considerations apply:
- Cache coherency issues – use the right barriers/cache ops
- Locking strategies – spinlocks, mutexes, etc
- Data sharing – coherent vs non coherent memory areas
- Consistency models – affects reordering rules
- Watch coherency traffic – minimize overhead
Multi-core adds concurrency issues around shared data access and coordination.
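As an illustration of the locking point, here is a minimal spinlock sketch built on C11 atomic_flag. The `spinlock_t` type and the `spin_lock`, `spin_trylock`, and `spin_unlock` names are hypothetical; production locks usually add back-off or a wait instruction inside the loop, which is omitted here.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

/* Spin until the flag was previously clear, meaning we now own the lock.
   Acquire ordering keeps the critical section after the lock operation. */
static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* busy-wait */
    }
}

/* Single attempt: returns 1 if the lock was taken, 0 if it was already held. */
static int spin_trylock(spinlock_t *l) {
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Release ordering makes the critical section's stores visible before
   the lock appears free to other cores. */
static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock is declared as spinlock_t lock = { ATOMIC_FLAG_INIT }; and the acquire and release orderings are what supply the barriers mentioned in the list above.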
Virtual Memory
With virtual memory:
- MMU translates virtual addresses to physical
- TLB cache stores translations
- Page tables hold full translation data
- Cache flush requirements change
TLB and cache maintenance operations are needed when translations or memory attributes change, so that loads and stores do not use stale mappings or stale cached data.
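To make the translation step more concrete, the sketch below splits a 32-bit virtual address into the fields used for a table walk. It assumes one specific configuration that the text above does not name: the ARMv7-A short-descriptor format with 4 KB small pages. The `va_fields_t` type and `split_va` function are hypothetical, and other page sizes or the long-descriptor format divide the address differently.

```c
#include <stdint.h>

typedef struct {
    uint32_t l1_index;  /* bits [31:20]: 1 of 4096 first-level entries */
    uint32_t l2_index;  /* bits [19:12]: 1 of 256 second-level entries */
    uint32_t offset;    /* bits [11:0]: byte offset within the 4 KB page */
} va_fields_t;

/* Split a virtual address as assumed above (short-descriptor, 4 KB pages). */
static va_fields_t split_va(uint32_t va) {
    va_fields_t f;
    f.l1_index = va >> 20;
    f.l2_index = (va >> 12) & 0xFFu;
    f.offset   = va & 0xFFFu;
    return f;
}
```

The page offset passes through translation unchanged; only the upper 20 bits are replaced by the physical page number found in the tables or the TLB.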
Floating Point
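VFP provides floating point load/store operations:
    VLDR S1, [R1] ; Load single precision
    VSTR D1, [R2] ; Store double precision
This extends access to the FP register file. Unlike the core LDR and STR, VLDR and VSTR support only a base register with an optional immediate offset (a multiple of 4); there is no register offset or writeback form.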
NEON SIMD
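NEON has its own load/store instructions for SIMD vectors:
    VLD1.32 {D0}, [R1] ; Load two 32-bit elements into D0
    VST1.64 {D0}, [R2] ; Store one 64-bit element from D0
The size suffix gives the element size, not the floating point precision. VLD1 to VLD4 and VST1 to VST4 cover all NEON element sizes, including interleaved structures, and enable high performance vector loads/stores.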
Summary
In summary, key differences between LDR and STR:
- LDR loads memory into register, STR stores register to memory
- LDR destination is register, STR source is register
- LDR for loading data, STR for storing results
- Various addressing modes and architectures affect usage
- Optimization revolves around memory system configuration
Ensuring proper and efficient use of LDR/STR is critical for maximizing performance on ARM platforms.