ARM processors utilize load and store instructions to move data between the processor and memory. The two main instructions for this are LDR and STR.
LDR – Load Register
The LDR instruction loads a value from memory into a register. Plain LDR loads a 32-bit word; suffixed variants such as LDRB load other sizes. Its general syntax is: LDR Rd, [Rn, #offset]
Where:
- Rd = Destination register where the data from memory will be loaded
- Rn = Base register containing the address in memory to load from
- Offset = Optional offset from the address in Rn to load from
For example: LDR R1, [R2, #8]
This will load the word at the address contained in R2 plus 8 bytes into R1.
The offset can be positive or negative. Omitting the offset loads from the exact address contained in Rn. There are also alternate syntaxes for scaled offsets and PC-relative addressing.
LDR supports loading bytes, halfwords, words, and doublewords into registers. The size is selected by a suffix on the mnemonic. For example, LDRB loads a byte, LDRH loads a halfword, and LDRD loads a doubleword into a pair of registers.
Uses of LDR
LDR is used to load data from memory into registers for processing by the CPU. Some examples include:
- Loading stack variables into registers
- Loading constants or other data from rodata into registers
- Dereferencing pointers to access objects in memory
- Reading values from arrays or other data structures
- Transferring data during message passing or IPC
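As an illustration of the pointer and array cases above, here is a minimal sketch in GNU assembler syntax for 32-bit ARM (A32). It assumes the AAPCS calling convention, where R0 carries the first argument and the return value. The function name read_third is hypothetical, and GNU as uses @ for comments where the listings in this article use ;.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global read_third      @ hypothetical: int read_third(const int *p)
        .type   read_third, %function
read_third:
        LDR     R0, [R0, #8]    @ R0 = p[2]: the word at address p + 2 * 4 bytes
        BX      LR              @ return with the loaded value in R0
```

The offset is in bytes, so the third 32-bit element of an array sits at offset 8.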
STR – Store Register
STR stores a value from a register into memory. Plain STR stores a 32-bit word; suffixed variants such as STRB store other sizes. Its general syntax is: STR Rd, [Rn, #offset]
Where:
- Rd = Source register containing the data to store
- Rn = Base register containing the address in memory to store to
- Offset = Optional offset from the address in Rn to store to
For example: STR R1, [R2, #8]
This will store the word in R1 into the address contained in R2 plus 8 bytes.
Like LDR, the offset can be positive or negative. Omitting the offset stores to the exact address contained in Rn. There are also alternate syntaxes for scaled register offsets. PC-relative stores are rarely used, because the memory near the code is usually read-only.
STR supports storing bytes, halfwords, words, and doublewords from registers. The size is selected by a suffix on the mnemonic. For example, STRB stores a byte, STRH stores a halfword, and STRD stores a doubleword from a pair of registers.
Uses of STR
STR is used to store data from registers back to memory. Some examples include:
- Storing stack variables back to the stack
- Storing results of calculations to memory
- Dereferencing pointers and writing objects back to memory
- Writing values into arrays or other data structures
- Transferring data during message passing or IPC
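The "store results of calculations" case can be sketched as a load, an arithmetic step in registers, and a store back through the same pointer. This is an illustrative example in GNU assembler syntax for A32, assuming the AAPCS convention (R0 and R1 hold the first two arguments); the function name add_to_counter is hypothetical.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global add_to_counter  @ hypothetical: void add_to_counter(int *p, int n)
        .type   add_to_counter, %function
add_to_counter:
        LDR     R2, [R0]        @ R2 = *p       (load the current value)
        ADD     R2, R2, R1      @ R2 = R2 + n   (arithmetic works on registers only)
        STR     R2, [R0]        @ *p = R2       (store the result back to memory)
        BX      LR
```

Because ARM is a load/store architecture, the ADD cannot operate on memory directly; the value has to pass through a register in both directions.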
Differences between LDR and STR
While LDR and STR seem nearly identical, there are some key differences:
- Direction: LDR loads memory into registers, STR stores registers into memory
- Uses: LDR is used to read/load data, STR is used to write/store data
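- Speed: Neither is uniformly faster; the cost depends on the core and on whether the access hits in the cache. Load latency is more often visible because later instructions usually need the loaded value
- Executing: Both are handled by the core's load/store unit; some cores provide separate load and store pipelines within it
- Pipeline: Both are pipelined; stores typically complete into a store buffer and drain to memory in the background
- Exclusives: Both have exclusive forms, LDREX and STREX, which are used as a pair
- Privileged: The LDRT and STRT variants perform the access with user-mode permissions even when executed in a privileged mode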
In essence:
- LDR moves data from memory into the processor.
- STR moves data from the processor into memory.
LDR loads data to be operated on by the CPU. STR stores the results back to memory. Both are critical instructions for ARM’s load/store architecture.
LDR Details and Examples
Here are some more specifics on how LDR works and is used in ARM programming:
Addressing Modes
LDR supports several different addressing modes:
- Offset: [Rn, #offset] – Address is Rn plus offset; Rn is unchanged
- Pre-indexed: [Rn, #offset]! – Add offset to Rn, write the new address back to Rn, and load from it
- Post-indexed: [Rn], #offset – Load from the address in Rn, then add offset to Rn
- Register offset: [Rn, Rm] or [Rn, Rm, LSL #n] – Address is Rn plus Rm, with Rm optionally shifted left by n bits
This provides flexibility when accessing arrays, pointers, and other structured data.
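To make the effect on the base register concrete, the following sketch runs the offset, pre-indexed, and post-indexed forms back to back and notes what each register holds afterwards. It is written in GNU assembler syntax for A32 (comments start with @), and the function name modes_demo is hypothetical.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global modes_demo      @ hypothetical: void modes_demo(const int *p)
        .type   modes_demo, %function
modes_demo:                     @ on entry R0 = p, the address of p[0]
        LDR     R1, [R0, #4]    @ offset:       R1 = p[1]; R0 is unchanged
        LDR     R2, [R0, #4]!   @ pre-indexed:  R0 = p + 4 bytes, then R2 = p[1]
        LDR     R3, [R0], #4    @ post-indexed: R3 = p[1], then R0 = p + 8 bytes
        LDR     R12, [R0]       @ no offset:    R12 = p[2]
        BX      LR
```

Only the forms with writeback (the ! suffix and the post-indexed form) change R0; the plain offset form leaves it alone.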
Load Sizes
LDR can load 8-bit bytes, 16-bit halfwords, 32-bit words, or 64-bit doublewords. The instruction varies slightly:
    LDRB Rd, [Rn, #offset] ; Load byte
    LDRH Rd, [Rn, #offset] ; Load halfword
    LDR Rd, [Rn, #offset] ; Load word
    LDRD Rd, Rd2, [Rn, #offset] ; Load doubleword into Rd and Rd2
The appropriate size load should be used for the data being accessed to avoid truncation or invalid alignment.
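The sketch below shows how the narrower loads fill a 32-bit register. It also uses LDRSB, the sign-extending byte load, which is not covered in the text above but is part of the same instruction family. The code is in GNU assembler syntax for A32; the function name sizes_demo and the assumed memory contents are hypothetical.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global sizes_demo      @ hypothetical: void sizes_demo(const void *p)
        .type   sizes_demo, %function
sizes_demo:                     @ assume R0 = p, 8-byte aligned, first byte is 0x80
        LDRB    R1, [R0]        @ byte, zero-extended:  R1 = 0x00000080
        LDRSB   R2, [R0]        @ byte, sign-extended:  R2 = 0xFFFFFF80
        LDRH    R3, [R0]        @ halfword, zero-extended into R3
        LDR     R12, [R0]       @ full 32-bit word into R12
        LDRD    R2, R3, [R0]    @ doubleword: R2 = word at p, R3 = word at p + 4
        BX      LR
```

In A32, LDRD needs an even-numbered first register followed by the next register, which is why the pair R2, R3 is used here.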
Operand 2 Options
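The base register can also be the PC or the SP, and the assembler provides a pseudo-instruction form for loading constants:
    LDR Rd, [PC, #offset] ; PC-relative addressing
    LDR Rd, =constant ; Pseudo-instruction: load the constant value (or a label's address) into Rd
    LDR Rd, [SP, #offset] ; Load from stack
PC-relative addressing uses the PC value and an offset to compute the address. In ARM state the PC reads as the address of the current instruction plus 8 bytes. This is useful for loading constants and addresses stored near the code, such as literal pool entries. For the =constant form, the assembler emits either a MOV/MVN or a PC-relative load from a literal pool.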
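As a rough sketch of how these forms fit together, the example below loads a constant, loads a label's address, and performs a PC-relative load from a word placed next to the code. It assumes GNU assembler syntax for A32 (comments start with @); the names const_demo, magic, and table and the data values are hypothetical.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global const_demo      @ hypothetical function name
        .type   const_demo, %function
const_demo:
        LDR     R0, =0x12345678 @ pseudo-instruction: R0 = 0x12345678
                                @ (the assembler may emit a literal pool load)
        LDR     R1, =table      @ R1 = address of the label 'table'
        LDR     R2, [R1, #4]    @ R2 = second word of table = 0x22222222
        LDR     R3, magic       @ PC-relative load of the word at 'magic'
        BX      LR
magic:  .word   0xCAFEF00D      @ data word placed after the return
        .ltorg                  @ literal pool for the '=' loads is emitted here

        .section .rodata
        .align  2
table:  .word   0x11111111, 0x22222222
```

Placing the literal pool right after the function keeps it within the limited offset range of a PC-relative load.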
Examples
Here are some examples of LDR used to load data:
    LDR R1, [R2, #8] ; Load word from address R2 + 8
    LDR R3, =0x12345678 ; Load the constant 0x12345678 (pseudo-instruction)
    LDR R0, [R5], #4 ; Post-indexed load from address in R5, then increment R5 by 4
    LDRB R4, [SP, #-8] ; Load byte from address SP - 8
    LDR R7, [PC, #128] ; PC-relative load
STR Details and Examples
Here are some details on how STR works:
Addressing Modes
STR supports the same offset, pre-indexed, and post-indexed addressing modes as LDR.
Store Sizes
STR can store 8-bit bytes, 16-bit halfwords, 32-bit words, and 64-bit doublewords. The instruction varies slightly:
    STRB Rd, [Rn, #offset] ; Store byte
    STRH Rd, [Rn, #offset] ; Store halfword
    STR Rd, [Rn, #offset] ; Store word
    STRD Rd, Rd2, [Rn, #offset] ; Store doubleword from Rd and Rd2
The store size should match the size of the object being written. A store that is too wide overwrites adjacent memory, and one that is too narrow leaves part of the object unchanged.
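For instance, writing the fields of a small record might look like the sketch below, where each store is as wide as the field it targets. The code uses GNU assembler syntax for A32; the function name init_record and the field layout (an 8-bit field at offset 0, a 16-bit field at offset 2, a 32-bit field at offset 4) are assumptions made for illustration.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global init_record     @ hypothetical: void init_record(struct record *r)
        .type   init_record, %function
init_record:                    @ assumed layout: u8 at +0, u16 at +2, u32 at +4
        MOV     R1, #0x7F       @ value to write into every field
        STRB    R1, [R0]        @ writes 1 byte  at r + 0 (low 8 bits of R1)
        STRH    R1, [R0, #2]    @ writes 2 bytes at r + 2 (low 16 bits of R1)
        STR     R1, [R0, #4]    @ writes 4 bytes at r + 4 (all of R1)
        BX      LR
```

Using STR for the first field instead of STRB would also overwrite the 16-bit field that follows it.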
Exclusive Access
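Exclusive stores such as STREX pair with exclusive loads such as LDREX. LDREX loads a value and marks the address in an exclusive monitor. STREX performs the store only if that reservation is still held, meaning no other observer has written to the location in the meantime. STREX also writes a status to Rd: 0 if the store succeeded, 1 if it did not.
    STREX Rd, Rm, [Rn] ; Store Rm to the address in Rn if still exclusive; Rd = status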
This enables atomic read-modify-write sequences for multithreaded programming.
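A typical read-modify-write sequence pairs STREX with its load counterpart LDREX and retries until the store succeeds. The following is a minimal sketch of an atomic increment in GNU assembler syntax for A32 (ARMv7-A). The function name atomic_inc is hypothetical, and memory barriers (DMB) are deliberately left out, so this is not a complete lock or synchronization primitive.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global atomic_inc      @ hypothetical: int atomic_inc(int *p)
        .type   atomic_inc, %function
atomic_inc:
1:      LDREX   R1, [R0]        @ R1 = *p, and mark the address as exclusive
        ADD     R1, R1, #1      @ compute the new value in a register
        STREX   R2, R1, [R0]    @ try *p = R1; R2 = 0 on success, 1 on failure
        CMP     R2, #0
        BNE     1b              @ reservation was lost: retry the whole sequence
        MOV     R0, R1          @ return the incremented value
        BX      LR
```

The status register of STREX (R2 here) must be different from both the value register and the base register.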
Examples
    STR R1, [R2, #8] ; Store R1 at address R2 + 8
    STR R3, [R5], #4 ; Post-indexed store of R3 at address in R5, then increment R5 by 4
    STRB R0, [R4, #-8] ; Store low byte of R0 at address R4 - 8
    STREX R5, R6, [R8] ; Try exclusive store of R6 to address in R8; R5 = 0 on success
LDR vs STR Performance
Neither instruction is inherently faster; load and store performance depends on the microarchitecture and on cache behavior. Some general points:
- Both LDRs and STRs are pipelined on modern ARM cores
- A load's result is usually needed by the instructions that follow, so a cache miss on an LDR tends to stall dependent work
- Stores normally retire into a store (write) buffer and drain to the cache in the background, so an STR rarely stalls the pipeline unless the buffer fills
- Many CPUs can issue at least as many LDRs per cycle as STRs
- Only exclusive stores such as STREX check the exclusive monitor; ordinary STRs do not
For example, Cortex-A72 has one load pipeline and one store pipeline, so it can issue at most one load and one store micro-op per cycle.
When optimizing critical code, reducing the total number of memory accesses and keeping them cache-friendly usually matters more than the ratio of LDRs to STRs.
LDR and STR Usage Guidelines
Here are some guidelines for optimal usage of LDR and STR:
- Use the LDR/STR variant that matches the data size – for example LDRB or LDRH (LDRSB or LDRSH for signed values) instead of loading a word and masking it
- Avoid unnecessary loads/stores – try to keep data in registers instead
- Where possible, accumulate results in registers and store them once instead of storing intermediate values on every loop iteration
- Use pre-indexed and post-indexed addressing modes to update pointers in loops cleanly
- Take advantage of write buffers by grouping stores to adjacent addresses together where possible
- Keep word and doubleword data naturally aligned – unaligned accesses can be slower, and some instructions such as LDRD and STRD can fault on them
Following these guidelines will ensure efficient data movement between the processor and memory.
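Several of these guidelines can be seen together in a simple array-summing loop: the running total stays in a register, and a post-indexed load advances the pointer as part of the access. This is an illustrative sketch in GNU assembler syntax for A32, assuming the AAPCS convention; the function name sum_words is hypothetical.

```asm
        .syntax unified
        .arch   armv7-a
        .arm
        .text
        .global sum_words       @ hypothetical: int sum_words(const int *p, unsigned n)
        .type   sum_words, %function
sum_words:
        MOV     R2, #0          @ running total lives in a register
        CMP     R1, #0
        BEQ     2f              @ nothing to add for an empty array
1:      LDR     R3, [R0], #4    @ R3 = *p, then p advances by one word
        ADD     R2, R2, R3      @ accumulate without touching memory
        SUBS    R1, R1, #1      @ count down and set the flags
        BNE     1b              @ loop until all n words are consumed
2:      MOV     R0, R2          @ return the total in R0
        BX      LR
```

The loop performs one load per element and no stores at all; the result only leaves the register file when the caller decides where to keep it.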
Conclusion
In summary, LDR and STR are the core instructions for loading data from memory into registers and storing it back on ARM processors. LDR moves data from memory into the processor while STR moves data from the processor to memory. Both are pipelined on modern cores; loads expose memory latency to dependent instructions, while stores usually complete through a store buffer.
Using the appropriate addressing modes and data sizes for LDR and STR is critical to writing efficient ARM assembly code. Minimizing unnecessary memory accesses and using pre-indexed and post-indexed modes can noticeably improve performance. These basics apply both to ARM assembly programming and to compiling higher-level languages for ARM.