Yes, ARM allows unaligned memory accesses, but with some caveats. Processors implementing ARMv6 or later can perform unaligned loads and stores in hardware, though these can be slower than aligned accesses. Support also depends on the architecture version and profile rather than only on the instruction set in use: small Thumb-only cores such as the ARMv6-M Cortex-M0 fault on every unaligned halfword or word access.
What is aligned vs unaligned memory access?
In computing, aligned memory access refers to reading from or writing to memory addresses that are multiples of the data size. For example, accessing a 4-byte integer at an address that is a multiple of 4 would be aligned. Accessing that same 4-byte integer at an address offset by 1, 2, or 3 bytes would be unaligned.
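The definition above can be sketched as a small check in C. The helper name `is_aligned` is hypothetical, and the sketch assumes the access size is a nonzero power of two, which holds for the usual 1, 2, 4, and 8 byte types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of size.
 * Assumption: size is a nonzero power of two (1, 2, 4, 8, ...),
 * so the low bits of addr can be tested with a mask. */
static bool is_aligned(uintptr_t addr, size_t size)
{
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

With this helper, a 4-byte access at 0x1000 counts as aligned, while the same access at 0x1001, 0x1002, or 0x1003 does not. A byte access is aligned at any address.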
Processor architectures like x86 generally allow unaligned access with little or no penalty on modern cores, but RISC architectures like ARM have traditionally preferred aligned access for performance reasons. When a value straddles two naturally aligned locations, the hardware may need two separate memory accesses instead of one, which reduces efficiency.
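To make the "two accesses instead of one" point concrete, here is a simplified model that counts how many aligned bus-width units an access touches. The function name `aligned_units_touched` and the fixed bus width are assumptions for illustration. Real cores have caches and wider internal paths, so this is a rough sketch rather than a timing model.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the aligned units of bus_bytes each that an access of
 * `size` bytes starting at `addr` overlaps.
 * Assumptions: size >= 1 and bus_bytes >= 1. */
static size_t aligned_units_touched(uintptr_t addr, size_t size, size_t bus_bytes)
{
    uintptr_t first = addr / bus_bytes;              /* unit holding the first byte */
    uintptr_t last  = (addr + size - 1) / bus_bytes; /* unit holding the last byte  */
    return (size_t)(last - first + 1);
}
```

Under this model, a 4-byte access at 0x1000 on a 4-byte bus touches one unit, while the same access at 0x1001 touches two. A 2-byte access at 0x1001 is unaligned but still fits in one unit.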
Unaligned access on ARM 32-bit architecture
The 32-bit ARM architecture, from ARMv6 onward, allows unaligned loads and stores for halfword and word data. However, unaligned accesses can have a performance cost compared to aligned accesses. On some Cortex-A series processors, an unaligned access that must be split into two separate transfers can take about twice as long as an aligned one.
The ARM Architecture Reference Manual notes that unaligned accesses should be avoided for performance reasons. But ARM does provide mechanisms to accomplish unaligned loads and stores when required. There are some restrictions, though: architecture versions before ARMv6 do not support unaligned word or halfword access in hardware, and on later versions the operating system or firmware can enable alignment checking so that unaligned accesses fault.
Unaligned loads
To perform unaligned loads in the ARM 32-bit instruction set, the following LDR-family instructions can be used:
- LDR – supports unaligned word loads
- LDRB – byte loads, which are always aligned
- LDRH – supports unaligned halfword loads
- LDRSH – supports unaligned signed halfword loads
- LDRSB – signed byte loads, which are always aligned
Unaligned stores
To perform unaligned stores, the following STR-family instructions can be used:
- STR – supports unaligned word stores
- STRB – byte stores, which are always aligned
- STRH – supports unaligned halfword stores
Unaligned access on Thumb and Thumb-2
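Thumb is a 16-bit compressed subset of the ARM instruction set that improves code density. Thumb-2 extends Thumb with some 32-bit instructions. Unaligned support in Thumb code depends on the architecture the core implements rather than on the instruction encoding itself:
- On ARMv7-A and ARMv7-R, the Thumb-2 LDR, LDRH, STR, and STRH instructions support unaligned accesses in the same way as their ARM counterparts
- On ARMv7-M (for example Cortex-M3 and Cortex-M4), single word and halfword loads and stores may be unaligned unless unaligned trapping is enabled, while LDM, STM, LDRD, and STRD require word alignment
- On ARMv6-M (for example Cortex-M0 and Cortex-M0+), only byte accesses are free of alignment requirements
This means that the most restrictive case is the ARMv6-M profile, where only byte accesses are safe at any address and halfword and word unaligned accesses will generate alignment faults.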
ARMv8 and unaligned accesses
The newer 64-bit ARMv8 architecture maintains support for unaligned accesses to Normal memory, but alignment faults may still occur in some situations, such as accesses to Device memory or when alignment checking is enabled through SCTLR_ELx.A. ARM recommends avoiding unaligned accesses when possible for optimal performance.
Some key notes on unaligned accesses in ARMv8:
- Unaligned accesses may carry a performance penalty, particularly when they cross a cache-line or page boundary
- Atomic and exclusive unaligned accesses are generally not supported and will fault
- Some SIMD/NEON instructions don’t support unaligned accesses
Software handling of unaligned accesses
If unaligned accesses are required, the software needs to handle them carefully. Some ways compilers and hand-written assembly can deal with unaligned accesses on ARM:
- Using the above LDR and STR instructions for explicit unaligned loads/stores
- Using compiler qualifiers and attributes such as __packed, __unaligned, or __attribute__((packed)) to indicate unaligned data
- Using memcpy to move unaligned data instead of direct access
- Copying data to/from an aligned buffer before accessing
- Issuing multiple aligned accesses and shifting/masking to simulate unaligned
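Two of the approaches listed above can be sketched in portable C. The function names are hypothetical. The first pair relies on `memcpy`, which compilers commonly lower to whatever load or store sequence is safe for the target. The last function assembles the value from byte accesses, which are always aligned, and assumes the data is stored in little-endian order.

```c
#include <stdint.h>
#include <string.h>

/* Loads a 32-bit value, in host byte order, from a pointer that
 * may be unaligned. memcpy avoids a direct misaligned dereference. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Stores a 32-bit value, in host byte order, to a pointer that
 * may be unaligned. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Builds a 32-bit value from four byte loads plus shifts.
 * Assumption: the bytes at p are in little-endian order. */
static uint32_t load_u32_le_bytes(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

The byte-wise version has the side benefit of being independent of the host's endianness, which can matter when parsing wire formats or file headers.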
In situations where alignment is not known at compile time, runtime checks may be needed to decide between aligned and unaligned access methods.
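A runtime check of this kind might look like the following sketch. The function `sum_u32` is a hypothetical example: it sums 32-bit values from a buffer, using direct word loads when the pointer is suitably aligned and falling back to `memcpy` otherwise. The sketch assumes the aligned path is only taken on memory that really holds `uint32_t` objects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sums `count` 32-bit values (host byte order) starting at `data`.
 * Chooses the access method at run time based on the address. */
static uint32_t sum_u32(const void *data, size_t count)
{
    uint32_t sum = 0;

    if ((uintptr_t)data % sizeof(uint32_t) == 0) {
        /* Aligned: plain word loads are safe. */
        const uint32_t *w = (const uint32_t *)data;
        for (size_t i = 0; i < count; i++) {
            sum += w[i];
        }
    } else {
        /* Unaligned: copy each value into an aligned temporary. */
        const unsigned char *b = (const unsigned char *)data;
        for (size_t i = 0; i < count; i++) {
            uint32_t v;
            memcpy(&v, b + i * sizeof v, sizeof v);
            sum += v;
        }
    }
    return sum;
}
```

Whether the extra branch pays off depends on the target. On cores that handle unaligned loads in hardware, the `memcpy` path alone may already compile to efficient code.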
Performance implications
There are significant performance advantages to keeping data aligned where possible. Some benchmarks indicate unaligned 32-bit ARM accesses are 40-60% slower than aligned accesses, though the exact cost depends on the core and on whether the access crosses a cache-line boundary. Cores or configurations that trap unaligned accesses add the further cost of handling alignment faults.
Modern ARM cores handle many unaligned accesses in hardware, which mitigates the impact. Load/store multiple instructions (LDM/STM) do not help here, because they require word-aligned addresses. Overall, maintaining alignment for performance-sensitive code is recommended. Unaligned accesses should be minimized and isolated from critical paths if possible.
Conclusion
While ARM does allow unaligned memory access on most modern cores, aligned access is strongly recommended for performance reasons. Unaligned accesses can come with a penalty, and not all instruction types, architecture versions, or memory types support them. Software that needs unaligned data should use instructions that permit it, or compiler facilities such as packed attributes and memcpy, and be prepared for alignment faults on targets that trap. Critical code should avoid unaligned accesses on ARM where possible, but the capability is available if needed.