Unaligned access refers to a memory access whose address is not a multiple of the natural alignment of the data type being accessed. For example, a 32-bit integer or float on an ARMv7 processor is naturally aligned to a 4-byte boundary, so accessing a 32-bit integer at an address not divisible by 4 is an unaligned access. ARMv7 processors generally support unaligned access in hardware, but it comes with a performance penalty compared to aligned access.
Why Unaligned Access Matters on ARMv7
The ARMv7 architecture is designed to work most efficiently when data access is aligned. This is due to the nature of how data is handled internally on the processor. When accessing aligned data, the processor can fetch the full data item in a single transaction. With unaligned access, it may require multiple fetches internally to gather the complete data item, resulting in higher latency and lower performance.
For example, consider accessing a 32-bit integer at address 0x10002. Since this address is not divisible by 4, it is an unaligned 32-bit access. To load this integer, the processor might need to load the aligned 4-byte words at 0x10000 and 0x10004, then shift and merge the two to extract the desired 4 bytes. This requires more operations and cycles than an aligned load, which can fetch the 4 bytes directly in a single access.
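The shift-and-merge step can be sketched in portable C. This is only an illustration of the idea, assuming little-endian byte order; the function name `load_u32_shift_merge` is hypothetical, and real hardware does the equivalent work inside its load/store unit rather than in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulate a little-endian 32-bit load at an arbitrary byte offset using only
 * aligned word reads. `words` models memory as an array of aligned 32-bit
 * words; the caller must ensure that words[byte_offset / 4 + 1] exists
 * whenever byte_offset is not a multiple of 4. */
static uint32_t load_u32_shift_merge(const uint32_t *words, size_t byte_offset)
{
    assert(words != NULL);

    size_t   index = byte_offset / 4;                 /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8; /* misalignment in bits: 0, 8, 16 or 24 */

    uint32_t low = words[index];
    if (shift == 0) {
        return low;                                   /* aligned: one word read is enough */
    }

    uint32_t high = words[index + 1];                 /* unaligned: a second word read is needed */
    return (low >> shift) | (high << (32 - shift));   /* shift and merge the two parts */
}
```

With `words = {0x44332211, 0x88776655}`, a load at byte offset 2 combines the upper half of the first word with the lower half of the second and yields `0x66554433`.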
In addition, unaligned access can raise an exception when alignment checking is enabled, when the instruction used does not support unaligned addresses, or when the target is not Normal memory. So code that makes unaligned accesses might work on some ARMv7 devices and configurations but fail on others. Even where unaligned access is supported, the performance implications mean it is best avoided if possible.
Handling Unaligned Accesses in ARMv7 Code
When writing code for ARMv7, here are some ways to deal with unaligned accesses:
- Use structures and types that are naturally aligned. For example, prefer 4-byte aligned integers and floats over packed layouts that place them at odd offsets.
- Pad structures so members are aligned to their natural boundaries.
- Use compiler directives to enforce alignment of variables and data structures.
- Use bytewise access methods to extract unaligned data a byte at a time.
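- Use single load and store instructions (LDR, LDRH, STR, STRH) for data that may be unaligned, where the hardware allows it. Multi-register instructions like LDRD and STRD require word-aligned addresses and fault otherwise.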
- Copy misaligned data to an aligned temporary variable before access.
- Dynamically check alignment at runtime and handle unaligned accesses accordingly.
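Three of the options above (bytewise access, copying to an aligned temporary, and a runtime alignment check) can be sketched in portable C. The helper names are hypothetical, and the bytewise version assumes the data is stored in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime check: is `p` a multiple of `alignment` (which must be a power of two)? */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Bytewise access: assemble a little-endian 32-bit value one byte at a time.
 * Works at any address because only byte loads are issued. */
static uint32_t load_u32_bytewise_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Aligned temporary: copy the bytes into a local variable, which the compiler
 * aligns correctly. The result uses the CPU's native byte order. */
static uint32_t load_u32_copy(const void *p)
{
    uint32_t tmp;
    memcpy(&tmp, p, sizeof tmp);
    return tmp;
}
```

Compilers commonly reduce a fixed-size `memcpy` like this to a plain load when the target permits unaligned access, so the copy need not cost more than a direct dereference.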
The best approach depends on whether unaligned access is supported, the performance requirements, and the complexity of the code. In general, avoiding unaligned access from the start leads to the best performance and portability across ARMv7 implementations.
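Padding and compiler alignment directives from the list above can be illustrated with a small layout comparison. This is a sketch under assumptions: the struct names and fields are hypothetical, `__attribute__((packed))` is a GCC/Clang extension, and `alignas`/`alignof` come from C11 `<stdalign.h>`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler inserts padding after `tag` so that `value`
 * starts on a boundary suitable for uint32_t. */
struct record_natural {
    uint8_t  tag;
    uint32_t value;
    uint16_t flags;
};

/* Packed layout (GCC/Clang extension): no padding, so `value` sits at
 * offset 1 and every access to it is potentially unaligned. */
struct record_packed {
    uint8_t  tag;
    uint32_t value;
    uint16_t flags;
} __attribute__((packed));

/* Compile-time checks that the layouts are what the comments claim. */
static_assert(offsetof(struct record_natural, value) % alignof(uint32_t) == 0,
              "natural layout keeps value aligned");
static_assert(offsetof(struct record_packed, value) == 1,
              "packed layout places value at an odd offset");

/* Compiler directive: request an 8-byte-aligned byte buffer. */
static alignas(8) uint8_t aligned_buffer[64];
```

The packed struct saves space but pushes the cost onto every access to `value`; the natural layout trades a few bytes of padding for aligned loads and stores.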
Hardware Support for Unaligned Access
The ARMv7 architecture supports unaligned access in hardware for some instructions and memory types, but not for every access. Where it is supported, unaligned accesses are handled transparently with hardware mechanisms like:
- Unaligned single load/store – Single load or store instructions (LDR, LDRH, STR, STRH) can access unaligned data using hardware logic to gather the bytes.
- Pair load/store – Load or store pair instructions (LDRD, STRD) access two 32-bit values at once, but unlike single loads and stores they require a word-aligned address and fault otherwise.
- Fix-up hardware – Special logic detects unaligned accesses and fixes them transparently.
This hardware support reduces the performance penalty of unaligned access. But there is still an impact compared to aligned access. Some lower power or cost optimized ARMv7 devices may omit support for unaligned accesses entirely.
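The ARMv7 architecture provides configuration options to enable or disable alignment checking. On ARMv7-A and ARMv7-R, setting the A bit of the System Control Register (SCTLR) makes every unaligned access fault; on ARMv7-M, the UNALIGN_TRP bit of the Configuration and Control Register (CCR) plays the same role. Both registers are accessible only to privileged software, and the Current Program Status Register (CPSR) does not report the setting.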
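As a rough sketch of what such a configuration check could look like on ARMv7-A: the alignment-check enable is the A bit (bit 1) of the System Control Register (SCTLR), which privileged code reads through CP15. The helper names below are hypothetical, the inline assembly uses GCC/Clang syntax, and the register read traps if executed in user mode, so it is compiled only for ARMv7-A targets.

```c
#include <stdint.h>

#define SCTLR_A_BIT (UINT32_C(1) << 1)  /* SCTLR.A: alignment check enable */

/* Decode the A bit from a raw SCTLR value. Returns 1 when alignment checking
 * is enabled, meaning every unaligned access faults. */
static int sctlr_alignment_check_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_A_BIT) != 0;
}

#if defined(__ARM_ARCH_7A__)
/* Read SCTLR through CP15 (c1, c0, 0). Privileged modes only. */
static uint32_t read_sctlr(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    return value;
}
#endif
```

User-space code cannot make this query directly, so applications usually have to rely on what the operating system documents about its alignment policy.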
Performance Impact of Unaligned Access
Even with hardware support, unaligned accesses incur a performance penalty on ARMv7 processors. The exact cost depends on the core and memory system, so the following figures are indicative only:
- Unaligned loads can take 2-3 times as many cycles as aligned loads.
- Unaligned stores can take over 10 times as many cycles as aligned stores.
- Executing code that is not aligned to the instruction fetch width can reduce performance by up to 15%.
- Branching to targets that are not aligned to the fetch width incurs a penalty of several cycles.
These cycles quickly add up, especially in tight loops or routines making frequent data accesses. Code optimized for ARMv7 should align critical data structures and code segments to minimize unaligned accesses.
Some real-world benchmarks using image processing and cryptography algorithms show unaligned accesses slowing execution by 2-3x even on ARMv7 processors with full hardware support.
Dealing with Endianness
Byte order, or endianness, is another consideration with unaligned ARMv7 access. ARMv7 supports both big- and little-endian modes. When accessing unaligned multi-byte data, the same bytes are read in either mode; endianness determines how those bytes are assembled into the value.
For example, loading a 16-bit halfword at odd address 1 reads bytes 1 and 2 in both modes. In little-endian mode byte 1 becomes the low-order byte of the result, while in big-endian mode it becomes the high-order byte. So portable code cannot rely on a specific endian behavior for unaligned access.
In most cases, endianness does not cause problems for unaligned loads and stores. But unaligned data types larger than 1 byte may need endian conversion when their stored byte order differs from the processor's. Alternatively, bytewise access of unaligned data can be used to avoid any endian issues.
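A minimal sketch of bytewise access that sidesteps both alignment and endianness is shown here. The function names are hypothetical; each helper states the byte order of the stored data explicitly, so the result does not depend on the processor's endian mode.

```c
#include <stdint.h>

/* Read a 16-bit value stored least-significant byte first. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 16-bit value stored most-significant byte first. */
static uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For a buffer holding the bytes 0x00, 0x11, 0x22, 0x33, reading at offset 1 touches 0x11 and 0x22 in both cases: `read_u16_le` returns 0x2211 and `read_u16_be` returns 0x1122.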
Exceptions and Undefined Behavior
The ARMv7 architecture does not define consistent behavior for every unaligned access. This leaves some cases where the result is unpredictable or differs from an aligned access. For example:
- Unaligned access crossing a page boundary may fault partway through or touch pages with different memory attributes.
- Unaligned stores are not guaranteed to be atomic, so another observer may see a partially written value.
- The value assembled by an unaligned load differs between big- and little-endian modes.
- Memory ordering and synchronization are weaker around unaligned data; exclusive accesses (LDREX, STREX) require aligned addresses.
Some ARMv7 implementations detect certain unaligned access conditions and raise alignment faults or data abort exceptions. This causes unaligned access code to crash if not handled properly. There are configuration options to enable or disable signaling of these exceptions.
In general, ARMv7 unaligned access is not fully consistent or portable across implementations. Code relying on specific unaligned access behavior is not guaranteed to work as expected on all platforms. Sticking to aligned access avoids these issues.
Optimization Guidelines
Here are some key optimization guidelines for ARMv7 code related to unaligned access:
- Align performance critical data structures and code segments.
- Use naturally aligned data types instead of packed smaller types.
- Step through arrays in loops with strides that keep every access address aligned.
- Critical inner loop code is a candidate for manual alignment.
- Use aligned pair load/store instructions where possible.
- Avoid unaligned access crossing cache line boundaries.
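The last guideline can be checked in software with a small helper, sketched below. The function name is hypothetical, and the line size is a parameter because it differs between cores; the value 64 in the usage note is an assumed example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the `size` bytes starting at `addr` span more than one cache
 * line, 0 otherwise. `line_size` must be a power of two and `size` non-zero. */
static int crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size > 0);
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    uintptr_t first_line = addr / line_size;              /* line holding the first byte */
    uintptr_t last_line  = (addr + size - 1) / line_size; /* line holding the last byte */
    return first_line != last_line;
}
```

With an assumed 64-byte line, a 4-byte access at address 62 crosses a line boundary, while the same access at address 60 or 64 does not.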
Modern compilers can align some objects automatically. But manually aligning key data may yield better performance. Code that dynamically allocates data should also align the allocations where possible.
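For dynamically allocated data, one possible approach is C11 `aligned_alloc`. The wrapper below is a minimal sketch with a hypothetical name; it rounds the size up to a multiple of the alignment, which C11 requires, and assumes the platform's C library provides `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least `size` bytes aligned to `alignment` (a power of two).
 * Returns NULL on failure. Release the result with free(). */
static void *alloc_aligned(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round the size up to the next multiple of the alignment. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    if (rounded < size) {
        return NULL;   /* rounding overflowed */
    }
    return aligned_alloc(alignment, rounded);
}
```

A call such as `alloc_aligned(64, 100)` returns a block whose address is a multiple of 64, which keeps the start of the buffer on an aligned boundary.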
Conclusion
Unaligned memory access can significantly degrade performance on ARMv7 processors compared to aligned access. While ARMv7 hardware can handle many unaligned accesses, they are best avoided where possible. Writing ARMv7 code with aligned data structures and access patterns yields the best memory throughput and execution speed.