ARMv8 Unaligned Access

Javier Massey
Last updated: September 14, 2023 12:10 am

Unaligned memory accesses refer to accessing data at memory addresses that are not multiples of the data size. For example, accessing a 4-byte integer at address 0x1003 is an unaligned access because the address is not a multiple of 4 bytes. ARMv8 processors handle unaligned accesses differently than previous ARM architectures.
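The definition above can be written as a one-line check. This is a minimal sketch; the function name is illustrative, and it assumes the access size is a power of two, which holds for the 1, 2, 4, 8, and 16 byte accesses discussed in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of size (natural alignment).
 * Assumption: size is a non-zero power of two. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

With this helper, a 4-byte access at 0x1003 is reported as unaligned, while the same access at 0x1000 or 0x1004 is aligned.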

Contents

  • Loads and Stores
  • Atomic and Exclusive Accesses
  • Floating Point Loads
  • SIMD Loads and Stores
  • Instruction Fetches
  • TLB Mappings
  • Unaligned Faults
  • Performance Impact
  • Compiler Handling
  • Summary

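Before ARMv6, most ARM cores did not support unaligned accesses: depending on the core and its configuration, an unaligned word load either raised an alignment fault or returned rotated data. ARMv6 introduced hardware support for unaligned word and halfword accesses, and ARMv7-A made it standard for ordinary loads and stores to Normal memory, while multi-register and doubleword instructions such as LDM, STM, LDRD, and STRD still required word alignment. Code that had to run without this support needed to assemble misaligned values from smaller aligned accesses, which has a performance cost.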

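ARMv8 carries this hardware support forward and, in the AArch64 execution state, extends it to almost every load and store instruction, provided the access targets Normal memory and alignment checking is disabled (SCTLR_ELx.A is 0). This removes the software overhead of aligning data. However, there are still some caveats to keep in mind with ARMv8 unaligned accesses: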

Loads and Stores

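ARMv8 allows unaligned loads and stores of every general-purpose access size to Normal memory. For example, a 4-byte integer load from address 0x1003 completes with a single load instruction and no fault. The architecture does not guarantee that such an access is single-copy atomic, so another observer may see it as several smaller accesses. Unaligned accesses to Device memory always generate an alignment fault. Performance remains best when data accesses are aligned.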
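In C, dereferencing a misaligned uint32_t pointer is undefined behavior even when the hardware tolerates the access, so portable code usually expresses an unaligned load or store through memcpy. The sketch below shows that idiom; the function names are illustrative. On AArch64 targets that permit unaligned access, compilers typically reduce each memcpy to a single load or store instruction, though that is an optimization and not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from any byte address, in host byte order. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes a 32-bit value to any byte address, in host byte order. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

A value stored at an odd offset inside a byte buffer, such as buf + 3, reads back unchanged through the matching load helper.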

Unaligned loads may cross cache line boundaries and result in more than one cache line being read. This can reduce performance compared to an aligned load within a single cache line. Unaligned stores also have a performance penalty if they cross cache line boundaries.
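Whether a given access straddles a cache line can be checked by comparing the line that holds its first byte with the line that holds its last byte. The sketch below assumes a 64-byte cache line, which is common on ARMv8 cores but is not fixed by the architecture; the macro and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u /* assumption: typical, not architectural */

/* Returns true when the bytes [addr, addr + size) fall into more than
 * one block of the given power-of-two size. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size > 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    uintptr_t mask  = ~(uintptr_t)(boundary - 1);
    uintptr_t first = addr & mask;              /* block of the first byte */
    uintptr_t last  = (addr + size - 1) & mask; /* block of the last byte  */
    return first != last;
}
```

Under that assumption, a 4-byte access at 0x1003 is unaligned but stays inside one line, whereas the same access at 0x103E spans two lines.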

Atomic and Exclusive Accesses

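ARMv8 requires exclusive accesses (load-exclusive/store-exclusive) and load-acquire/store-release accesses to be naturally aligned, and the same applies to the atomic instructions added in ARMv8.1. An unaligned access of these kinds generates an alignment fault regardless of the SCTLR_ELx.A setting. This maintains the expected atomicity and exclusivity semantics. From ARMv8.4 onward, implementations can relax this rule for some of these accesses when they stay within a 16-byte aligned block.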
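In C11, the simplest way to meet this requirement is to declare shared variables as atomic objects and let the compiler place them, instead of carving them out of a byte buffer or a packed structure. The sketch below uses a hypothetical event counter; the explicit alignas(8) only restates what the compiler would normally choose for a 64-bit atomic.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter, forced onto an 8-byte boundary. */
static alignas(8) _Atomic uint64_t event_count;

/* Atomically increments the counter and returns the new value. */
static uint64_t record_event(void)
{
    /* The exclusive or atomic instructions behind this operation
     * need a naturally aligned address. */
    assert(((uintptr_t)&event_count & 7u) == 0);
    return atomic_fetch_add(&event_count, 1) + 1;
}
```

Casting an arbitrary byte pointer to an atomic type bypasses this placement and can produce exactly the faulting access described above.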

Floating Point Loads

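In AArch64, loads of 32-bit, 64-bit, and 128-bit SIMD and floating-point registers may all be unaligned when they target Normal memory and alignment checking is disabled. A 128-bit load, such as for a {double, double} vector, therefore does not have to be 16-byte aligned, although a 16-byte aligned address avoids cache line and page splits. The stricter cases are in AArch32: VLDR, VSTR, VLDM, and VSTM require word-aligned addresses, and an Advanced SIMD load that carries an explicit alignment qualifier (for example VLD1 with :128) faults if the address does not meet it.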

SIMD Loads and Stores

SIMD loads and stores support unaligned access in ARMv8. For example, a SIMD vector load or store can start at an arbitrary byte address. However, performance is optimal when SIMD data is aligned to its natural alignment.

SIMD loads and stores may cross cache line or page boundaries and be split into multiple separate accesses, which costs performance. Aligning each vector to its own size (16 bytes for a 128-bit vector) guarantees that a single vector access never crosses either boundary, and aligning the start of a buffer keeps that property for every vector in it.
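One way to obtain a suitably aligned SIMD buffer in C11 is aligned_alloc. The sketch below assumes 128-bit vectors and therefore a 16-byte alignment; the helper name is illustrative, and it does not guard against overflow in the size calculation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16u /* size of one 128-bit vector register */

/* Allocates room for count floats starting on a 16-byte boundary.
 * Returns NULL on failure; release the buffer with free(). */
static float *alloc_simd_floats(size_t count)
{
    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects a size that is a multiple of the alignment,
     * so round up (and never request zero bytes). */
    bytes = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    if (bytes == 0) {
        bytes = SIMD_ALIGN;
    }
    float *p = aligned_alloc(SIMD_ALIGN, bytes);
    assert(p == NULL || ((uintptr_t)p % SIMD_ALIGN) == 0);
    return p;
}
```

Raising SIMD_ALIGN to the cache line size is a possible variation when a buffer should also start on a line boundary.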

Instruction Fetches

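ARMv8 requires instruction fetches to be aligned. A64 instructions are always 4 bytes long and must be 4-byte aligned; in AArch32, A32 instructions must be 4-byte aligned and T32 instructions 2-byte aligned. Branch targets must meet the same alignment. In AArch64, attempting to execute from a misaligned program counter generates a PC alignment fault exception, which is reported separately from a data alignment fault. This avoids complex logic to handle unaligned instruction fetches.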

TLB Mappings

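ARMv8 translates virtual addresses to physical addresses in the Memory Management Unit (MMU), which caches recent translations in the Translation Lookaside Buffer (TLB). AArch64 supports translation granules of 4KB, 16KB, and 64KB, so the smallest page is 4KB, and virtual addresses are mapped to physical addresses in 4KB aligned units.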

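If an unaligned access crosses a 4KB page boundary, it touches two separate pages that may map to unrelated physical addresses. This requires two translations instead of one, hurting performance, and either half can fault independently. Naturally aligned data never crosses a page boundary, because the page size is a multiple of every access size.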
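The split can be made concrete by computing how many bytes of an access land in each page. The sketch below assumes the 4KB granule and an access no larger than one page; the macro and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumption: 4KB translation granule */

/* Splits the access [addr, addr + size) at the next page boundary.
 * first_len receives the bytes in the first page, second_len the bytes
 * in the following page (0 when the access does not cross). */
static void split_at_page(uintptr_t addr, size_t size,
                          size_t *first_len, size_t *second_len)
{
    assert(size > 0 && size <= PAGE_BYTES);
    size_t to_end = PAGE_BYTES - (size_t)(addr & (PAGE_BYTES - 1));
    *first_len  = size < to_end ? size : to_end;
    *second_len = size - *first_len;
}
```

For an 8-byte access at 0x1FFE this yields 2 bytes in the first page and 6 in the second, so two translations are needed; at 0x1FF8 all 8 bytes stay in one page.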

Unaligned Faults

Even though ARMv8 supports unaligned accesses, there are cases where an unaligned access may still fault:

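  • Atomic, exclusive, and load-acquire/store-release accesses must be naturally aligned
  • Any unaligned access to Device memory faults
  • Every unaligned access faults when alignment checking is enabled (SCTLR_ELx.A is 1)
  • AArch32 SIMD loads with an explicit alignment qualifier must meet that alignment
  • Instruction fetches must be 4-byte aligned in AArch64 (reported as a PC alignment fault)
  • An access that crosses into a page with different permissions or no mapping may take a permission or translation fault, which is not an alignment fault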

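When an alignment fault occurs on a data access in AArch64, the processor takes a synchronous Data Abort exception. The Exception Syndrome Register (ESR_ELx) reports the alignment fault status code, and the faulting address is captured in the Fault Address Register (FAR_ELx). Software must handle the fault and, if needed, emulate the unaligned access, for example by performing it one byte at a time.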
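A fault handler usually starts by decoding the syndrome value. The sketch below classifies an ESR_ELx value using the exception class field in bits [31:26] and the data fault status code in bits [5:0]. The encodings are taken from the AArch64 ESR_ELx layout as I understand it, and the type and function names are illustrative; a real handler would also read FAR_ELx for the address.

```c
#include <stdint.h>

/* Exception class (EC) values, ESR_ELx bits [31:26]. */
#define EC_PC_ALIGNMENT  0x22u /* PC alignment fault          */
#define EC_DABT_LOWER_EL 0x24u /* Data Abort from a lower EL  */
#define EC_DABT_SAME_EL  0x25u /* Data Abort from the same EL */
#define EC_SP_ALIGNMENT  0x26u /* SP alignment fault          */

/* Data fault status code (DFSC), ESR_ELx bits [5:0]. */
#define DFSC_ALIGNMENT   0x21u

typedef enum {
    FAULT_OTHER,
    FAULT_DATA_ALIGNMENT,
    FAULT_PC_ALIGNMENT,
    FAULT_SP_ALIGNMENT
} fault_kind;

/* Classifies a raw ESR_ELx value by its alignment-related cause. */
static fault_kind classify_esr(uint64_t esr)
{
    uint32_t ec   = (uint32_t)((esr >> 26) & 0x3Fu);
    uint32_t dfsc = (uint32_t)(esr & 0x3Fu);

    switch (ec) {
    case EC_PC_ALIGNMENT:
        return FAULT_PC_ALIGNMENT;
    case EC_SP_ALIGNMENT:
        return FAULT_SP_ALIGNMENT;
    case EC_DABT_LOWER_EL:
    case EC_DABT_SAME_EL:
        return dfsc == DFSC_ALIGNMENT ? FAULT_DATA_ALIGNMENT : FAULT_OTHER;
    default:
        return FAULT_OTHER;
    }
}
```

Only the FAULT_DATA_ALIGNMENT case is a candidate for emulating the access; the PC and SP alignment cases indicate a corrupted branch target or stack pointer.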

Performance Impact

Allowing unaligned accesses avoids software overhead to align data. However, unaligned accesses can still hurt performance in certain cases:

  • Unaligned loads/stores may cross cache line boundaries and reduce cache efficiency
  • Unaligned SIMD accesses may cross cache or page boundaries, requiring multiple separate memory accesses
  • Unaligned accesses may require two TLB lookups instead of one if crossing 4KB page boundaries

In performance sensitive code, aligning data and accesses to match the access size, cache lines, pages, and other architecture features will provide optimal performance. Unaligned accesses should be avoided where possible in hot code paths.

Compiler Handling

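Compilers can generate both aligned and unaligned accesses depending on context. LDUR and STUR are not intrinsics but ordinary A64 instructions (load/store register with an unscaled offset) that the compiler selects when an offset cannot be encoded in the scaled form of LDR or STR, which is often the case for misaligned offsets. In C and C++, the compiler assumes that a pointer is aligned for its type, so portable code should express a misaligned access through memcpy or a packed structure rather than a pointer cast.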
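A packed structure is one way to tell the compiler that a field may be misaligned. The sketch below uses a hypothetical wire-format header; the packed attribute is a GCC and Clang extension, not standard C. On an AArch64 target the compiler is free to read the field with one unaligned load, and with an option such as -mstrict-align it is expected to fall back to smaller accesses instead.

```c
#include <stdint.h>

/* Hypothetical wire-format header: a 1-byte tag followed directly by a
 * 32-bit length, with no padding between the two fields. */
struct __attribute__((packed)) wire_header {
    uint8_t  tag;
    uint32_t length; /* at offset 1, so never naturally aligned */
};

/* Reads the misaligned field; the compiler picks a safe access
 * sequence because the type itself is marked as packed. */
static uint32_t header_length(const struct wire_header *h)
{
    return h->length;
}
```

Taking the address of h->length and passing it around as a plain uint32_t pointer loses the packed information, so the field is best read through the structure as shown.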

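For SIMD intrinsics such as vld1q_u8, the compiler normally emits a vector load or store that only assumes element alignment, so the hardware handles any misalignment. On targets or with options where unaligned access is not permitted, it may instead build the value from smaller accesses. Either way, this is transparent to the programmer.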

The compiler may also introduce unaligned accesses of its own where the target allows them. For example, it can merge several adjacent byte accesses into one wider load, or copy a small fixed-size block with two overlapping wide accesses instead of a byte loop, which improves both speed and code density.

Summary

ARMv8 broadly supports unaligned accesses to Normal memory in hardware, avoiding software alignment overhead. However, performance is best when accesses are naturally aligned and do not straddle cache lines or pages. Unaligned SIMD accesses in particular can hurt performance, because they are wide enough to cross these boundaries often.

Compilers mitigate unaligned access performance issues in most cases. But for performance critical software, aligning data and accesses manually can help. Watch for alignment faults on atomic, exclusive, and load-acquire/store-release accesses, on Device memory, and whenever alignment checking is enabled. Handle faults gracefully and emulate unaligned behavior if needed.
