Accessing data from memory is a fundamental operation in computing. When a processor needs to read or write data, it sends a request to the memory subsystem, specifying an address to access. Whether this address is aligned or unaligned can have significant performance implications.
What is memory alignment?
Memory alignment refers to the positioning of data at memory addresses that are multiples of the data size. For example, a 4-byte integer would be aligned in memory if its address is a multiple of 4. This ensures that the integer lies on a 4-byte boundary.
Processor architectures have alignment requirements for data types. Reading naturally aligned data allows the processor to access the value in a single transaction. However, unaligned access requires multiple transactions, negatively impacting performance.
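As a rough illustration, the C snippet below prints the natural alignment of a few common types and checks whether an address falls on the required boundary. The exact values are implementation-defined; the ones in the comments are typical for 64-bit platforms.
#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

int main(void) {
    // Natural alignment requirements (implementation-defined; typical values shown)
    printf("alignof(short)  = %zu\n", alignof(short));   // typically 2
    printf("alignof(int)    = %zu\n", alignof(int));     // typically 4
    printf("alignof(double) = %zu\n", alignof(double));  // typically 8

    int value = 42;
    // An address is aligned for a type when it is a multiple of that type's alignment
    int is_aligned = ((uintptr_t)&value % alignof(int)) == 0;
    printf("&value is %s\n", is_aligned ? "aligned" : "unaligned");
    return 0;
}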
Impacts of unaligned access
When data is not aligned to its natural boundary, the processor cannot simply read or write the value in one go. Consider a 4-byte integer starting at address 0x1005. To read this unaligned integer, the processor must:
- Perform a read transaction at address 0x1005 to get the first byte
- Perform a read at 0x1006 to get the second byte
- Perform a read at 0x1007 to get the third byte
- Perform a read at 0x1008 to get the fourth byte
It must then assemble these four bytes to reconstruct the full integer value, which takes additional instructions and cycles compared to an aligned read that returns the whole value in a single transaction. (In practice, the hardware may instead issue two aligned word reads, at 0x1004 and 0x1008, and merge the relevant bytes; either way, one logical access becomes multiple transactions plus extra work to combine the pieces.)
Writes have the same problem: the processor must issue one write transaction per byte (or read-modify-write each affected word) to store the unaligned integer, several times the work of a naturally aligned write.
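In software, the equivalent byte-by-byte reassembly looks roughly like this sketch, which reads a 4-byte value one byte at a time and combines the pieces with shifts. It assumes a little-endian byte order, and the helper name read_u32_unaligned is ours:
#include <stdint.h>

// Reassemble a 32-bit value from four separate byte reads,
// assuming the bytes are stored little-endian (least significant byte first).
uint32_t read_u32_unaligned(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}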
In addition to requiring multiple transactions, unaligned accesses can cause processor exceptions. Some architectures (for example, SPARC and older ARM cores) do not support unaligned access at all and raise an alignment fault when asked to perform one.
Performance impact
Unaligned data access can significantly degrade performance:
- Each unaligned read or write requires multiple transactions instead of one. This increases bus utilization and memory contention.
- Extra instructions are needed to reconstruct unaligned values from individual bytes/words. This increases instruction overhead.
- Unaligned stores may need read-modify-write cycles. The memory location must be read before being partially overwritten.
- Processor exceptions caused by unaligned accesses incur large cycle penalties to handle.
As a result, aligning data properly can provide substantial speed benefits. Experiments show that aligned access can be up to 2-3x faster than unaligned access for some data types and architectures.
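One rough way to observe the effect on a particular machine is a micro-benchmark along the following lines, which sums 4-byte values read first from an aligned offset and then from an offset of one byte. The helper sum_at_offset is ours, memcpy is used so no misaligned pointer is ever dereferenced, and the actual numbers vary widely by architecture, compiler, and optimization level:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

#define COUNT (1 << 22)  // number of 4-byte values to read

// Sum COUNT 32-bit values starting at the given byte offset into the buffer.
// The fixed-size memcpy is lowered by the compiler to whatever load sequence
// the target supports.
static uint32_t sum_at_offset(const unsigned char *buf, size_t offset) {
    uint32_t sum = 0, v;
    for (size_t i = 0; i < COUNT; i++) {
        memcpy(&v, buf + offset + i * 4, sizeof v);
        sum += v;
    }
    return sum;
}

int main(void) {
    unsigned char *buf = malloc((size_t)COUNT * 4 + 8);
    memset(buf, 1, (size_t)COUNT * 4 + 8);

    clock_t t0 = clock();
    uint32_t a = sum_at_offset(buf, 0);   // aligned reads
    clock_t t1 = clock();
    uint32_t b = sum_at_offset(buf, 1);   // unaligned reads
    clock_t t2 = clock();

    printf("aligned:   %.3fs (sum %u)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, (unsigned)a);
    printf("unaligned: %.3fs (sum %u)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, (unsigned)b);
    free(buf);
    return 0;
}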
Causes of unaligned access
There are several potential causes of unaligned memory access:
- Packed data structures – structures declared with packing pragmas or attributes (which remove the compiler's padding) can place multi-byte fields at unaligned offsets (see the sketch after this list).
- Type punning – reinterpreting memory holding one type as another (e.g. casting a char* to an int*) can yield pointers that do not meet the larger type's alignment requirement.
- Pointer arithmetic – advancing a pointer by a byte offset that is not a multiple of the type's alignment (for example, through casts to char*) leaves it misaligned.
- Dynamically allocated memory – malloc only guarantees alignment suitable for the largest standard type (typically 8 or 16 bytes), so data that needs wider alignment (e.g. SIMD vectors or cache-line-sized buffers) can end up insufficiently aligned.
- Low-level buffer handling – code that reads multi-byte values at arbitrary byte offsets within serialized data (network packets, file formats) produces unaligned accesses.
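The packed-structure cause, for instance, can be seen in a sketch like the following, which uses GCC/Clang's __attribute__((packed)); the struct name Record is just for illustration:
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

// Without packing, the compiler would insert 3 bytes of padding after 'tag'
// so that 'value' lands on a 4-byte boundary.
struct __attribute__((packed)) Record {
    uint8_t  tag;
    uint32_t value;   // offset 1: not a multiple of 4, so accesses are unaligned
};

int main(void) {
    printf("offsetof(value) = %zu\n", offsetof(struct Record, value));  // prints 1
    printf("sizeof(Record)  = %zu\n", sizeof(struct Record));           // prints 5
    return 0;
}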
Avoiding unaligned access
There are techniques both at the coding level and compiler/architecture level to avoid the pitfalls of unaligned access:
- Use alignment pragmas or specifiers (such as alignas or aligned attributes, and avoiding packing pragmas) to instruct the compiler to lay structures out with each field on its natural boundary (alignas and aligned allocation are shown in the sketch after this list).
- Allocate dynamic memory with functions that honor a specific boundary, such as aligned_alloc or posix_memalign.
- Avoid type punning and reinterpret casts that alias differently sized data types.
- Use pointer arithmetic carefully to increment by the proper stride.
- Choose data types that fit evenly into larger types to avoid gaps.
- Enable compiler alignment options such as -falign-functions and -falign-labels; note that these align code (functions and branch targets) rather than data.
- Some architectures handle unaligned access transparently in hardware (e.g. x86), while others provide special instructions for it (e.g. MIPS lwl/lwr) or rely on software fix-ups.
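A minimal sketch of the first two points, using C11's alignas specifier and aligned_alloc (posix_memalign or _aligned_malloc are platform alternatives); the struct name Frame is illustrative:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdalign.h>

// Force the payload to start on a 64-byte boundary (e.g. a cache line)
struct Frame {
    alignas(64) unsigned char payload[256];
};

int main(void) {
    struct Frame f;
    printf("payload aligned to 64: %d\n", ((uintptr_t)f.payload % 64) == 0);

    // aligned_alloc(alignment, size): size must be a multiple of the alignment
    void *buf = aligned_alloc(64, 4096);
    printf("heap buffer aligned to 64: %d\n", ((uintptr_t)buf % 64) == 0);
    free(buf);
    return 0;
}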
Handling unavoidable unaligned access
In rare cases, unaligned access may be unavoidable. Some techniques to handle this include:
- Let the compiler generate the unaligned access sequence, for example by copying through memcpy (which compilers lower to efficient unaligned loads/stores where the hardware allows) or by using packed attributes/qualifiers where the toolchain provides them (see the sketch after this list).
- Manually load bytes and piece together values using shifts and masks.
- Copy the unaligned data to an aligned temporary variable before accessing it.
- Handle the leading bytes individually until the pointer reaches an aligned boundary, then process the remainder with aligned accesses (the usual pattern in memcpy-style copy loops).
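The memcpy / aligned-temporary approach from the list above might look like the following sketch (the helper names load_u32 and store_u32 are ours). Compilers typically turn the fixed-size memcpy into a single unaligned load or store where the hardware supports it, or a short byte sequence where it does not:
#include <string.h>
#include <stdint.h>

// Read a 32-bit value from a possibly unaligned address by copying it
// into an aligned temporary. No misaligned pointer is ever dereferenced.
uint32_t load_u32(const void *src) {
    uint32_t tmp;
    memcpy(&tmp, src, sizeof tmp);
    return tmp;
}

// Store a 32-bit value to a possibly unaligned address the same way.
void store_u32(void *dst, uint32_t value) {
    memcpy(dst, &value, sizeof value);
}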
Examples of aligned vs unaligned access
Here are some examples in C/C++ showing the difference between aligned and unaligned memory access:
// Aligned access
// malloc returns memory suitably aligned for any basic type,
// so p is naturally aligned for a 4-byte int
int* p = (int*)malloc(sizeof(int));
*p = 123;    // single aligned write
int x = *p;  // single aligned read

// Unaligned access
// Offsetting into a byte buffer by 1 yields a pointer that is not 4-byte aligned
char* buf = (char*)malloc(sizeof(int) + 1);
char* c = buf + 1;
// Write a 32-bit integer through the misaligned pointer
*(int*)c = 123;      // unaligned write (undefined behavior in strict C/C++)
// Read the 32-bit integer back through the misaligned pointer
int y = *(int*)c;    // unaligned read
This simple case shows how offsetting into a byte buffer and casting the result to a larger data type produces unaligned accesses. On architectures that tolerate unaligned access (such as x86) the compiler emits ordinary load and store instructions and the hardware absorbs the cost, possibly with extra cycles when the value straddles a cache line; on strict-alignment architectures the dereference may fault. In either case the cast is formally undefined behavior, so portable code should copy through memcpy instead. The aligned version compiles to a single simple read and write.
Summary
Aligned memory access improves performance by enabling single transactions to read/write data. Unaligned access requires multiple transactions and instructions to piece data together, hurting speed. Data structures and pointers should be properly aligned to memory boundaries matching their type size. Unaligned access is sometimes unavoidable, but techniques exist to mitigate the performance impact. Aligning data eliminates unnecessary memory operations, making code faster and more efficient.