An unaligned access occurs on an Arm Cortex-M microcontroller when code reads or writes data at an address that is not a multiple of the data’s size. For example, accessing a 32-bit integer at an address that is not divisible by 4 is an unaligned access. Cortex-M3 and later cores handle most unaligned accesses in hardware, at a performance cost that properly aligned data avoids. Cortex-M0 and M0+ cores do not support unaligned accesses at all and raise a HardFault instead.
What Causes Unaligned Accesses?
There are a few common causes of unaligned accesses in Cortex-M programs:
- Accessing fields within a packed struct where the fields are not aligned
- Casting a pointer to a pointer of a larger data type, like uint8_t* to uint32_t*, and dereferencing it
- Placing data inside heap or static byte buffers at offsets that are not aligned for its type
- Reading data from an external interface that does not guarantee alignment, like a serial port
In each of these cases, the developer must take care to avoid unaligned accesses, either by laying out structs carefully, aligning pointers and buffers, or handling unaligned external data properly. On Cortex-M3 and above, the hardware does not fault on ordinary unaligned loads and stores by default, so the problem stays silent unless the developer looks for it. On Cortex-M0 and M0+, every unaligned access faults.
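Each of the causes above comes down to whether an address is a multiple of the access size. The following is a minimal sketch of that check in portable C. The helper name is_aligned is hypothetical and is not part of CMSIS or any Cortex-M library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of size.
 * size must be a power of two (1, 2, 4, 8, ...). */
static bool is_aligned(const void *addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)addr & (size - 1)) == 0;
}
```

A debug build could call is_aligned(ptr, sizeof(uint32_t)) before a cast to uint32_t* to catch a misaligned pointer early.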
Effects of Unaligned Accesses
On cores that support it, the processor splits an unaligned access into multiple aligned bus accesses and combines the pieces to form the result. This takes additional clock cycles, so the direct effect is reduced performance. For example, a 32-bit read at an address that is only 2-byte aligned can be carried out as two 16-bit aligned reads whose results are combined into the full 32-bit value. This can lower performance noticeably in code with many unaligned accesses.
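The splitting described above can be illustrated in software. The sketch below is a conceptual model only, not a description of the actual bus behaviour of any particular core: it reads a little-endian 32-bit value at an arbitrary byte offset using nothing but aligned word loads and shifts. The function name read_u32_split is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value that starts byte_offset bytes into an
 * array of aligned words, using only aligned 32-bit loads.
 * The caller must ensure words[byte_offset / 4 + 1] exists when the
 * offset is not a multiple of 4. */
static uint32_t read_u32_split(const uint32_t *words, size_t byte_offset)
{
    size_t   index = byte_offset / 4;                 /* word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8; /* bit offset inside that word */

    uint32_t lo = words[index];
    if (shift == 0) {
        return lo;                                    /* aligned: a single load */
    }
    uint32_t hi = words[index + 1];                   /* second aligned load */
    return (lo >> shift) | (hi << (32 - shift));      /* stitch the two halves */
}
```

The aligned case needs one load, while every other offset needs two loads plus shifts, which is where the extra cycles come from.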
The other effect of unaligned accesses relates to interrupts. A single load or store generally has to complete before an interrupt is taken, so the extra bus cycles of an unaligned access can add to interrupt latency. The effect per access is small, but it makes worst-case latency harder to bound in a system with tight timing requirements.
Lastly, Cortex-M0 and M0+ cores do not include the unaligned access support present in Cortex-M3 and above, so any unaligned halfword or word access raises a HardFault. Even on Cortex-M3 and above, some instructions, such as LDM, STM, LDRD, and STRD, always fault on an unaligned address. So unaligned accesses can crash the program outright rather than just slow it down.
Checking for Unaligned Accesses
On Cortex-M3 and above, the core provides two mechanisms that work together to detect unaligned accesses in software:
- Setting the CCR.UNALIGN_TRP bit so that unaligned halfword and word accesses raise a UsageFault
- Checking the UNALIGNED flag in the UsageFault Status Register (UFSR, the upper half of CFSR) inside the fault handler
The Memory Protection Unit (MPU) does not check alignment itself. A trapped unaligned access is reported as a UsageFault, and the MemManage handler is only involved if the access also violates an MPU region's permissions.
Unaligned Access Trap Handlers
In Cortex-M3 and above, several fault handlers can be involved when an unaligned access is trapped:
- UsageFault handler for unaligned halfword and word accesses when CCR.UNALIGN_TRP is set, and for instructions such as LDM, STM, LDRD, and STRD that never support unaligned addresses
- BusFault handler if part of the access reaches an address that the bus rejects
- MemManage handler if part of the access violates an MPU region's permissions
If the UsageFault handler is not enabled in SHCSR, the fault escalates to HardFault. The fault status registers can be checked in these handlers to identify an unaligned access. The handler can then emulate the access with aligned accesses, so that the unaligned access never reaches the memory system.
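As a sketch of what checking the fault status registers involves, the helpers below operate on raw register values so that they can be tried on a host machine. The bit positions are those of the Armv7-M System Control Block (UNALIGN_TRP is bit 3 of CCR, UNALIGNED is bit 24 of CFSR). The macro and function names are hypothetical; on a real target the CMSIS masks SCB_CCR_UNALIGN_TRP_Msk and SCB_CFSR_UNALIGNED_Msk would normally be used instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the CMSIS masks, for illustration only. */
#define CCR_UNALIGN_TRP  (1UL << 3)   /* CCR: trap unaligned halfword/word accesses */
#define CFSR_UNALIGNED   (1UL << 24)  /* CFSR (UFSR half): unaligned access fault */

/* Return the CCR value with unaligned-access trapping enabled. */
static uint32_t ccr_enable_unalign_trap(uint32_t ccr)
{
    return ccr | CCR_UNALIGN_TRP;
}

/* True if a CFSR value read in a fault handler reports an unaligned access. */
static bool cfsr_is_unaligned_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_UNALIGNED) != 0;
}
```

On target, a handler would pass SCB->CFSR to the second helper and then clear the flag by writing 1 to the same bit.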
CCR.UNALIGN_TRP Bit
The Cortex-M Configuration and Control Register (CCR) contains a bit called UNALIGN_TRP. It is an enable bit, not a status flag: when it is set, any unaligned halfword or word access raises a UsageFault, and the UNALIGNED flag in CFSR records the cause. For example, using the CMSIS register definitions:
    SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk;      // Trap unaligned halfword and word accesses
    SCB->SHCSR |= SCB_SHCSR_USGFAULTENA_Msk;  // Enable the UsageFault handler

    void UsageFault_Handler(void)
    {
        if (SCB->CFSR & SCB_CFSR_UNALIGNED_Msk) {
            // Record the fault, then halt or emulate the access
            SCB->CFSR = SCB_CFSR_UNALIGNED_Msk; // Write 1 to clear the flag
        }
    }
This allows unaligned accesses to be identified explicitly. The UNALIGNED flag is sticky and must be cleared by writing 1 to it before continuing.
Fixing Unaligned Accesses
There are a few general strategies to fix unaligned accesses in Cortex-M code:
- Use compiler pragmas and attributes to control struct packing and field alignment
- Manually align pointers before dereferencing them
- Align heap allocated buffers before use
- Handle or convert unaligned external data as needed
- Use unaligned trap handlers to emulate accesses
- Use compiler intrinsics for unaligned accesses
Packing Structs Properly
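One common cause of unaligned accesses is a packed struct. Packing with #pragma directives removes the padding that the compiler would otherwise insert to keep fields aligned. For example:
    #pragma pack(push, 1)
    struct packed_struct {
        uint8_t a;
        uint32_t b;
        uint16_t c;
    };
    #pragma pack(pop)
This packs the struct with 1-byte alignment, so the 32-bit field b sits at offset 1 and the 16-bit field c at offset 5, and neither is naturally aligned. The compiler knows this and generates safe code for direct member accesses such as s.b, but taking the address of b and dereferencing it as a plain uint32_t* can produce an unaligned access. Where the layout is not dictated by an external format, ordering fields from widest to narrowest keeps them aligned without packing. Compiler-specific keywords such as __packed and __align in Arm Compiler, or __attribute__((packed)) and __attribute__((aligned(n))) in GCC and Clang, give finer control.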
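The layout of a packed struct can be checked at compile time. The sketch below repeats packed_struct from the example above and compares it with a hypothetical reordered_struct that places the widest field first. The expected sizes assume a typical ABI in which uint32_t has 4-byte alignment, as on Cortex-M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct packed_struct {          /* layout from the example above */
    uint8_t  a;
    uint32_t b;
    uint16_t c;
};
#pragma pack(pop)

struct reordered_struct {       /* hypothetical alternative: widest field first */
    uint32_t b;
    uint16_t c;
    uint8_t  a;
};

/* Packed: no padding, so b and c land on odd offsets. */
static_assert(offsetof(struct packed_struct, b) == 1, "b is misaligned when packed");
static_assert(offsetof(struct packed_struct, c) == 5, "c is misaligned when packed");
static_assert(sizeof(struct packed_struct) == 7, "no padding when packed");

/* Reordered: every field is naturally aligned, at the cost of one pad byte. */
static_assert(offsetof(struct reordered_struct, b) % 4 == 0, "b is word aligned");
static_assert(offsetof(struct reordered_struct, c) % 2 == 0, "c is halfword aligned");
static_assert(sizeof(struct reordered_struct) == 8, "one trailing pad byte");
```

The reordered layout costs one byte more than the packed one and removes every misaligned field.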
Pointer Alignment
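A pointer should be suitably aligned before it is dereferenced as a larger type. Masking off the low address bits forces alignment, but it also moves the pointer down to the previous aligned address, so the data read is no longer the data at the original address. This is appropriate for choosing where an aligned block starts, not for reading a value that is stored at a misaligned address. For example:
    uint32_t *p = (uint32_t*)((uintptr_t)ptr & ~(uintptr_t)3); // Round down to a 4-byte boundary
    uint32_t val = *p; // Aligned 32-bit access at the rounded-down address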
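When the goal is to find an aligned position inside a larger region, rounding up is usually what is wanted, since rounding down can land before the start of the region. The following is a minimal sketch of both directions on integer addresses. The helper names align_up and align_down are hypothetical, and both change the address, so neither is a way to read data that already lives at a misaligned location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to a multiple of align (align must be a power of two). */
static uintptr_t align_down(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~((uintptr_t)align - 1);
}

/* Round addr up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~((uintptr_t)align - 1);
}
```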
Heap Buffer Alignment
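Any heap-allocated buffer should be aligned to the largest access size that will be used. malloc() already returns memory aligned for any standard type, so explicit alignment matters mainly for stricter requirements, such as DMA or cache-line alignment, and for sub-buffers placed at arbitrary offsets within a block. C11 provides aligned_alloc() for this, and some C libraries also offer the older, non-standard memalign(). For example:
    uint32_t *buffer = aligned_alloc(32, 1024); // 1024 bytes aligned to a 32-byte boundary; release with free()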
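As a slightly fuller sketch of aligned heap allocation, the wrapper below allocates an array of 32-bit words on an 8-byte boundary with C11 aligned_alloc(). It assumes the C library in use provides aligned_alloc(), which is not guaranteed on every embedded toolchain. The function name alloc_words_aligned8 is hypothetical, and the sketch does not guard against size overflow for very large counts.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate count 32-bit words on an 8-byte boundary.
 * The byte size is rounded up to a multiple of the alignment, which the
 * original C11 text requires. Returns NULL for count == 0 or on failure.
 * Release the buffer with free(). */
static uint32_t *alloc_words_aligned8(size_t count)
{
    const size_t alignment = 8;

    if (count == 0) {
        return NULL;
    }
    size_t bytes   = count * sizeof(uint32_t);
    size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```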
Handling External Unaligned Data
Data from external interfaces such as serial ports or packed protocol frames may not be aligned. Special handling may be needed before use, for example copying the data byte by byte into an aligned buffer, or using an unaligned access helper such as the CMSIS macro __UNALIGNED_UINT32_READ:
    uint32_t val = __UNALIGNED_UINT32_READ(ptr); // Unaligned 32-bit read via the CMSIS macro
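The byte-by-byte approach can be sketched as follows. It assumes the external format stores the value in little-endian order, which depends on the protocol in question, and the function name load_le32 is hypothetical. Because it only ever performs byte loads, it is safe on every Cortex-M core, including Cortex-M0 and M0+.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes in little-endian order.
 * p may point to any address; only byte accesses are performed. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For a received frame whose 32-bit field starts at byte 1, the field is read with load_le32(frame + 1).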
Unaligned Access Trap Handlers
As mentioned previously, unaligned access trap handlers can be used to intercept unaligned accesses and emulate them with aligned accesses. Each trapped access costs a full exception entry and exit, so this approach is better suited to finding offending code during development than to production use.
Compiler Intrinsics
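Compiler-supported helpers can perform unaligned accesses safely and efficiently. GCC and Clang do not define a dedicated unaligned integer type, but the CMSIS headers provide the __UNALIGNED_UINT16_READ, __UNALIGNED_UINT16_WRITE, __UNALIGNED_UINT32_READ, and __UNALIGNED_UINT32_WRITE macros, which are built on packed struct attributes. A memcpy() of a small fixed size is a portable alternative that compilers typically reduce to a single load or store when the target permits unaligned access. In both cases the compiler can emit an unaligned load or store on cores that support it and fall back to byte accesses otherwise, as controlled by options such as GCC's -munaligned-access and -mno-unaligned-access. These helpers are usually the best choice when an unaligned access is genuinely necessary.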
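A portable way to get this behaviour without any vendor header is the fixed-size memcpy() idiom. The sketch below uses hypothetical helper names load_u32 and store_u32 and transfers the value in the byte order of the machine it runs on. Whether the compiler turns each call into a single instruction depends on the target and optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in native byte order from any address. */
static uint32_t load_u32(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);  /* no alignment requirement on src */
    return value;
}

/* Write a 32-bit value in native byte order to any address. */
static void store_u32(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);  /* no alignment requirement on dst */
}
```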
Tools for Detecting Unaligned Accesses
There are a few options available for analyzing code and detecting unaligned accesses:
- Compiler warnings – Enable warnings such as -Wcast-align and -Waddress-of-packed-member in GCC and Clang
- Static analysis – Tools like Polyspace, Coverity, LDRA can detect unaligned access defects
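- Debuggers – Breakpoints in the fault handlers catch trapped unaligned accesses, and the stacked PC identifies the faulting instruction
- Profilers – Trace and profiling tools such as SEGGER Ozone can show hot spots in code that performs unaligned accesses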
- Bus analyzers – Monitor bus transactions and detect unaligned transfers
Enabling compiler warnings is the easiest first step during code development. Static analysis and running with the unaligned access trap enabled help catch issues during integration. Bus analyzers and profilers help validate overall system performance by monitoring for unaligned transfers during operation.
Performance Impact
The number of processor cycles required for an unaligned access depends on the Cortex-M variant, the access type, and the memory system. As rough figures for Cortex-M3/M4/M7:
- Byte access – always aligned, so no penalty
- Unaligned halfword access – about 2 cycles
- Unaligned word access – up to about 4 cycles
So a single unaligned word access can take several times as long as an aligned one. Frequent unaligned accesses can noticeably reduce overall performance, especially in tight loops over buffers. As a very rough estimate, some benchmarks indicate a 2-3% performance loss for every 5% of accesses that are unaligned, but the real figure depends on the workload and should be measured on the target. Where unaligned accesses are frequent, eliminating them is a worthwhile optimization on Cortex-M cores.
Effects on Interrupt Latency
Unaligned accesses can also increase interrupt latency on Cortex-M cores, because an individual load or store generally completes before the interrupt is serviced. Each unaligned access adds only a few cycles, so the effect matters mainly in systems with very tight latency budgets or slow memory. In those systems it can cause issues for time-sensitive interrupt handling.
Measuring maximum interrupt latency under cases of frequent unaligned accesses can help quantify the impact. Keeping unaligned accesses to a minimum helps avoid degrading interrupt response times in a Cortex-M system.
Conclusion
In summary, unaligned accesses are a common source of performance issues on Arm Cortex-M3 and later cores, and a source of faults on Cortex-M0 and M0+. They lower efficiency by requiring extra access cycles and can increase interrupt latency. Unaligned accesses should be avoided by laying out structs carefully, aligning buffers and pointers, handling external data carefully, and using compiler-supported helpers when an unaligned access is unavoidable. Tools are available to detect unaligned access defects during development, and the processor can be configured to trap unaligned accesses when needed. Following the practices outlined here will help eliminate unaligned access penalties in Cortex-M designs.