Unaligned access error in Arm Cortex-M

An unaligned access occurs on an Arm Cortex-M microcontroller when code accesses data at an address that is not a multiple of the data’s size. For example, reading a 32-bit integer from an address that is not divisible by 4 is an unaligned access. Cortex-M3 and later cores handle most unaligned halfword and word accesses in hardware, but with a performance penalty. Cortex-M0 and M0+ cores do not support them and raise a HardFault instead. In both cases the problem can be avoided by properly aligning data.

Contents

What Causes Unaligned Accesses?
Effects of Unaligned Accesses
Checking for Unaligned Accesses
Unaligned Access Trap Handlers
CCR.UNALIGN_TRP Bit
Fixing Unaligned Accesses
Packing Structs Properly
Pointer Alignment
Heap Buffer Alignment
Handling External Unaligned Data
Unaligned Access Trap Handlers
Compiler Intrinsics
Tools for Detecting Unaligned Accesses
Performance Impact
Effects on Interrupt Latency
Conclusion

What Causes Unaligned Accesses?

There are a few common causes of unaligned accesses in Cortex-M programs:

Accessing fields within a packed struct where the fields are not aligned
Casting a pointer to a pointer of a larger data type, like uint8_t* to uint32_t*, and dereferencing it
Allocating data buffers on the heap without ensuring the alignment they need
Reading data from an external interface that does not guarantee alignment, like a serial port

In each of these cases, the developer must take care to avoid unaligned accesses, either by laying out structs carefully, aligning pointers and buffers, or handling unaligned external data properly. Cortex-M3 and later cores do not fault on most unaligned accesses by default, so nothing flags the problem automatically and it is up to the developer to avoid it.
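To make the causes above concrete, a small alignment check can be dropped into debug builds. The following is a minimal sketch in portable C. The helper names and the frame layout (a 1-byte header followed by a 32-bit length field) are hypothetical and only serve as an illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when ptr is a multiple of alignment. */
bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0u);
    return ((uintptr_t)ptr % alignment) == 0u;
}

/* Hypothetical frame layout: 1-byte header, then a 32-bit length field. */
#define FRAME_HEADER_SIZE 1u

/* Reports whether the length field of a frame could be read directly
 * through a uint32_t pointer without an unaligned access. */
bool frame_length_is_aligned(const uint8_t *frame)
{
    return is_aligned(frame + FRAME_HEADER_SIZE, sizeof(uint32_t));
}
```

If the frame buffer itself starts on a 4-byte boundary, the length field sits at offset 1 and the check returns false, which is the pointer-cast situation described in the list above.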

Effects of Unaligned Accesses

When an unaligned access occurs on Cortex-M3 or later, the core splits it into multiple aligned bus accesses and reassembles the data in hardware. This requires additional clock cycles, so the direct effect is reduced performance. For example, a 32-bit read that straddles a word boundary could require two aligned reads, with shifts to assemble the full 32-bit value. This can lower performance significantly in code with many unaligned accesses.

The other effect of unaligned accesses relates to interrupts. An interrupt that arrives during a load or store is not taken until that instruction completes, so the extra cycles of an unaligned access add directly to interrupt latency. The effect per access is small, but it can matter in a system with a tight latency budget.

Lastly, Cortex-M0 and M0+ cores do not include the unaligned access support present in Cortex-M3 and above. On these cores any unaligned halfword or word access results in a HardFault, so unaligned accesses cause crashes rather than slowdowns. Even on Cortex-M3 and above, multi-word instructions such as LDM, STM, LDRD, and STRD do not support unaligned addresses and always fault.
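The splitting described above can be modelled in software. The sketch below is not how the core is implemented; it is a hypothetical C model, assuming little-endian memory presented as an array of aligned 32-bit words, that shows why an unaligned word read costs two aligned reads plus shifts.

```c
#include <stdint.h>

/* mem holds aligned 32-bit words in little-endian order.
 * byte_addr is a byte offset into mem and may be unaligned.
 * The caller must ensure mem[byte_addr / 4 + 1] exists when
 * byte_addr is not a multiple of 4. */
uint32_t load_u32_split(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t index = byte_addr / 4u;
    uint32_t shift = (byte_addr % 4u) * 8u;
    uint32_t low = mem[index];          /* first aligned access */

    if (shift == 0u) {
        return low;                     /* aligned: one access is enough */
    }

    uint32_t high = mem[index + 1u];    /* second aligned access */
    return (low >> shift) | (high << (32u - shift));
}
```

With mem holding 0x44332211 followed by 0x88776655, a read at byte offset 1 combines the upper three bytes of the first word with the lowest byte of the second and yields 0x55443322.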

Checking for Unaligned Accesses

Cortex-M3 and later cores provide two related mechanisms for catching unaligned accesses in software:

Setting the CCR.UNALIGN_TRP bit so that unaligned halfword and word accesses raise a fault
Checking the fault status registers in the fault handler to confirm that the fault was caused by an unaligned access

The Memory Protection Unit (MPU) does not offer an alignment check of its own. Alignment trapping is controlled through CCR.UNALIGN_TRP and is reported as a UsageFault rather than as a MemManage fault.

Unaligned Access Trap Handlers

In Cortex-M3 and above, unaligned accesses can be trapped and handled in software. The relevant handlers are:

UsageFault handler for unaligned halfword and word accesses when CCR.UNALIGN_TRP is set, and for instructions such as LDM, STM, LDRD, and STRD that never support unaligned addresses
HardFault handler, to which the fault escalates if the UsageFault exception has not been enabled in SHCSR

The UNALIGNED flag in the UsageFault Status Register (the upper halfword of CFSR) identifies an unaligned access. The handler can then record the faulting instruction for debugging or, with considerably more effort, emulate the access with aligned accesses and resume execution.
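As a minimal sketch of the status check, the function below tests the UNALIGNED flag, which is bit 8 of the UsageFault Status Register and therefore bit 24 of the combined CFSR. The function and macro names are hypothetical. Taking the register value as a parameter keeps the logic separate from the hardware access.

```c
#include <stdbool.h>
#include <stdint.h>

/* UFSR occupies CFSR[31:16]; UNALIGNED is UFSR bit 8, so CFSR bit 24. */
#define CFSR_UFSR_UNALIGNED (1u << 24)

/* Returns true when the given CFSR value reports an unaligned access fault. */
bool fault_was_unaligned(uint32_t cfsr)
{
    return (cfsr & CFSR_UFSR_UNALIGNED) != 0u;
}
```

On a target with CMSIS-Core headers, a fault handler would typically call this as fault_was_unaligned(SCB->CFSR). The flag is sticky and is cleared by writing 1 to it.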

CCR.UNALIGN_TRP Bit

The Configuration and Control Register (CCR) contains a control bit called UNALIGN_TRP. It is not a status flag: when software sets it, every unaligned halfword or word access raises a UsageFault, and the UNALIGNED bit in the UsageFault Status Register records the cause. For example, using CMSIS-Core definitions:

    SCB->SHCSR |= SCB_SHCSR_USGFAULTENA_Msk; // Enable the UsageFault exception
    SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk;     // Trap unaligned halfword and word accesses
    val = *(uint32_t*)ptr;                   // Faults here if ptr is not 4-byte aligned

    // Inside the UsageFault handler:
    if (SCB->CFSR & SCB_CFSR_UNALIGNED_Msk) {
        // Handle unaligned access
    }

This allows unaligned accesses to be identified explicitly. The UNALIGNED status flag is sticky and must be cleared by writing 1 to it before continuing.
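The sketch below shows one way to switch the trap on. It assumes the architectural bit positions (UNALIGN_TRP is CCR bit 3, USGFAULTENA is SHCSR bit 18). The function name is hypothetical, and the registers are passed in as pointers so the same code can be exercised off-target.

```c
#include <stdint.h>

#define CCR_UNALIGN_TRP    (1u << 3)   /* CCR bit 3: trap unaligned accesses */
#define SHCSR_USGFAULTENA  (1u << 18)  /* SHCSR bit 18: enable UsageFault */

/* Enables the UsageFault exception first, then unaligned access trapping,
 * so that the trap does not escalate straight to HardFault. */
void enable_unaligned_trap(volatile uint32_t *ccr, volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_USGFAULTENA;
    *ccr |= CCR_UNALIGN_TRP;
}
```

With CMSIS-Core headers the call on target would be enable_unaligned_trap(&SCB->CCR, &SCB->SHCSR). Other bits in both registers are left unchanged.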

Fixing Unaligned Accesses

There are a few general strategies to fix unaligned accesses in Cortex-M code:

Use compiler pragmas and attributes to control struct packing and field alignment
Manually align pointers before dereferencing them
Align heap-allocated buffers before use
Handle or convert unaligned external data as needed
Use unaligned trap handlers to find or emulate accesses
Use compiler intrinsics for unaligned accesses

Packing Structs Properly

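One common cause of unaligned accesses is improperly packed structs. The #pragma pack directive removes padding between fields, which saves space but can leave fields misaligned. For example:

    #pragma pack(push, 1)
    struct packed_struct {
        uint8_t a;
        uint32_t b;
        uint16_t c;
    };
    #pragma pack(pop)

This packs the struct with 1-byte alignment, so the 32-bit field b lands at offset 1 and is no longer 4-byte aligned. The compiler knows the struct is packed and generates safe accesses when b is reached through the struct, but taking the address of b and dereferencing it as a plain uint32_t* can produce an unaligned access. To keep fields naturally aligned, order them from largest to smallest or add explicit padding. Compiler-specific qualifiers and attributes, such as __packed in Arm Compiler or __attribute__((packed)) and __attribute__((aligned(n))) in GCC and Clang, give finer control over individual structs and fields.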
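Where the fields actually land can be confirmed with offsetof. The following sketch compares the packed layout with a reordered version of the same fields. The name reordered_struct is hypothetical, and the sizes in the comments assume a typical ABI in which uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct packed_struct {
    uint8_t a;
    uint32_t b;   /* offset 1: not 4-byte aligned */
    uint16_t c;   /* offset 5: not 2-byte aligned */
};
#pragma pack(pop)

/* Same fields ordered largest first: every field is naturally aligned. */
struct reordered_struct {
    uint32_t b;   /* offset 0 */
    uint16_t c;   /* offset 4 */
    uint8_t a;    /* offset 6, followed by 1 byte of tail padding */
};

static_assert(offsetof(struct packed_struct, b) == 1, "packed b is misaligned");
static_assert(offsetof(struct reordered_struct, b) % 4 == 0, "b is aligned");
```

The reordered struct is one byte larger than the packed one (8 bytes instead of 7) but needs no unaligned accesses.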

Pointer Alignment

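Pointers should be suitably aligned before they are dereferenced as a larger type. When carving a buffer out of raw memory, the address can be rounded up to the next aligned boundary. For example:

    uint32_t *p = (uint32_t*)(((uintptr_t)ptr + 3u) & ~(uintptr_t)3u); // Round up to 4 bytes
    uint32_t val = *p; // Aligned 32-bit access

Rounding changes the address, so it is appropriate only when choosing where data will live. It does not make data that already sits at an unaligned address readable in place. Such data has to be copied out byte by byte.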
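The address arithmetic can be wrapped in small helpers. This is a minimal sketch with hypothetical function names, assuming the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

/* Rounds addr down to the previous multiple of alignment (a power of two). */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return addr & ~(alignment - 1u);
}
```

Rounding up is the usual choice when reserving space inside a buffer, because rounding down can move the address to before the start of the buffer.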

Heap Buffer Alignment

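Any heap-allocated buffer should be aligned to the largest access size that will be used. malloc() already returns memory aligned for any standard type, so explicit alignment is mainly needed for stricter requirements such as DMA or cache-line alignment. C11 provides aligned_alloc() for this, and many embedded C libraries also offer the non-standard memalign(). For example:

    uint32_t *buffer = memalign(4, 1024); // Aligned to 4 bytes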
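For a stricter requirement than malloc() guarantees, the sketch below uses C11 aligned_alloc(). It assumes the C library in use provides that function and that a 32-byte alignment is wanted, for instance to match a 32-byte cache line. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUFFER_ALIGNMENT 32u  /* assumed requirement, e.g. a 32-byte cache line */

/* Allocates at least size bytes aligned to BUFFER_ALIGNMENT.
 * Returns NULL on failure. Release the buffer with free(). */
void *alloc_aligned_buffer(size_t size)
{
    /* aligned_alloc expects size to be a multiple of the alignment. */
    size_t rounded = (size + BUFFER_ALIGNMENT - 1u)
                     & ~((size_t)BUFFER_ALIGNMENT - 1u);
    return aligned_alloc(BUFFER_ALIGNMENT, rounded);
}
```

Rounding the size up also means the buffer covers whole 32-byte lines, so nothing else shares its last line.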

Handling External Unaligned Data

Data from external interfaces such as serial ports or network packets may not be aligned. Special handling may be needed before use, for example copying the data byte by byte into an aligned buffer, or using an unaligned access helper. CMSIS-Core provides macros for this purpose:

    uint32_t val = __UNALIGNED_UINT32_READ(ptr); // Unaligned 32-bit read
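The byte-by-byte approach can be sketched as follows. The function name is hypothetical and the wire format is assumed to be little-endian. Because the value is assembled from individual bytes, the code performs only byte accesses and works at any address and on any host byte order.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value from p, which may be unaligned. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For a received frame whose 32-bit field starts at offset 1, the call would be read_u32_le(frame + 1).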

Unaligned Access Trap Handlers

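As mentioned previously, with CCR.UNALIGN_TRP set the UsageFault handler intercepts every unaligned halfword and word access. The handler can emulate the access with aligned accesses, but the exception overhead makes this far slower than the hardware's own handling. Trapping is therefore usually more useful as a development aid for locating unaligned accesses than as a production fix.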

Compiler Intrinsics

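Compiler and CMSIS helpers can perform unaligned accesses safely and efficiently. For example, CMSIS-Core defines the __UNALIGNED_UINT16_READ, __UNALIGNED_UINT16_WRITE, __UNALIGNED_UINT32_READ, and __UNALIGNED_UINT32_WRITE macros for its supported toolchains, and some toolchains offer a __packed pointer qualifier. The compiler can emit a single unaligned load or store where the core supports it and fall back to byte accesses where it does not, so these helpers are generally the best choice when an unaligned access is unavoidable.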
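Where no vendor helper is available, a fixed-size memcpy() is a portable alternative that compilers commonly reduce to a single load or store when the target allows it. This is a minimal sketch with hypothetical function names. The value is read and written in the host's native byte order.

```c
#include <stdint.h>
#include <string.h>

/* Loads a 32-bit value in native byte order from a possibly unaligned address. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Stores a 32-bit value in native byte order to a possibly unaligned address. */
void store_u32_unaligned(void *p, uint32_t value)
{
    memcpy(p, &value, sizeof value);
}
```

Unlike a pointer cast, this form has defined behaviour in C regardless of the address, so it is also safe on cores that fault on unaligned accesses.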

Tools for Detecting Unaligned Accesses

There are a few options available for analyzing code and detecting unaligned accesses:

Compiler warnings – Enable warnings such as -Wcast-align in GCC and Clang, which flags pointer casts that increase the required alignment
Static analysis – Tools like Polyspace, Coverity, and LDRA can detect unaligned access defects
Debuggers – A breakpoint in the UsageFault handler, with CCR.UNALIGN_TRP set, stops on each unaligned access
Profilers – Tools like SEGGER Ozone can help locate hot code paths affected by unaligned accesses
Bus analyzers – Monitor bus transactions and detect unaligned transfers

Enabling compiler warnings is the easiest first step during code development. Static analysis and debugging with the unaligned access trap enabled also help catch issues during integration. Bus analyzers and profilers help validate overall system performance by monitoring for unaligned transfers during operation.
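Layout mistakes can also be caught at build time, before any tool runs. The sketch below uses a C11 static assertion to check that a field is naturally aligned within its struct. The macro name and the sensor_frame struct are hypothetical examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fails the build if field is not naturally aligned inside type. */
#define ASSERT_FIELD_ALIGNED(type, field, field_type)                 \
    static_assert(offsetof(type, field) % _Alignof(field_type) == 0,  \
                  #field " is misaligned in " #type)

/* Hypothetical message layout used for illustration. */
struct sensor_frame {
    uint32_t timestamp;
    uint16_t id;
    uint8_t flags;
    uint8_t reserved;
};

ASSERT_FIELD_ALIGNED(struct sensor_frame, timestamp, uint32_t);
ASSERT_FIELD_ALIGNED(struct sensor_frame, id, uint16_t);
```

If a later change moves a field to a misaligned offset, for example by adding a packing pragma, the build stops with the message naming the field.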

Performance Impact

The number of processor cycles required for an unaligned access depends on the Cortex-M variant, the access type, and the memory system. Rough figures for Cortex-M3/M4/M7:

Byte access – always aligned, so there is no penalty
Unaligned halfword access – around 2 cycles
Unaligned word access – up to around 4 cycles

So a single unaligned word access can cost a few more cycles than an aligned one. Frequent unaligned accesses can noticeably reduce overall performance. One rough estimate is a 2-3% performance loss for every 5% of accesses that are unaligned, but the real cost should be measured on the target. Eliminating unaligned accesses in hot code paths is a worthwhile optimization on Cortex-M cores.

Effects on Interrupt Latency

Unaligned accesses also increase interrupt latency on Cortex-M cores. A pending interrupt is not taken until the load or store in progress completes, so an unaligned access delays it by the extra cycles the access needs. The delay from any single access is small, but it adds to the worst-case latency, which can cause issues for time-sensitive interrupt handlers.

Measuring maximum interrupt latency in code with frequent unaligned accesses can help quantify the impact. Keeping unaligned accesses to a minimum helps avoid degrading interrupt response times in a Cortex-M system.

Conclusion

In summary, unaligned accesses are a common source of performance issues on Arm Cortex-M3 and later cores, and of faults on Cortex-M0 and M0+. They lower efficiency by requiring extra access cycles and increase interrupt latency. Unaligned accesses should be avoided by laying out structs carefully, aligning buffers and pointers, handling external data carefully, and using unaligned access helpers when needed. Tools are available to detect unaligned access defects during development, and the processor provides the CCR.UNALIGN_TRP mechanism for trapping unaligned accesses when needed. Following the best practices outlined here will help eliminate unaligned access penalties in Cortex-M designs.
