How Unaligned Memory Access is Handled in ARM Cortex-M4

ARM Cortex-M4 microcontrollers have built-in support for unaligned memory access, allowing data to be accessed from memory addresses that are not aligned to the size of the data type. This provides flexibility for programmers and helps reduce code size by eliminating the need for explicit alignment in some situations.

Contents

What is Unaligned Memory Access?
Why Unaligned Accesses Occur
Handling Unaligned Accesses
Unaligned Data Load/Store
Unaligned Single Load/Store
Hardware Decomposing
Unaligned Access Exceptions
Performance of Unaligned Accesses
Using Unaligned Accesses in Code
Effects on Code Density
Use of Unaligned Accesses by Compilers
Conclusion

What is Unaligned Memory Access?

Unaligned memory access means reading or writing a multi-byte data type (short, int, long, and so on) at an address that is not a multiple of the type's size, for example accessing a 4-byte integer at address 0x1003 rather than at a 4-byte aligned address such as 0x1000. Aligned access, by contrast, only touches addresses that are multiples of the data size.

Many processors either require aligned data or access it efficiently only when it is aligned. The Cortex-M4 supports unaligned halfword and word accesses in hardware for ordinary single loads and stores, which removes this requirement from software in most cases.
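The definition above reduces to a one-line test. The following is a minimal sketch; the helper name is_aligned is an assumption made for illustration and is not part of any ARM or CMSIS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when addr is a multiple of size. size must be a power of two
 * (1, 2, 4, 8), which covers the usual data type sizes. */
static bool is_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

With this helper, is_aligned(0x1000, 4) is true and is_aligned(0x1003, 4) is false, matching the two addresses in the example.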

Why Unaligned Accesses Occur

There are several common situations where unaligned memory access occurs:

Accessing fields within packed structs: Since each field starts right after the previous one, fields are often unaligned.
Typecasting pointers to other data types: The pointer may not be aligned to the new data type.
Accessing external data formats like network packets or filesystems, where alignment is not guaranteed.
Overlaying structs on top of contiguous buffers.
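Network packets are a typical source of unaligned fields. The sketch below assumes an untagged Ethernet II frame carrying IPv4, stored in a word-aligned buffer. The function and constant names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ETH_HEADER_LEN       = 14, /* 6-byte dst MAC + 6-byte src MAC + 2-byte EtherType */
    IPV4_SRC_ADDR_OFFSET = 12  /* offset of the source address inside the IPv4 header */
};

/* Pointer to the 4-byte IPv4 source address inside a raw Ethernet frame. */
static const uint8_t *ipv4_src_addr(const uint8_t *frame)
{
    assert(frame != NULL);
    return frame + ETH_HEADER_LEN + IPV4_SRC_ADDR_OFFSET;
}
```

The field sits 26 bytes into the frame, so in a word-aligned buffer its address is 2 modulo 4, and reading it as a 32-bit value is an unaligned access.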

Requiring aligned access in these situations increases code size from extra padding and alignment checks. ARM Cortex-M4’s built-in support avoids this.

Handling Unaligned Accesses

The ARMv7E-M architecture that the Cortex-M4 implements provides hardware mechanisms to support unaligned accesses efficiently:

Unaligned Data Load/Store

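The load and store instructions come in size-specific forms: LDRB/STRB for bytes, LDRH/STRH for halfwords, and LDR/STR for words, with LDRSB and LDRSH as the sign-extending byte and halfword loads. The halfword and word forms accept unaligned addresses automatically. The multiple-register instructions LDM and STM and the doubleword instructions LDRD and STRD do not: they require a word-aligned address and fault otherwise.

For example:

    LDRH  R1, [R2] ; 16-bit load, zero-extended to 32 bits
    LDRSH R1, [R2] ; 16-bit load, sign-extended to 32 bits

The SH suffix means "signed halfword", not "unaligned". Both forms load a halfword from any address, odd or even, into the register.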

Unaligned Single Load/Store

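Word-sized LDR and STR support unaligned access in the same way as the halfword forms. The B, H, SB and SH suffixes select the access size and signedness. They do not switch unaligned support on or off:

    LDR  R1, [R2] ; 32-bit load, R2 may hold any address
    LDRH R1, [R2] ; 16-bit load, R2 may hold any address

Each instruction loads from whatever address R2 holds, aligned or not.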
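The difference between the zero-extending and sign-extending halfword loads can be modelled in portable C. This is a sketch of the register result only, assuming little-endian data. The function names are hypothetical, and the byte-wise reads are there to keep the model portable, not to show how the hardware performs the access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models LDRH: little-endian 16-bit load, zero-extended to 32 bits. */
static uint32_t load_halfword_zero_ext(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* Models LDRSH: the same 16 bits, sign-extended to 32 bits. */
static int32_t load_halfword_sign_ext(const uint8_t *p)
{
    return (int32_t)(int16_t)(uint16_t)load_halfword_zero_ext(p);
}
```

For the bytes 0xFE 0xFF stored at an odd address, the first function returns 0x0000FFFE and the second returns -2. The address being odd makes no difference to either result.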

Hardware Decomposing

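For unaligned loads, the Cortex-M4 bus interface automatically breaks the access into two or more aligned accesses and reassembles the result. For example, with little-endian memory:

    0x1000: 0x12 0x34 0x56 0x78   (first aligned word)
    0x1004: next aligned word     (contents not shown)

    Unaligned LDR at 0x1002:
    1. Read the bytes at 0x1002 and 0x1003 (0x56 0x78) from the word at 0x1000
    2. Read the bytes at 0x1004 and 0x1005 from the word at 0x1004
    3. Combine them, with the lowest address in the least significant byte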

The decomposition is transparent to software for every instruction that supports unaligned access, although each extra bus access costs additional cycles.
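The idea of splitting one unaligned word load into aligned reads can be sketched in C. This is a conceptual model under stated assumptions, not a description of the actual bus transactions: memory is represented as an array of aligned little-endian 32-bit words, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Conceptual model of an unaligned little-endian word load.
 * mem is an array of aligned 32-bit words; byte_addr is a byte offset into it.
 * When byte_addr is not a multiple of 4, the caller must ensure that the
 * following word (mem[byte_addr / 4 + 1]) exists.
 */
static uint32_t load_word_decomposed(const uint32_t *mem, size_t byte_addr)
{
    assert(mem != NULL);

    size_t   index = byte_addr / 4;                  /* first aligned word */
    unsigned shift = (unsigned)(byte_addr % 4) * 8;  /* bit offset inside it */

    if (shift == 0)
        return mem[index];                           /* already aligned */

    uint32_t low  = mem[index] >> shift;             /* upper bytes of first word */
    uint32_t high = mem[index + 1] << (32 - shift);  /* lower bytes of second word */
    return low | high;
}
```

The bytes 0x12 0x34 0x56 0x78 form the little-endian word 0x78563412. If the next word is assumed to be 0xF0DEBC9A, a load at byte offset 2 returns 0xBC9A7856: two bytes from each aligned word.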

Unaligned Access Exceptions

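By default, unaligned halfword and word accesses with LDR, LDRH, LDRSH, STR and STRH complete without faulting. Two cases raise a UsageFault instead. First, instructions that never support unaligned addresses, such as LDM, STM, LDRD, STRD and the exclusive-access instructions, always fault on an unaligned address. Second, software can make every unaligned halfword or word access trap by setting the UNALIGN_TRP bit in the Configuration and Control Register (CCR):

    CCR.UNALIGN_TRP = 0; // No trap (default)
    CCR.UNALIGN_TRP = 1; // Trap unaligned halfword and word accesses

With the trap enabled, an unaligned LDR or STR raises a UsageFault with the UNALIGNED bit set in the UsageFault Status Register (UFSR), so a handler can identify and deal with specific unaligned cases. If the UsageFault handler is not enabled, the fault escalates to HardFault.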
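In C, the trap bit is normally set through the CMSIS register definitions (SCB->CCR with SCB_CCR_UNALIGN_TRP_Msk, and SCB->CFSR with SCB_CFSR_UNALIGNED_Msk). The sketch below takes the register as a pointer instead, so that it can also be compiled and tested off-target. The function and macro names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CCR_UNALIGN_TRP     (1u << 3)   /* CCR bit 3 */
#define CFSR_UFSR_UNALIGNED (1u << 24)  /* UFSR bit 8, which is CFSR bit 24 */

/* Enables or disables trapping. On target, pass the address of SCB->CCR. */
static void set_unaligned_trap(volatile uint32_t *ccr, bool enable)
{
    assert(ccr != NULL);
    if (enable)
        *ccr |= CCR_UNALIGN_TRP;
    else
        *ccr &= ~CCR_UNALIGN_TRP;
}

/* True when a CFSR value read in the fault handler reports an unaligned access. */
static bool fault_is_unaligned(uint32_t cfsr)
{
    return (cfsr & CFSR_UFSR_UNALIGNED) != 0;
}
```

Enabling the trap during development is a common way to find unintended unaligned accesses. The handler can read CFSR, call fault_is_unaligned, and then log or halt.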

Performance of Unaligned Accesses

Unaligned accesses on Cortex-M4 are usually cheap, but they are not free: every access that the hardware splits needs extra bus transfers and therefore extra cycles. Alignment matters most in these scenarios:

Sequential access: Aligned accesses avoid paying the decomposition penalty on every element
Flash access: Writing flash is typically faster, and on some devices only possible, when aligned
External bus: An unaligned access may need multiple bus transfers

So while unaligned access support removes the need for explicit alignment in code, aligning where possible can provide a small performance boost.
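A rough way to reason about the cost is to count how many aligned 32-bit words an access touches. The helper below is a hypothetical sketch. It gives only a lower bound on the number of bus transfers, since the actual split and cycle count depend on the bus and the memory being accessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of aligned 32-bit words covered by an access of size bytes at addr. */
static unsigned words_touched(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr / 4;
    uintptr_t last  = (addr + size - 1) / 4;
    return (unsigned)(last - first + 1);
}
```

A word access at 0x1000 touches one word, while the same access at 0x1002 touches two, so it cannot complete in a single transfer.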

Using Unaligned Accesses in Code

Here are some ways unaligned access support can be leveraged during programming:

Avoid casting pointers to integers for alignment checks
Use packed structs for memory efficiency rather than aligning fields
Overlay buffers without worrying about alignment
No need to align external data before accessing
Fetch data types from unaligned addresses directly

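Some examples:

    // Packed struct (__packed is toolchain-specific; GCC and Clang
    // spell it __attribute__((packed)))
    struct __packed {
        uint8_t  len;
        uint32_t addr;   // starts at offset 1, so it is unaligned
    } pkt;

    // Overlay buffer (works on Cortex-M4, but formally breaks the C
    // aliasing rules; memcpy is the portable alternative)
    uint8_t buf[10];
    uint16_t *data = (uint16_t *)buf;

    // No alignment needed
    process_data(rx_buff);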
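A portable way to express an unaligned access in C is a fixed-size memcpy, which compilers targeting Cortex-M4 can typically reduce to a single LDR or STR when unaligned access is permitted. The helper names below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value in native byte order from any address. */
static uint32_t read_u32_unaligned(const void *src)
{
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Writes a 32-bit value in native byte order to any address. */
static void write_u32_unaligned(void *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}
```

For example, write_u32_unaligned(buf + 1, 0x12345678) stores to an odd address inside a byte buffer without a pointer cast, and read_u32_unaligned(buf + 1) reads the value back.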

So unaligned access support directly helps reduce code size and complexity in many common situations.

Effects on Code Density

A key benefit of unaligned support is reducing code size by avoiding explicit alignment in code. Here are some common code patterns that are no longer needed with Cortex-M4:

Pointer casts and checks for alignment
Aligning stack variables
Padding structs
Intermediate copying of data to align
Specialized aligned and unaligned versions of functions

For example, this code fragment rounds a byte pointer up to the next word boundary before accessing it:

    uint8_t buf[16];
    uint32_t *p = (uint32_t *)(((uintptr_t)buf + 3) & ~(uintptr_t)3);
    *p = 0x12345678;

With unaligned support, the store can go directly to the address of interest:

    uint8_t buf[16];
    uint32_t value = 0x12345678;
    memcpy(buf + 1, &value, sizeof value); // typically compiled to a single unaligned STR

By removing such alignment handling code, overall code density improves.

Use of Unaligned Accesses by Compilers

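Modern ARM compilers like armclang and gcc can automatically generate unaligned accesses when beneficial. For example:

    struct __attribute__((packed)) s {
        uint8_t  len;
        uint32_t addr;   // offset 1 because the struct is packed
    } pkt;

    pkt.addr = 0x12345; // Can be compiled as a single unaligned STR

Without the packed attribute the compiler pads the struct so that addr is word-aligned, and no unaligned access is involved.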

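Compilers emit LDR, LDRH, LDRSH and the matching store instructions directly for such accesses, so no inline assembly is required. The -munaligned-access option of gcc and armclang permits the compiler to generate unaligned accesses, and -mno-unaligned-access forbids it, in which case packed fields are accessed byte by byte. So compiling with these compilers allows getting code density benefits without changing code.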
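The effect of packing on layout can be checked directly. The sketch below uses the GCC/Clang attribute syntax. The struct and function names are hypothetical, and the padded sizes in the comments assume an ABI where uint32_t is 4-byte aligned, as on Cortex-M4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct padded_pkt {
    uint8_t  len;
    uint32_t addr;   /* compiler inserts 3 padding bytes before this field */
};

struct __attribute__((packed)) packed_pkt {
    uint8_t  len;
    uint32_t addr;   /* starts at offset 1 */
};

/* The compiler knows addr may be unaligned and chooses a suitable store:
 * a single STR when unaligned access is permitted, byte stores otherwise. */
static void set_addr(struct packed_pkt *pkt, uint32_t addr)
{
    assert(pkt != NULL);
    pkt->addr = addr;
}
```

Comparing the disassembly of set_addr built with -munaligned-access and with -mno-unaligned-access is a quick way to see the code size difference on a particular toolchain.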

Conclusion

The ARM Cortex-M4 directly supports unaligned halfword and word accesses for single loads and stores. This avoids much of the manual alignment handling in code, improving code density. Hardware decomposition keeps unaligned data access efficient, at the cost of some extra bus cycles per access. Compilers that exploit unaligned instructions improve code density automatically. Overall, unaligned access support directly helps reduce code size and complexity for ARM Cortex-M4-based microcontrollers.
