The ARM Cortex-M0 is an ultra-low-power 32-bit processor core designed for embedded and IoT applications. It is part of ARM’s Cortex-M series of cores, which are optimized for low-cost, low-power embedded systems. The Cortex-M0 implements the ARMv6-M architecture, which does not support unaligned memory accesses in hardware, so software has to deal with unaligned data explicitly.
What is Unaligned Memory Access?
Unaligned memory access refers to accessing data at a memory address that is not a multiple of the data size. An example is accessing a 32-bit integer at address 0x10003 instead of a 4-byte aligned address like 0x10000. On processors that support unaligned accesses, they are generally less efficient because the hardware has to do multiple reads and combine the parts. The Cortex-M0 does not do this at all: an unaligned word or halfword access raises a HardFault.
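To make the definition concrete, here is a minimal sketch of an alignment check in C. The helper name is_aligned is hypothetical, and the function assumes the access size is a power of two, which holds for byte, halfword, word, and doubleword accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of size.
 * Assumption: size is a power of two (1, 2, 4 or 8). */
static bool is_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return (addr & (uintptr_t)(size - 1u)) == 0u;
}
```

With this helper, is_aligned(0x10000, 4) is true and is_aligned(0x10003, 4) is false, matching the example addresses above.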
Unaligned Access Support in Cortex-M0
The Cortex-M0 core has no hardware support for unaligned accesses. It also has no Memory Protection Unit (an MPU is optional only on the Cortex-M0+), no FPU, and no separate BusFault exception; every fault is reported as a HardFault. Here are the key behaviors:
- Word accesses (LDR, STR) must be 4-byte aligned and halfword accesses (LDRH, STRH) must be 2-byte aligned. Byte accesses are always aligned.
- Any unaligned word or halfword access generates a HardFault.
- Load and store multiple instructions (LDM, STM, PUSH, POP) also require word-aligned addresses.
- There is no per-region alignment setting, because there is no MPU to configure.
- The ARMv6-M fault model has no fault address or fault status registers, so a handler has little information to work with beyond the stacked registers.
Overall, the ARMv6-M architecture trades unaligned access support for a smaller and more deterministic core. Software is responsible for keeping accesses aligned or for splitting them into smaller aligned accesses.
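One common way to read or write a 32-bit value at an arbitrary address without issuing an unaligned word access is to go through memcpy. The sketch below is an illustration, not a prescribed method; the helper names load_u32 and store_u32 are hypothetical. It relies on the compiler lowering memcpy to accesses that are safe for the target, which GCC and Clang normally do when unaligned access is disabled for the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in native byte order from any address. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* no alignment assumption on p */
    return v;
}

/* Write a 32-bit value in native byte order to any address. */
static void store_u32(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);   /* no alignment assumption on p */
}
```

For example, store_u32(buf + 1, x) followed by load_u32(buf + 1) returns x for a byte buffer buf, even though buf + 1 is generally not word aligned.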
Enabling Unaligned Access
Unaligned access cannot be enabled on the Cortex-M0. The points below explain why and what to do instead:
- The UNALIGN_TRP bit (bit 3) in the Configuration and Control Register always reads as one on ARMv6-M and ignores writes, so unaligned accesses always trap. The bit is writable only on ARMv7-M cores such as the Cortex-M3 and Cortex-M4.
- There is no MPU region attribute that controls alignment, and the Cortex-M0 has no MPU in any case.
- A HardFault handler cannot rely on fault address or fault status registers, because ARMv6-M does not provide them.
- Use appropriate compiler options like -mno-unaligned-access so that the compiler never generates unaligned word or halfword accesses. GCC applies this by default for ARMv6-M targets; do not pass -munaligned-access.
- Access data that may be unaligned through memcpy, byte-wise reads and writes, or packed structure types.
With these measures, the compiler emits sequences of aligned byte or halfword accesses, and the core never sees an unaligned word or halfword address.
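The UNALIGN_TRP setting mentioned above can be inspected by testing bit 3 of the Configuration and Control Register. The following sketch takes a pointer to the register so it can also be exercised on a host; the function name unaligned_access_traps is hypothetical. On a target using CMSIS headers, the argument would be &SCB->CCR, and on ARMv6-M the bit is expected to read as one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the Configuration and Control Register (CCR). */
#define CCR_UNALIGN_TRP (1u << 3)  /* unaligned accesses trap when set */
#define CCR_STKALIGN    (1u << 9)  /* 8-byte stack alignment on exception entry */

/* Returns true when the CCR value indicates that unaligned word and
 * halfword accesses trap. ccr points at the register, or at a test value. */
static bool unaligned_access_traps(const volatile uint32_t *ccr)
{
    assert(ccr != NULL);
    return (*ccr & CCR_UNALIGN_TRP) != 0u;
}
```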
Unaligned Access Performance
Because the Cortex-M0 cannot perform unaligned accesses in hardware, the cost of unaligned data is the cost of the software sequence that replaces each access. Here are some considerations:
- Reading a 32-bit value byte by byte takes four LDRB instructions plus shifts and ORs in place of a single LDR, so it costs several times as many cycles.
- Writing a 32-bit value byte by byte likewise takes four STRB instructions plus shifts in place of a single STR.
- An unaligned access that slips through is not a slowdown but a HardFault.
- Code size may be larger wherever the compiler must assume that data can be unaligned.
- Benchmarking should be done to quantify the impact for a particular system.
In most cases the overhead is minor if unaligned data is confined to I/O boundaries. Profile tight loops, and unpack unaligned data into aligned variables once rather than on every use.
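Handling Unaligned Access Faults
On the Cortex-M0, an unaligned word or halfword access generates a HardFault exception. There is no BusFault exception and no BFAR register on ARMv6-M. A HardFault handler used for diagnosing alignment problems should follow these steps:
- Check bit 2 of the EXC_RETURN value in LR to determine whether the exception frame is on the main or the process stack.
- Read the stacked PC from the exception frame to locate the faulting instruction.
- Record the stacked registers, which hold the base address and offset used by that instruction.
- Report the fault through a log, a breakpoint, or a saved crash record.
- Reset or halt the system rather than resuming execution.
Emulating the access in the handler by decoding the faulting instruction is possible in principle, but it is slow and rarely worthwhile. An alignment fault on the Cortex-M0 is best treated as a bug to fix in the source.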
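As an illustration of the diagnostic side of fault handling, the sketch below decodes the register frame that a Cortex-M core without an FPU pushes on exception entry. The names frame_on_psp and faulting_pc are hypothetical, and the code assumes the handler has already obtained the EXC_RETURN value and the stack pointer that holds the frame (typically with a short assembly stub). For a synchronous fault the stacked PC normally points at the instruction that caused it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets of the registers stacked by hardware on exception entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* EXC_RETURN bit 2: 0 means the frame is on MSP, 1 means it is on PSP. */
static bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Returns the stacked PC from an exception frame. */
static uint32_t faulting_pc(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_PC];
}
```

A handler built on this would typically store the returned address somewhere that survives a reset, so the faulting instruction can be looked up in the map file or disassembly.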
Compiler Considerations
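Here are some compiler settings to consider when handling unaligned data on the Cortex-M0:
- Use -mno-unaligned-access, which GCC already applies by default for ARMv6-M targets, so that the compiler never emits unaligned word or halfword accesses. Do not pass -munaligned-access when targeting the Cortex-M0.
- Mark packed data with __attribute__((packed)) or copy it with memcpy so that the compiler knows to generate byte-wise accesses.
- Loop optimization flags like -fno-move-loop-invariants change how loops are compiled but not how data is aligned; measure before relying on them.
- Inline assembly may be needed for fine-grained control.
- Enable alignment warnings such as -Wcast-align and add alignment assertions in debug builds.
- Inspect the generated object code to identify the impact of unaligned data handling.
Consult the compiler documentation for the exact behavior of these options. Handwritten assembly must always keep its word and halfword accesses aligned.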
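To show how a packed type tells the compiler about unaligned members, here is a small sketch using the GCC and Clang packed attribute. The structure sample_packed_t and the accessor sample_timestamp are hypothetical examples, not part of any library. When unaligned access is disabled for the target, the compiler is expected to read the timestamp member with byte-wise accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: no padding, so members may sit at any byte offset. */
typedef struct __attribute__((packed)) {
    uint8_t  type;
    uint32_t timestamp;  /* offset 1, not 4-byte aligned */
    uint16_t value;      /* offset 5, not 2-byte aligned */
} sample_packed_t;

/* The compiler knows s->timestamp may be unaligned and generates
 * an access sequence that is safe for the target. */
static uint32_t sample_timestamp(const sample_packed_t *s)
{
    assert(s != NULL);
    return s->timestamp;
}
```

Taking the address of a packed member and passing it around as a plain uint32_t pointer discards this information, which is one of the situations -Wcast-align and related warnings are meant to catch.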
Use Cases for Unaligned Access
Here are some common situations where unaligned data shows up in Cortex-M0 projects and has to be handled in software:
- Reading packed data buffers: protocol headers and packed structs place fields at arbitrary byte offsets.
- Memory-mapped peripherals: registers themselves are normally naturally aligned and must be accessed at their documented width, but payloads read from FIFOs are often packed.
- Interacting with non-C code: assembly or RTOS code may use its own data layouts.
- Reading external memory: data from external SPI or I2C chips arrives as a byte stream with no alignment guarantees.
- Sharing data with other MCUs: 8-bit MCUs like 8051s have no alignment requirements, so their data structures are often packed.
Packed layouts save memory and bandwidth in these situations. On the Cortex-M0, the cost is that software rather than hardware does the unpacking.
Limitations of Unaligned Access
There are a few limitations to keep in mind when working with unaligned data on the Cortex-M0:
- All word and halfword loads and stores require natural alignment; only byte accesses are exempt.
- Multi-word values like 64-bit doubles are accessed as word accesses and will fault if they are not at least word aligned.
- A HardFault caused by misalignment cannot be identified through fault status registers, so debugging relies on the stacked PC.
- The stack pointer must stay word aligned, and the ARM procedure call standard expects 8-byte alignment at function boundaries.
- Device DMA engines typically require aligned buffers.
Unaligned data should be unpacked into aligned structures before use in performance-critical code. Profile code execution to identify any issues.
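The unpacking step can be sketched as follows, assuming a hypothetical 7-byte little-endian wire header with a 1-byte id, a 2-byte length, and a 4-byte sequence number. The type header_t and the function unpack_header are illustrative names only. The input is read one byte at a time, so its alignment does not matter, and the output is an ordinary naturally aligned struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIRE_HEADER_SIZE 7u

/* Naturally aligned in-memory form; the compiler inserts padding as needed. */
typedef struct {
    uint8_t  id;
    uint16_t length;
    uint32_t sequence;
} header_t;

/* Decode a little-endian wire header using byte reads only. */
static void unpack_header(const uint8_t *wire, header_t *out)
{
    assert(wire != NULL && out != NULL);
    out->id       = wire[0];
    out->length   = (uint16_t)((uint16_t)wire[1] | ((uint16_t)wire[2] << 8));
    out->sequence = (uint32_t)wire[3]
                  | ((uint32_t)wire[4] << 8)
                  | ((uint32_t)wire[5] << 16)
                  | ((uint32_t)wire[6] << 24);
}
```

After this one-time conversion, the rest of the code can use plain word and halfword accesses on the header_t fields.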
Conclusion
The Cortex-M0 does not support unaligned memory access in hardware. It has no MPU, no BusFault exception, and no configurable UNALIGN_TRP bit, and any unaligned word or halfword access ends in a HardFault. Packed and non-aligned data can still be read and written, but the work is done in software through byte-wise accesses, memcpy, or packed types, with appropriate compiler settings making sure that no unaligned access is emitted. Overall, handling unaligned data correctly is a software responsibility on the Cortex-M0, and it matters for embedded systems that work with diverse data sources and formats.