The Cortex-M4 from Arm is a popular 32-bit processor used in many embedded systems. Features such as DSP instructions, an optional single-precision floating-point unit, an optional memory protection unit, and low power consumption make it suitable for a wide range of applications.
One notable feature of the Cortex-M4 memory system is hardware support for unaligned accesses. An access is unaligned when its address is not a multiple of the size of the data being accessed. For example, reading a 32-bit integer from an address that is not divisible by 4 is an unaligned access.
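The definition above can be expressed as a small check. This is a minimal sketch; the helper name `is_aligned` is an assumption made for illustration, not part of any Arm or CMSIS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of size.
 * size must be a power of two (1, 2, 4 or 8 for ordinary data types). */
static bool is_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

For instance, `is_aligned(0x20000001, 4)` is false, so a 32-bit access at that address would be unaligned, while a byte access at the same address is always aligned.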
Why Unaligned Accesses Occur
There are several reasons why unaligned accesses can occur in Cortex-M4-based systems:
- Accessing packed data structures: When a structure is packed (its padding removed), fields such as ints and shorts that follow a char can land at unaligned addresses.
- Typecasting pointers: Casting a char pointer to an int pointer and dereferencing it results in an unaligned int access whenever the address is not a multiple of 4.
- Network data packets: Packet payload data is often unaligned relative to the processor’s natural alignment.
- IPC message buffers: Inter-processor communication buffers may place data at unaligned addresses.
- Reading device registers: Peripheral register maps don’t always follow the processor’s alignment rules, although the registers themselves should still be accessed at their documented width and alignment.
While aligned accesses are generally recommended for performance reasons, there are many cases where unaligned accesses are hard to avoid.
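The packed-structure case can be made concrete with a short sketch. The `sensor_record` layout below is hypothetical, and `__attribute__((packed))` is a GCC/Clang extension; other compilers spell it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, packed so that it matches a wire format.
 * __attribute__((packed)) is a GCC/Clang extension. */
struct __attribute__((packed)) sensor_record {
    uint8_t  id;        /* offset 0 */
    uint32_t timestamp; /* offset 1: not a multiple of 4 */
    uint16_t value;     /* offset 5: not a multiple of 2 */
};

/* The same fields without packing: the compiler inserts padding so that
 * every field sits at its natural alignment. */
struct sensor_record_padded {
    uint8_t  id;
    uint32_t timestamp;
    uint16_t value;
};
```

Even when a `struct sensor_record` object itself starts at a word-aligned address, reading `timestamp` or `value` is an unaligned access. The padded variant avoids this at the cost of a larger structure.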
Problems with Unaligned Accesses
Performing unaligned memory accesses can cause the following problems on some processor architectures:
- Processor exceptions – Many processors raise an alignment fault on an unaligned access.
- Performance overhead – Unaligned accesses may need to be emulated using multiple aligned accesses, which costs performance.
- Atomicity issues – An unaligned access may be carried out in several steps and is then no longer atomic, which can lead to concurrency problems.
- Endianness problems – Code that works around the issues above by assembling multi-byte values from individual bytes has to get the byte order right.
To avoid these problems, software on such architectures often has to copy data into aligned buffers or access it byte by byte, which costs both code size and performance.
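The kind of software workaround described above typically looks like the following sketch, which reads and writes a 32-bit little-endian value one byte at a time so that no multi-byte access is ever issued. The function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from any address, byte by byte. */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p != 0);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value in little-endian order to any address. */
static void store_le32(uint8_t *p, uint32_t v)
{
    assert(p != 0);
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

This works on any architecture and makes the byte order explicit, but without compiler help it turns one load into four loads plus shifts and ORs.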
Unaligned Access Handling on Cortex-M4
The Cortex-M4 core supports unaligned accesses in hardware for the most common load and store instructions. The key points are:
- Unaligned word and halfword accesses made with single load/store instructions (LDR, LDRH, LDRSH, STR, STRH, and their unprivileged LDRT/STRT forms) are handled transparently in hardware.
- These accesses raise no fault as long as the UNALIGN_TRP bit in the Configuration and Control Register (CCR) is 0, which is its reset value.
- Unaligned accesses are usually slower than aligned accesses, because each one is carried out as more than one bus transfer.
- Atomicity is not guaranteed: another bus master, such as a DMA controller, can observe or modify memory between the transfers that make up one unaligned access.
- Endianness is unaffected: the bytes are assembled according to the data endianness the processor is configured for, and no byte swapping is performed.
This removes most of the software complexity associated with handling unaligned accesses on other architectures. The hardware splits the access, performs the aligned transfers, and places the assembled value in the destination register, at the cost of a few extra cycles. No software changes are needed, although Arm still recommends keeping data aligned where possible.
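In C, dereferencing a misaligned `uint32_t *` is undefined behavior even when the hardware tolerates the access, so a common way to express an unaligned access is through `memcpy`. The sketch below uses illustrative helper names; on a Cortex-M4 target with unaligned access permitted, an optimizing compiler will typically reduce each helper to a single load or store instruction, but that depends on the toolchain and its options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a uint32_t in the CPU's native byte order from any address. */
static uint32_t read_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != 0);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a uint32_t in the CPU's native byte order to any address. */
static void write_u32_unaligned(void *p, uint32_t v)
{
    assert(p != 0);
    memcpy(p, &v, sizeof v);
}
```

Because the helpers use native byte order, they suit data produced and consumed on the same device. Data in a fixed external byte order still needs an explicit conversion.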
Hardware Mechanism for Unaligned Access
Here is a simplified overview of how the Cortex-M4 handles unaligned accesses in hardware:
- Instruction fetches are always performed as aligned accesses, so unaligned handling applies only to data loads and stores.
- An unaligned load or store is split into two or more aligned accesses inside the processor.
- The aligned accesses are sent to the memory system (bus interfaces, memory controllers) one after another.
- For a load, the pieces returned by the individual transfers are shifted into position and merged.
- No endianness conversion takes place; the bytes are combined in the processor’s configured byte order.
- The core receives the assembled value once the last transfer has completed, so the access takes longer than an aligned one.
The critical component here is the alignment logic in the load/store path. It splits the unaligned access, issues the aligned transfers, and arranges the data correctly before it reaches the core registers. This avoids any software emulation of unaligned accesses.
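The splitting step can be illustrated with a small software model. This is only a sketch under a stated assumption: it always picks the largest naturally aligned transfer that fits, which is one plausible policy and not a description of the actual Cortex-M4 microarchitecture. The `transfer` type and `split_access` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One aligned bus transfer: start address and size in bytes (1, 2 or 4). */
struct transfer {
    uint32_t addr;
    uint8_t  size;
};

/* Split an access of `size` bytes at `addr` into naturally aligned
 * transfers, always taking the largest transfer that fits.
 * Writes at most `max` entries to `out` and returns the number needed. */
static size_t split_access(uint32_t addr, uint8_t size,
                           struct transfer *out, size_t max)
{
    size_t n = 0;
    assert(out != 0);
    while (size > 0) {
        uint8_t chunk = 4;
        /* Shrink the chunk until it is aligned and does not overshoot. */
        while (chunk > size || (addr % chunk) != 0) {
            chunk = (uint8_t)(chunk / 2);
        }
        if (n < max) {
            out[n].addr = addr;
            out[n].size = chunk;
        }
        n++;
        addr += chunk;
        size = (uint8_t)(size - chunk);
    }
    return n;
}
```

Under this model a word access at an address ending in 1 becomes a byte, a halfword, and a byte, while a word access at an address ending in 2 becomes two halfwords. An aligned word stays a single transfer.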
Enabling Unaligned Accesses in Software
To make use of the Cortex-M4 unaligned access capability in software, the following points need to be noted:
- Unaligned access behavior is controlled by the UNALIGN_TRP bit (bit 3) of the Configuration and Control Register (CCR) at address 0xE000ED14.
- UNALIGN_TRP is 0 after reset, so unaligned accesses are permitted by default. Setting it to 1 makes every unaligned word or halfword access raise a UsageFault, which is useful for catching accidental unaligned accesses.
- There is no per-instruction alignment control; trapping is enabled or disabled globally.
- Unaligned LDM/STM is not supported. The same applies to LDRD/STRD and the exclusive load/store instructions (LDREX/STREX): an unaligned address with any of these raises a UsageFault regardless of UNALIGN_TRP.
Hence, no start-up code is needed to allow unaligned accesses, because UNALIGN_TRP is already 0 after reset. Set the bit only when unaligned accesses should be trapped. With the bit clear, software can perform unaligned word and halfword accesses with single load/store instructions, at the cost of extra bus cycles.
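On Armv7-M processors such as the Cortex-M4, alignment trapping is controlled by the UNALIGN_TRP bit (bit 3) of the Configuration and Control Register at 0xE000ED14, and a trapped access sets the UNALIGNED flag in the UsageFault status bits of the Configurable Fault Status Register. The sketch below shows one way to toggle and inspect these bits. The macro and function names are assumptions made for this example, and the functions take a pointer or value so they can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Armv7-M System Control Block addresses and bit positions. */
#define SCB_CCR_ADDR     0xE000ED14u  /* Configuration and Control Register */
#define SCB_CFSR_ADDR    0xE000ED28u  /* Configurable Fault Status Register */
#define CCR_UNALIGN_TRP  (1u << 3)    /* trap unaligned word/halfword accesses */
#define CFSR_UNALIGNED   (1u << 24)   /* UFSR.UNALIGNED; UFSR is CFSR[31:16] */

/* Enable or disable trapping of unaligned accesses.
 * On target, pass (volatile uint32_t *)SCB_CCR_ADDR. */
static void set_unaligned_trap(volatile uint32_t *ccr, bool enable)
{
    assert(ccr != 0);
    if (enable) {
        *ccr |= CCR_UNALIGN_TRP;
    } else {
        *ccr &= ~CCR_UNALIGN_TRP;
    }
}

/* Given a CFSR value read in a UsageFault handler, report whether the
 * fault was caused by an unaligned access. */
static bool fault_was_unaligned(uint32_t cfsr)
{
    return (cfsr & CFSR_UNALIGNED) != 0;
}
```

In a project that uses the CMSIS headers, the same bit is available as `SCB_CCR_UNALIGN_TRP_Msk` and can be set with `SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk;`. Enabling the trap during development is a cheap way to find unintended unaligned accesses.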
Benefits of Unaligned Access Support
The main benefits of having unaligned access support on Cortex-M4 are:
- Performance – No software overhead for emulating unaligned accesses; the hardware cost is a few extra bus cycles per access.
- Determinism – No fault handler is involved, so the cost of an unaligned access is predictable for a given address offset and memory, though it is higher than for an aligned access.
- Ease of use – Software doesn’t need special handling for unaligned words and halfwords.
Two things the feature does not provide are atomicity and endianness conversion. An unaligned access is carried out as several transfers, so data shared with a DMA controller or another bus master should stay aligned, and data in a foreign byte order still has to be swapped in software.
This simplifies software development and can improve performance when dealing with unaligned data. Applications such as networking, multimedia, and cryptography can benefit from this feature.
Use Cases Enabled
The Cortex-M4 unaligned access feature enables several common use cases:
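- Reading and writing data buffers and packet payloads used in communication systems.
- Typecasting pointers to access specific data types (in C, a memcpy-based accessor or a packed type is the well-defined way to express this).
- Interfacing with hardware blocks and I/O devices whose data buffers are not naturally aligned. Peripheral registers themselves should still be accessed at their documented alignment, since some memory regions might not support unaligned accesses.
- Signal, image, and video processing algorithms handling unaligned data structures.
- Cryptography and compression algorithms using unaligned data buffers.
All these use cases can work without software emulation of unaligned accesses, which simplifies development and usually improves performance compared with byte-by-byte handling.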
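The packet-payload case shows why such data ends up unaligned. An untagged Ethernet II header is 14 bytes long, so even when the frame buffer is word-aligned, the 32-bit IPv4 addresses that follow sit at offsets 26 and 30, neither of which is a multiple of 4. The sketch below extracts them; the macro and function names are illustrative, and the byte-wise form is used because it is portable and makes the network byte order explicit. Where unaligned loads are permitted, a compiler may merge the four byte reads into one load plus a byte reverse, but that is toolchain-dependent.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HEADER_LEN   14u  /* untagged Ethernet II header */
#define IPV4_SRC_OFFSET  12u  /* source address within the IPv4 header */
#define IPV4_DST_OFFSET  16u  /* destination address within the IPv4 header */

/* Convert four big-endian (network order) bytes to a host integer. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Source IPv4 address of a frame that starts with an Ethernet II header. */
static uint32_t ipv4_src(const uint8_t *frame)
{
    assert(frame != 0);
    return load_be32(frame + ETH_HEADER_LEN + IPV4_SRC_OFFSET);
}

/* Destination IPv4 address of the same frame. */
static uint32_t ipv4_dst(const uint8_t *frame)
{
    assert(frame != 0);
    return load_be32(frame + ETH_HEADER_LEN + IPV4_DST_OFFSET);
}
```

For a frame whose bytes 26 to 29 are 192, 168, 1, 10, `ipv4_src` returns 0xC0A8010A.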
Conclusion
Hardware support for unaligned word and halfword accesses is a useful capability of the Cortex-M4 processor. It removes the need for software emulation and simplifies code in many use cases dealing with unaligned data, provided its limits are kept in mind: unaligned accesses are slower than aligned ones, are not atomic, and are not available for multi-register, doubleword, or exclusive load/store instructions. Within those limits, the feature makes software development easier for many embedded applications.