Cortex M4 Unaligned Access

Javier Massey
Last updated: October 5, 2023 9:56 am

The Cortex-M4 processor from ARM is a popular 32-bit processor used in many embedded systems. Its DSP instructions, optional floating-point unit, optional memory protection unit, and low power consumption make it suitable for a wide range of applications.

Contents
  • Why Unaligned Accesses Occur
  • Problems with Unaligned Accesses
  • Unaligned Access Handling on Cortex-M4
  • Hardware Mechanism for Unaligned Access
  • Enabling Unaligned Accesses in Software
  • Benefits of Unaligned Access Support
  • Use Cases Enabled
  • Conclusion

One notable feature of the Cortex-M4 memory system is hardware support for unaligned accesses. An access is unaligned when its address is not a multiple of the access size. For example, reading a 32-bit integer from an address that is not divisible by 4 is an unaligned access.
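
As a rough illustration of this definition, the helper below checks whether an address is naturally aligned for a given access size. The name `is_aligned` is hypothetical, and the check assumes the size is a non-zero power of two (1, 2, 4, or 8 bytes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of size.
 * Assumes size is a non-zero power of two. */
static inline bool is_aligned(uintptr_t addr, size_t size)
{
    return (addr & (uintptr_t)(size - 1u)) == 0u;
}
```

For instance, `is_aligned(0x20000001u, 4)` is false, so a 32-bit access at that address would be unaligned, while a byte access at the same address is always aligned.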

Why Unaligned Accesses Occur

There are several reasons why unaligned accesses can occur in Cortex-M4 based systems:

  • Accessing packed data structures: Structures that mix ints, shorts, and chars and are packed without padding can place fields at unaligned addresses.
  • Typecasting pointers: Casting a char pointer to an int pointer and dereferencing it can produce an unaligned int access. In standard C this is undefined behaviour when the address is misaligned, even if the hardware tolerates it.
  • Network data packets: Packet payload data is often unaligned relative to the processor’s natural alignment.
  • IPC message buffers: Inter-processor communication buffers may place data at unaligned addresses.
  • Reading device registers: Hardware register layouts don’t always follow the processor’s alignment rules, although unaligned accesses to peripheral (Device) memory are not reliably supported and are best avoided.

While aligned accesses are generally recommended for performance reasons, there are many cases where unaligned accesses are inevitable.
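
The packed-structure case can be sketched as follows. The `sensor_frame` layout is hypothetical, and `__attribute__((packed))` is a GCC/Clang extension rather than standard C; other toolchains use different syntax.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format. Packing removes padding, so when the
 * structure starts on a 4-byte boundary the 32-bit field at byte
 * offset 1 is unaligned. */
struct __attribute__((packed)) sensor_frame {
    uint8_t  type;       /* offset 0 */
    uint32_t timestamp;  /* offset 1 */
    uint16_t value;      /* offset 5 */
};

/* Accessing the field through the packed type tells the compiler
 * that it may be unaligned, so it selects a safe access sequence. */
static inline uint32_t frame_timestamp(const struct sensor_frame *f)
{
    return f->timestamp;
}
```

Without the packed attribute, the compiler would normally insert three padding bytes after `type` so that `timestamp` stays aligned.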

Problems with Unaligned Accesses

Performing unaligned memory accesses can cause the following problems on some processor architectures:

  • Processor exceptions – Many processors raise an alignment fault on an unaligned access.
  • Performance overhead – Unaligned accesses may need to be emulated in software using multiple aligned accesses, which costs extra instructions and cycles.
  • Atomicity issues – An unaligned access that is split into several transfers is no longer atomic, which can lead to concurrency problems.
  • Endianness pitfalls – Software that assembles a multi-byte value from individual bytes must apply the correct byte order itself.

On such architectures, software has to work around unaligned data, typically by copying it to an aligned location or assembling it byte by byte. Both approaches add code and cost performance.
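
On a core without hardware support, the software workaround typically looks like the sketch below: the value is assembled from individual byte loads, which are always aligned. The function name `load_u32_le` is hypothetical, and the stored data is assumed to be in little-endian byte order.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from any
 * byte address using only byte-sized (always aligned) loads. */
static inline uint32_t load_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

This takes four loads plus several shift and OR operations where an aligned access needs a single load instruction, which is the overhead hardware support is meant to remove.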

Unaligned Access Handling on Cortex-M4

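The Cortex-M4 core handles most unaligned accesses in hardware. Its key properties with regard to unaligned accesses are:

  • Unaligned halfword and word accesses made by single load/store instructions (LDR, LDRH, LDRSH, STR, STRH, and their unprivileged LDRT/STRT variants) are handled transparently in hardware.
  • These accesses do not fault by default; trapping them is optional and controlled by software.
  • Unaligned accesses are usually slower than aligned accesses, because the processor converts each one into multiple aligned bus transfers.
  • An unaligned access is not guaranteed to be atomic: another bus master, such as a DMA controller, can modify memory between the component transfers.
  • Alignment is independent of endianness; the hardware assembles the bytes in the configured data byte order without software help.

This removes most of the software complexity associated with handling unaligned accesses on other architectures. The hardware splits the unaligned access into aligned transfers, reads or writes memory, and assembles the result for the core, with no changes needed in software. The cost is extra bus cycles rather than extra instructions. ARM still recommends keeping data aligned where possible, and some memory regions may not support unaligned accesses at all.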

Hardware Mechanism for Unaligned Access

Here is a simplified overview of how the Cortex-M4 handles unaligned accesses in hardware:

  • Instruction fetches are always issued as aligned accesses; only data accesses can be unaligned.
  • An unaligned load or store from a supported instruction is converted into two or three aligned transfers, depending on the access size and the address offset.
  • The aligned transfers are sent to the memory system (bus interfaces, memory controllers) like any other access.
  • For a load, the returned pieces are shifted and merged into a single register value; for a store, the register value is split across the transfers.
  • The instruction completes once all transfers have finished, so each extra transfer adds cycles.

The key point is that the splitting and merging are done by the load/store hardware rather than by extra instructions. This avoids multi-instruction software emulation of unaligned accesses, although the access itself still takes longer than an aligned one.
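
The split-and-merge idea can be modelled in software. The sketch below is a conceptual model only, not a description of the actual Cortex-M4 datapath or of the bus transfers it issues: it assumes little-endian memory represented as an array of 32-bit words, and the name `model_unaligned_load32` is hypothetical. It reads the two aligned words that contain the requested bytes and merges them by shifting.

```c
#include <stddef.h>
#include <stdint.h>

/* Conceptual model: load 32 bits from an arbitrary byte address in a
 * little-endian memory that can only be read one aligned word at a time.
 * The caller must ensure mem[] covers the word after byte_addr / 4
 * whenever byte_addr is not a multiple of 4. */
static uint32_t model_unaligned_load32(const uint32_t *mem, size_t byte_addr)
{
    size_t   word   = byte_addr / 4u;
    unsigned offset = (unsigned)(byte_addr % 4u);

    uint32_t lo = mem[word];
    if (offset == 0u) {
        return lo;                     /* aligned: one access is enough */
    }

    uint32_t hi    = mem[word + 1u];   /* second aligned access */
    unsigned shift = 8u * offset;

    /* Low bytes come from the top of the first word, high bytes from
     * the bottom of the second word. */
    return (lo >> shift) | (hi << (32u - shift));
}
```

With memory words `0x44332211` and `0x88776655`, a load at byte address 1 touches both words and returns `0x55443322`.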

Enabling Unaligned Accesses in Software

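Unaligned access support is available by default on the Cortex-M4. To use it correctly in software, the following points need to be noted:

  • Trapping is controlled by the UNALIGN_TRP bit in the Configuration and Control Register (CCR). The bit is 0 at reset, which allows unaligned accesses; setting it to 1 makes every unaligned halfword or word access raise a UsageFault.
  • There is no per-instruction alignment control. The SCTLR.A bit belongs to the A-profile and R-profile architectures and does not exist on the Cortex-M4.
  • Load/store multiple (LDM, STM), doubleword (LDRD, STRD), and exclusive (LDREX, STREX) instructions always raise a UsageFault on an unaligned address, regardless of UNALIGN_TRP.
  • When an alignment fault occurs, the UNALIGNED bit in the UsageFault Status Register is set, which lets a fault handler identify the cause.

Hence no start-up code is needed to allow unaligned accesses. Software only needs to touch UNALIGN_TRP when it wants to catch them, which is useful during development for finding accidental unaligned accesses. With trapping disabled, software can perform unaligned single loads and stores on the Cortex-M4 at the cost of some extra cycles.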
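
A minimal sketch of alignment-trap control on an ARMv7-M core such as the Cortex-M4 follows. It assumes the Configuration and Control Register sits at 0xE000ED14 with UNALIGN_TRP at bit 3, and that the UNALIGNED flag appears at bit 24 of the Configurable Fault Status Register; check these against the reference manual for the device in use. The function names are hypothetical, and the register is passed as a pointer so the logic can also be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout for ARMv7-M; verify against the device manual. */
#define CCR_ADDRESS          0xE000ED14u
#define CCR_UNALIGN_TRP_MASK (1u << 3)
#define CFSR_UNALIGNED_MASK  (1u << 24)

/* Hypothetical helper: enable or disable trapping of unaligned
 * halfword/word accesses, leaving all other CCR bits untouched. */
static void set_unaligned_trap(volatile uint32_t *ccr, bool enable)
{
    if (enable) {
        *ccr |= CCR_UNALIGN_TRP_MASK;
    } else {
        *ccr &= ~CCR_UNALIGN_TRP_MASK;
    }
}

/* Hypothetical helper for a UsageFault handler: true when the fault
 * status value reports an alignment fault. */
static bool fault_was_unaligned(uint32_t cfsr)
{
    return (cfsr & CFSR_UNALIGNED_MASK) != 0u;
}
```

On target, the call would look like `set_unaligned_trap((volatile uint32_t *)CCR_ADDRESS, true);`. Projects that use the CMSIS headers would more likely write `SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk;` instead.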

Benefits of Unaligned Access Support

The main benefits of having unaligned access support on Cortex-M4 are:

  • Performance – A single load or store replaces the byte-by-byte sequence that software emulation would need.
  • Predictability – For a given access size and address offset, an unaligned access costs a small, fixed number of extra bus transfers.
  • Ease of use – Software doesn’t need special handling for unaligned data in normal memory.

This simplifies software development and improves performance when dealing with unaligned data. Applications like networking, multimedia, and cryptography can benefit from this feature. Two caveats remain: unaligned accesses are not atomic, and they are slower than aligned accesses, so shared variables and performance-critical data should still be aligned.

Use Cases Enabled

The Cortex-M4 unaligned access feature supports several common use cases:

  • Reading and writing data buffers and packet payloads used in communication systems.
  • Accessing specific data types at arbitrary offsets in a byte buffer. In C, this is safer through packed types or memcpy than through raw pointer casts.
  • Exchanging data with hardware blocks and I/O devices whose buffers in normal memory are not naturally aligned.
  • Signal, image, and video processing algorithms handling unaligned data structures.
  • Cryptography and compression algorithms using unaligned data buffers.

All these use cases can work without emulating unaligned accesses in software, which simplifies development and reduces instruction count.
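
For buffer and packet parsing, a common portable idiom is to copy the field out with `memcpy` instead of casting the pointer, which sidesteps undefined behaviour in C. The packet layout and the name `packet_sequence` below are hypothetical. When unaligned access is permitted, compilers targeting the Cortex-M4 can usually reduce the copy to a single load, though that depends on the toolchain and its options.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: byte 0 is a type tag, bytes 1..4 hold a
 * sequence number in the CPU's native byte order. */
#define PKT_SEQ_OFFSET 1u

/* Read the 32-bit field at an odd offset without a pointer cast. */
static inline uint32_t packet_sequence(const uint8_t *pkt)
{
    uint32_t seq;
    memcpy(&seq, pkt + PKT_SEQ_OFFSET, sizeof seq);
    return seq;
}
```

If the field were stored in network byte order instead, a byte-swap step would be needed after the copy on a little-endian core.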

Conclusion

The ability to handle unaligned accesses in hardware is a useful capability of the Cortex-M4 processor. It removes the need for software emulation in use cases that deal with unaligned data, which simplifies code and saves instructions. Aligned data remains the faster and safer default, but where unaligned data cannot be avoided, the Cortex-M4 handles it without extra effort from the programmer.
