Unified vs Separate Memory Address Spaces in ARM Cortex-M

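ARM Cortex-M processors expose a single 4GB address space, but a system can treat it as unified, with code and data freely intermixed, or as separate, with code and data confined to distinct regions, buses, and access rules. The choice impacts performance, software design, and memory protection capabilities.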

Contents

Unified Memory Address Space
Performance Impact
Separate Memory Address Spaces
Performance Impact
Hardware Support
TCM Regions
Instruction Support
Software Considerations
Dynamic Memory Allocation
Memory Protection
Performance Optimization
Use Case Guidelines
Conclusion

Unified Memory Address Space

In the unified memory address space configuration, both code and data share the same logical address space. Memory access instructions such as LDR and STR can access any memory location without distinction between code and data. This simplifies software development, as pointers can freely refer to both code and data addresses.

Unified address space allows full interaction between code and data. Data pointers can point directly into code space and vice versa. This enables constructs like self-modifying code, literal pools and jump tables embedded in code, and routines copied into RAM before execution. However, it also means that bugs in data accesses can corrupt program code.

Sharing the address space maximizes flexibility in locating code and data within the physical memory map. The linker can freely place code and data within the full 4GB address range. Dynamic memory allocators like malloc can return data pointers anywhere in memory.

The downside is weaker memory protection by default. In a unified layout nothing distinguishes accesses to code from accesses to data unless protection is configured explicitly. Cortex-M cores have no MMU; robust protection relies on the optional MPU, which must be programmed with per-region access permissions and execute-never (XN) attributes.

Performance Impact

Unified address space allows identical load and store instructions to be used for both code and data. This avoids any performance penalty when accessing different address spaces.

However, instruction fetches and data accesses that target the same memory compete for the bus, especially in data-intensive workloads. Where a single cache or flash accelerator serves both, code and data also compete for its capacity, which lowers its effectiveness.

Separate Memory Address Spaces

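With separate address spaces, code and data memory are accessed via distinct address ranges, and ideally through distinct memories and buses. On Cortex-M the separation comes from the memory map, the bus structure, and MPU rules, not from the instruction set: the same load and store instructions work in both ranges.

For example, LDR and STR transfer a single register, while LDM and STM transfer several registers in one instruction; none of them is tied to code or data memory. The only accesses inherently directed at code are instruction fetches.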

When code resides in flash or in a region the MPU marks read-only, this separation prevents accidental writes from data accesses corrupting program code. Bugs can still crash a program, but they cannot inadvertently modify code.

The hardware MPU can also enforce separation between code and data. For example, configuring the code region as read-only will prevent erroneous data writes from modifying code space.
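The effect of such a rule can be sketched as a small decision function. This is a simplified model, not a register layout: the type and function names are hypothetical, and it assumes reads are always permitted, which a real MPU configuration need not allow.

```c
#include <stdbool.h>

/* Kind of access the processor is attempting. */
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH } access_t;

/* Simplified region attributes (hypothetical model, not MPU register bits). */
typedef struct {
    bool writable;       /* data writes permitted */
    bool execute_never;  /* instruction fetches forbidden (XN) */
} region_attr_t;

/* Example regions: code is read-only and executable, data is the reverse. */
static const region_attr_t CODE_REGION = { false, false };
static const region_attr_t DATA_REGION = { true,  true  };

/* Returns true if the access is allowed, false if it would fault.
   Assumption: reads are always permitted in this model. */
static bool access_allowed(region_attr_t region, access_t access)
{
    switch (access) {
    case ACCESS_READ:  return true;
    case ACCESS_WRITE: return region.writable;
    case ACCESS_FETCH: return !region.execute_never;
    }
    return false;
}
```

In this model a stray write into the code region is rejected, and so is an attempt to execute from the data region, regardless of which instruction caused the access.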

Code and data memory must be defined in distinct memory regions, normally in the linker script. Addresses used by the code should not be used for writable data. This can complicate modifying or relocating code at runtime.

Performance Impact

Separation itself adds little run-time overhead, since the same instructions are used throughout. The costs are indirect: constants and literal pools kept alongside code are read from the code region and can compete with instruction fetches, and MPU setup adds work at boot.

By contrast, when code and data share a single memory or bus, an instruction fetch and a data access cannot complete in the same clock cycle, so one of them has to wait.

Segregating code and data accesses also has potential performance benefits. With separate buses it reduces or eliminates contention between instruction fetches and data accesses, and separate instruction and data caches can improve hit rates.
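The contention argument can be illustrated with a deliberately idealized cycle count. The sketch below assumes one cycle per access, no wait states, and perfect overlap between the two streams, so it gives an upper bound on the benefit rather than a prediction for any real device; the function names are hypothetical.

```c
#include <stdint.h>

/* One shared bus: every instruction fetch and data access is serialized. */
static uint32_t cycles_shared_bus(uint32_t fetches, uint32_t data_accesses)
{
    return fetches + data_accesses;
}

/* Separate buses: the two streams overlap fully (idealized),
   so the longer stream determines the total. */
static uint32_t cycles_separate_buses(uint32_t fetches, uint32_t data_accesses)
{
    return fetches > data_accesses ? fetches : data_accesses;
}
```

For 100 fetches and 40 data accesses, the shared-bus model gives 140 cycles and the separate-bus model gives 100. Real gains are smaller, because pipelining, prefetch buffers, and wait states all change the picture.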

Hardware Support

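Cortex-M3 and M4 processors have a single address map but a Harvard-style bus structure: accesses to the Code region (below 0x20000000) use the ICode bus for instruction fetches and the DCode bus for data, while the SRAM and peripheral regions are reached over the System bus. All memory accesses use the same instructions.

The Cortex-M7 adds optional tightly coupled memories for instructions and data (ITCM and DTCM), each with its own dedicated interface. They are enabled through the ITCM and DTCM control registers (ITCMCR and DTCMCR), typically by the processor boot code.

When enabled, the TCMs occupy fixed address ranges in the memory map. They are accessed with ordinary instructions; a fault results from violating an MPU permission or touching an unmapped address, not from using a particular instruction on a particular memory.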

TCM Regions

The TCM regions provide a quick way to separate memory while minimizing changes to toolchain and code. TCM stands for “Tightly Coupled Memory” – on-chip RAM that can be accessed with low latency.

On the Cortex-M7 the ITCM starts at 0x00000000, at the bottom of the Code region (0x00000000 to 0x1FFFFFFF), and the DTCM starts at 0x20000000, at the bottom of the SRAM region (0x20000000 to 0x3FFFFFFF). The TCM sizes are implementation-defined and far smaller than the regions that contain them.

The rest of the 4GB address space is ordinary bus-attached memory and peripherals. Code and data can still coexist in the same physical memory blocks outside the TCMs.
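The address ranges above can be expressed as a small classifier, which is sometimes handy in debug code that wants to report where a pointer lands. This is a minimal sketch: the enum and function names are hypothetical, and everything above 0x3FFFFFFF is lumped together as "other".

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,   /* 0x00000000 to 0x1FFFFFFF */
    REGION_SRAM,   /* 0x20000000 to 0x3FFFFFFF */
    REGION_OTHER   /* peripherals, external memory, system space */
} mem_region_t;

/* Classify a 32-bit address by the range it falls in. */
static mem_region_t classify_address(uint32_t addr)
{
    if (addr <= UINT32_C(0x1FFFFFFF)) {
        return REGION_CODE;
    }
    if (addr <= UINT32_C(0x3FFFFFFF)) {
        return REGION_SRAM;
    }
    return REGION_OTHER;
}
```

A check such as `classify_address((uint32_t)ptr) == REGION_SRAM` could then guard code that expects a data pointer, assuming a 32-bit target where the cast is lossless.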

Instruction Support

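With separated spaces, no new instructions are needed to access code memory. LDM, LDMFD, LDMIA, LDMDB, STM, and STMIA are the standard load-multiple and store-multiple instructions (LDMFD is an alternative name for LDMIA), and like LDR and STR they operate on any accessible address.

Whether an access faults depends on the region's permissions, not on the instruction. A store to a region the MPU marks read-only, or an instruction fetch from a region marked execute-never, raises a memory management fault.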

Software Considerations

Separated address spaces require linker support rather than special instructions: the linker script must place code, constants, and writable data in the intended regions. Hand-written assembly code needs to keep literal pools and data tables in the sections where they belong.

Handler addresses can be problematic. The vector table holds code pointers but is read as data, both by the hardware and by software that installs or inspects handlers. Special care is needed when accessing these code pointers.
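One detail that needs care when handler addresses are read as data is the Thumb bit: on Cortex-M, a valid code pointer has bit 0 set to indicate Thumb state, so the stored value is one greater than the address of the first instruction. The helpers below are a minimal sketch with hypothetical names, operating on a vector value already read into a 32-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the stored vector has the Thumb bit (bit 0) set,
   as every valid Cortex-M code pointer must. */
static bool vector_is_thumb(uint32_t vector)
{
    return (vector & UINT32_C(1)) != 0u;
}

/* Address of the handler's first instruction: the vector with bit 0 cleared. */
static uint32_t vector_entry_address(uint32_t vector)
{
    return vector & ~UINT32_C(1);
}
```

For a stored vector of 0x08000401, the handler's first instruction is at 0x08000400. Comparing a vector against a symbol address from a map file without clearing bit 0 is a common source of off-by-one confusion.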

PC-relative addressing modes are recommended for code, avoiding hardcoded addresses. Position-independent code makes it easier to adjust to the location of the code region.

For data accesses, absolute addresses or base-register-relative modes work well. Data addresses can be loaded into registers by the startup code.

Computed branches into RAM-resident code require the target region to be executable, which may mean relaxing the separation for that region. Alternatively, small trampoline stubs can bridge between the two spaces.

Dynamic Memory Allocation

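Since malloc and new return addresses in the data heap, which a separated configuration normally marks as non-executable, dynamically generated code poses challenges. One solution is to set aside a dedicated memory block for generated code that is both writable and executable.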
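One way to reserve such a block is a small pool that is kept apart from the general heap and handed out by a bump allocator. The sketch below is an assumption-laden illustration: `code_pool`, `code_alloc`, and the pool size are hypothetical, and on a real target the pool would have to be placed by the linker in a RAM region that permits execution.

```c
#include <stddef.h>
#include <stdint.h>

#define CODE_POOL_SIZE 1024u

/* Hypothetical pool for generated code. Declared as uint32_t so the
   start of the pool is word-aligned. */
static uint32_t code_pool[CODE_POOL_SIZE / sizeof(uint32_t)];
static size_t code_pool_used; /* bytes handed out so far */

/* Returns a 4-byte-aligned block of at least `size` bytes,
   or NULL if the pool cannot satisfy the request. Blocks are never freed. */
static void *code_alloc(size_t size)
{
    if (size > CODE_POOL_SIZE) {
        return NULL; /* also keeps the rounding below from overflowing */
    }
    size_t rounded = (size + 3u) & ~(size_t)3u;
    if (rounded > CODE_POOL_SIZE - code_pool_used) {
        return NULL;
    }
    void *block = (uint8_t *)code_pool + code_pool_used;
    code_pool_used += rounded;
    return block;
}
```

A bump allocator is chosen here only for brevity. After writing instructions into a block, real code would also need the appropriate barrier and, on cores with caches, cache maintenance before branching to it.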

For larger dynamic allocations, the MPU regions can be reconfigured on the fly. Temporarily marking a region as both writable and executable allows standard allocators to be used, at the cost of weaker protection while the region is open.

Memory Protection

Separated spaces allow simple MPU configurations to enforce memory protections. Code, data, stacks, and peripherals can each get dedicated MPU regions with strict access rules.

For example, the code region can be read-only for data accesses, and the data and stack regions can be marked execute-never (XN) so that instruction fetches from them fault. Tightly restricting valid access types enhances security.
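As a rough illustration, the sketch below builds the value of the ARMv7-M MPU region attribute and size register (MPU_RASR) for a read-only code region and a no-execute data region. The helper names are hypothetical, the region sizes are example values, and the memory-type bits (TEX, S, C, B) are left at zero for brevity, which real configurations must not do blindly.

```c
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions. */
#define RASR_ENABLE      (1u << 0)
#define RASR_SIZE_SHIFT  1
#define RASR_AP_SHIFT    24
#define RASR_XN          (1u << 28)

/* Access permission (AP) encodings. */
#define AP_FULL_ACCESS   3u  /* read/write, privileged and unprivileged */
#define AP_READ_ONLY     6u  /* read-only, privileged and unprivileged */

/* SIZE field for a region of size_bytes, where size = 2^(SIZE+1).
   Assumption: size_bytes is a power of two and at least 32. */
static uint32_t rasr_size_field(uint32_t size_bytes)
{
    uint32_t bits = 0;
    while ((size_bytes >> bits) > 1u) {
        bits++;
    }
    return bits - 1u;
}

/* Compose an MPU_RASR value. TEX/S/C/B attribute bits are omitted here. */
static uint32_t mpu_rasr(uint32_t size_bytes, uint32_t ap, int execute_never)
{
    return (execute_never ? RASR_XN : 0u)
         | (ap << RASR_AP_SHIFT)
         | (rasr_size_field(size_bytes) << RASR_SIZE_SHIFT)
         | RASR_ENABLE;
}
```

With these helpers, `mpu_rasr(512u * 1024u, AP_READ_ONLY, 0)` describes a 512KB executable, read-only code region, and `mpu_rasr(128u * 1024u, AP_FULL_ACCESS, 1)` a 128KB read/write region from which execution faults. Writing the values to the MPU, selecting region numbers, and setting base addresses are left out.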

Unified layouts provide minimal protection. Where code and data are intermixed in one region, that region must remain both writable and executable, so even an MPU has limited ability to prevent errant data writes from modifying code.

Performance Optimization

For primarily sequential code without heavy dynamic allocation, separating spaces may improve performance.

Dedicated caching for code vs data prevents thrashing and improves hit rates. Independent code and data buses also eliminate contention during simultaneous fetches.

For systems with extensive dynamic allocation or self-modifying code, unified address spaces are usually the more practical choice. The flexibility and simplicity outweigh the costs of potential contention.

Use Case Guidelines

Here are some general guidelines on when to use each mode:

Use unified spaces for self-modifying code, JIT compilers, and code that is generated or loaded into RAM at run time.
Use unified if dynamic memory allocation is pervasive and code/data locations are highly intermixed.

Prefer separate spaces for safer execution, where errant data writes are caught instead of silently corrupting code. This is useful for high-reliability code.
Use separate if performance profiling shows high cache miss rates or bus contention in unified mode.
Separate spaces are easier to protect with an MPU, which suits security-critical applications.

Conclusion

ARM Cortex-M systems can be laid out with code and data either sharing memory regions or kept in separate ones. Separate spaces improve memory protection while unified provides greater flexibility. The optimal choice depends on the performance profile, code patterns, and memory requirements of the application. Architectural awareness allows selecting the configuration that maximizes efficiency and reliability.
