When looking at memory in ARM-based systems, there are two main types of memory to consider: normal memory and device memory. Normal memory refers to the standard system RAM that is used by the operating system and applications. Device memory refers to memory that is integrated into peripheral devices and used for buffering data transfers. Understanding the differences between these two types of memory is important for effective software development on ARM platforms.
Normal Memory
Normal memory, also referred to as system memory, is the general-purpose RAM that the operating system manages and doles out to applications. This memory resides in the physical address space of the ARM cores. Normal memory provides relatively fast access speeds and large storage capacity at a reasonably low cost per bit. Software needs no special mechanisms to reach it: the ARM memory management unit (MMU) and cache subsystem provide the interface to normal memory, and the operating system handles all the messy details of memory allocation, virtual-to-physical address translation, cache coherency, and so on. When an application requests memory, it is simply given a virtual address range to access, and the complexity of how that range maps to the underlying physical memory system is hidden.
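As a quick illustration of this software view, the following sketch (plain C on any OS with a standard allocator) requests a buffer and prints the pointer it receives. That pointer is purely a virtual address; which physical DRAM pages back it, and when they are actually committed, is decided by the kernel and the MMU behind the scenes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Ask the OS for 1 MiB of normal memory. */
    size_t len = 1024 * 1024;
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return 1;

    /* Ordinary load/store access through a virtual address; the mapping
     * onto physical DRAM is invisible to the application. */
    memset(buf, 0xAB, len);
    printf("virtual address: %p\n", (void *)buf);

    free(buf);
    return 0;
}
```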
From a hardware perspective, normal memory is typically implemented as DRAM modules attached to the system bus. The system memory controller handles all the DRAM protocol requirements like refresh cycles. ARM system-on-chip (SoC) designs integrate one or more memory controllers to interface with external DRAM chips. High-end application processors support attaching multiple DRAM modules in parallel to increase overall memory bandwidth. The memory controller and DRAM configuration details are specific to each ARM platform implementation. Software doesn’t need to worry about these details thanks to the abstraction provided by the MMU.
Normal memory offers high capacity but still takes up valuable board space and consumes significant system power. Embedded systems may optimize these attributes by using memory technologies such as mobile DRAM (LPDDR) and Wide I/O memory. From a software standpoint, however, it still appears as standard RAM accessible over the system bus at virtual addresses managed by the OS.
Device Memory
Device memory refers to small blocks of RAM integrated into the peripherals on an ARM system. This memory serves as buffer space for the peripheral hardware to autonomously transfer data without excessive real-time demands on the ARM cores. Device memory allows peripherals to offload the ARM processors from transferring every single byte of data. Instead, the ARM cores can simply configure the transfer parameters, kick off the transfer, then check back later after the peripheral and its local memory have handled the bulk of the work.
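To make this configure, kick off, and check back pattern concrete, here is a minimal C sketch against a hypothetical peripheral register block. The structure layout, register names, and bit meanings are invented for illustration; real devices each define their own programming model.

```c
#include <stdint.h>

/* Hypothetical register layout for a peripheral with an internal buffer.
 * These names, offsets, and bit assignments are illustrative only. */
struct periph_regs {
    volatile uint32_t ctrl;      /* bit 0: start transfer              */
    volatile uint32_t status;    /* bit 0: transfer complete           */
    volatile uint32_t buf_addr;  /* destination address in system RAM  */
    volatile uint32_t buf_len;   /* number of bytes to move            */
};

/* Configure the transfer parameters and kick it off. The peripheral then
 * works out of its own device-memory buffers without CPU involvement. */
static void start_transfer(struct periph_regs *regs,
                           uint32_t dst_addr, uint32_t len)
{
    regs->buf_addr = dst_addr;
    regs->buf_len  = len;
    regs->ctrl     = 1u;
}

/* Later, the ARM cores simply check back to see whether it finished. */
static int transfer_done(const struct periph_regs *regs)
{
    return regs->status & 1u;
}
```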
For example, consider an Ethernet controller. Packets arrive over the physical network link continuously. If the ARM cores had to process each byte in real time, it would be extremely inefficient in terms of power and responsiveness. Instead, the Ethernet controller hardware can transfer packets to its local buffers as they come in. The ARM cores can then periodically check for new packets after the bulk of the transfer is complete. This improves overall system performance and efficiency.
From a physical hardware perspective, device memory is integrated directly into the peripheral silicon. It doesn’t sit on the system bus or share physical address space with normal memory. The only way to access device memory is through registers that allow the ARM cores to configure and initiate transfers to and from the peripheral buffers. The peripherals handle all the nitty-gritty details of the memory, like bus arbitration and refresh cycles.
Another example is a DMA controller. DMA provides a method to transfer data between normal system memory and device memory without continuous ARM core intervention. The ARM cores set up the transfer by telling the DMA controller where the source and destination buffers are located and how much data to move. The DMA controller then takes over to generate the bus transactions necessary to complete the transfer. This allows the ARM cores to proceed with other tasks while the data transfer occurs in the background.
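The same pattern can be sketched for a DMA channel. Again, the register layout below is hypothetical; real DMA controllers such as ARM’s PL330 have their own, more elaborate programming models, but the set source, destination, and length, then go shape is the same.

```c
#include <stdint.h>

/* Hypothetical DMA channel registers; treat the fields as illustrative. */
struct dma_chan_regs {
    volatile uint32_t src;       /* source address                      */
    volatile uint32_t dst;       /* destination address                 */
    volatile uint32_t count;     /* bytes to transfer                   */
    volatile uint32_t ctrl;      /* bit 0: start                        */
    volatile uint32_t irq_stat;  /* bit 0: transfer-complete flag       */
};

/* Program a single transfer, then let the DMA engine generate the bus
 * transactions while the ARM cores move on to other work. Completion is
 * reported later through an interrupt or by polling irq_stat. */
static void dma_start(struct dma_chan_regs *ch,
                      uint32_t src_addr, uint32_t dst_addr, uint32_t len)
{
    ch->src   = src_addr;
    ch->dst   = dst_addr;
    ch->count = len;
    ch->ctrl  = 1u;
}
```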
Distinctions and Use Cases
The key distinction between normal and device memory is where the memory physically resides and how it connects to the rest of the system. Normal memory is external DRAM whereas device memory is integrated into the on-chip peripheral hardware. From a software perspective, the main differences revolve around accessibility.
Software interacts with normal memory through typical load/store instructions. Virtual addresses make it simple and uniform to access normal memory. In contrast, device memory interaction relies on peripheral configuration registers to kick off transfers. The peripherals autonomously handle the intricate details of the device memory without direct software management.
Normal memory works well for general purpose code and data storage. It provides a simple linear array of bytes or words accessible by virtual address. Device memory complements system memory by providing dedicated buffers closer to the peripherals. This is advantageous for streaming data applications like networking, multimedia, sensors, etc.
Here are some examples of how the two memory types are used in ARM-based systems:
- Application code is stored in normal memory for easy execution by the ARM cores.
- Bulky media assets like textures, audio clips, and videos reside in normal memory for apps to access.
- The ARM cores access OS kernel data structures and stacks within normal memory.
- Device drivers and HALs may use DMA to move data between normal and device memory.
- USB data payloads get buffered in USB controller device memory during transfer.
- Bluetooth chips buffer incoming and outgoing packets in device memory.
- Image sensor data gets stored in device memory on the camera interface IC.
- GPU textures and rendered frames utilize local device memory on the graphics processor.
Memory Mapping
The ARM memory management unit (MMU) only maps virtual addresses to physical addresses within the normal system memory space. Physical addresses belonging to device memory are not managed by the MMU. So how do software and processors interact with device memory regions if they are not directly mapped?
The answer depends on whether the ARM cores need to directly access the device memory or if autonomous peripheral DMA transfers are sufficient. For DMA transfers, software simply triggers the transfer and peripherals handle all direct device memory access. This keeps device memory access encapsulated in the peripheral hardware.
In cases where the ARM cores need direct access to device memory, the peripheral must provide a bank of registers at a physical address range. Writes to these registers translate to writes into the corresponding device memory locations managed within the peripheral hardware. The processor is really writing registers mapped into the normal system memory space. The peripheral handles translating its internal register accesses into device memory access under the hood.
For example, a graphics processor may map its local framebuffer into a register bank. When the ARM cores write to these registers, the graphics chip translates this to writing its internal device memory. This allows software to update graphics textures and framebuffers without being aware of the underlying device memory implementation.
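One common shape for such a register interface is an index/data pair: software writes an offset into one register and a value into another, and the peripheral commits the value into its internal memory. The sketch below is purely illustrative; real graphics hardware exposes far richer interfaces.

```c
#include <stdint.h>

/* Hypothetical "window" registers through which the cores reach a
 * peripheral's internal buffer indirectly. Names are illustrative. */
struct fb_window_regs {
    volatile uint32_t index;   /* offset within the internal framebuffer */
    volatile uint32_t data;    /* value to store at that offset          */
};

static void fb_write_pixel(struct fb_window_regs *win,
                           uint32_t offset, uint32_t pixel)
{
    win->index = offset;   /* select a location in device memory   */
    win->data  = pixel;    /* the peripheral commits it internally */
}
```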
In some cases, device memory may be mapped into the normal system address space. This technique, called memory-mapped I/O, bridges the device and normal memory domains by assigning reserved physical address ranges to peripheral registers and buffers. It gives software a transparent way to access peripheral hardware through standard load and store instructions; the system interconnect routes those transactions to the mapped peripheral over the system bus.
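A minimal sketch of that style of access is shown below, assuming a made-up base address for the peripheral’s window into the system address map. The volatile accessors matter: they force the compiler to issue every load and store exactly as written. In a real system the physical range comes from the SoC’s memory map and is mapped with the appropriate (typically uncached) attributes, for example via ioremap() in a Linux driver.

```c
#include <stdint.h>

/* Hypothetical base address of a peripheral's memory-mapped window. */
#define PERIPH_MMIO_BASE  0x40010000u

/* volatile accessors so the compiler performs each access as written,
 * which is essential for memory-mapped I/O. */
static inline void mmio_write32(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

void poke_peripheral(void)
{
    /* Ordinary stores and loads, routed by the interconnect to the
     * peripheral rather than to DRAM. */
    mmio_write32(PERIPH_MMIO_BASE + 0x0, 0x1);
    (void)mmio_read32(PERIPH_MMIO_BASE + 0x4);
}
```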
Cache Coherency
ARM processors utilize caches to reduce latency and increase throughput for memory accesses. However, the ARM caches are only coherent with normal system memory. They do not maintain coherence for memory domains outside of the cacheable system address space, namely device memory.
The lack of cache coverage for device memory generally does not cause issues. The peripherals handle device memory access locally. If the ARM cores directly interact with device memory through register mappings, those mappings are treated as uncached I/O space. Where buffers in normal memory are shared with peripherals, for example via DMA, explicit cache maintenance operations are used around the transfers to avoid coherency issues.
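As one concrete illustration, Linux drivers usually rely on the kernel’s streaming DMA-mapping API rather than issuing cache maintenance instructions by hand. The sketch below follows that pattern, with hw_start_rx() standing in for hypothetical, driver-specific hardware programming.

```c
#include <linux/dma-mapping.h>
#include <linux/device.h>
#include <linux/errno.h>

/* Receive one buffer from a device into normal memory, letting the DMA
 * API perform whatever cache clean/invalidate the architecture needs. */
static int receive_one_buffer(struct device *dev, void *buf, size_t len)
{
    dma_addr_t handle;

    /* Hand ownership of the buffer to the device; the API cleans or
     * invalidates the relevant cache lines as required. */
    handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, handle))
        return -ENOMEM;

    /* hw_start_rx(handle, len);  -- hypothetical: program the peripheral
     * and wait for its completion interrupt. */

    /* Give ownership back to the CPU so software can safely read the
     * freshly transferred data. */
    dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
    return 0;
}
```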
However, some high throughput peripherals like GPUs or fast networking controllers allow cacheable mappings of their device memory into the normal address range. This enables the ARM caches to serve local copies of the data for lower access latency and higher overall performance. Cache coherency extensions like ACE-Lite are part of the solution to keep the ARM cores and peripherals in sync.
Generally, you don’t need to worry about cache coherency with device memory. Autonomous peripheral DMA provides the bulk of the transfers. Other accesses use uncached methods. In high-end systems where device memory gets mapped cacheable, specialized hardware mechanisms handle coherency under the hood.
TrustZone and Security
ARM TrustZone provides system-wide security partitioning between “secure” and “non-secure” states. This technology has implications regarding access permissions for both normal and device memory regions. Essentially, TrustZone aims to restrict non-secure software from accessing resources designated as secure.
Normal memory can be designated as secure or non-secure. The ARM cores enforce these permissions: non-secure software cannot access memory assigned to the secure world regardless of privilege level. A TrustZone Address Space Controller (TZASC) implements this access control for external DDR memory partitions.
Device memory permissions are managed by the peripheral hardware. For example, a secure cryptographic module would likely prevent any non-secure software from accessing its internal buffers. However, non-secure peripherals like a USB controller would freely allow non-secure access from device drivers.
In general, TrustZone memory security is strongest when its protections extend across both normal and device memory domains. Keys or sensitive data buffered in device memory on a secure peripheral should only be accessible by trusted secure code running on the ARM cores. With coordinated design between ARM cores, memory controllers, peripherals, and the TrustZone address space control hardware, robust memory protections can be achieved.
Virtualization
ARM processors support virtualization to enable running multiple isolated operating systems on a single SoC. The hypervisor manages interactions between the guest operating systems and underlying physical system resources.
Normal memory is virtualized through a second stage of address translation. Each guest OS manages its own page tables as usual, while the hypervisor transparently maps the guest’s view of physical memory onto the real physical pages allocated to that VM. Device memory is not visible through this layer of indirection.
Instead, each guest OS interacts with virtual models of the platform peripherals implemented in software by the hypervisor. Accesses to virtual device configuration registers are trapped and emulated by the hypervisor code. Memory transfers like DMA get remapped between guest OS buffers and physical device memory by the hypervisor.
In essence, the hypervisor completely hides device memory from the individual guests. It multiplexes the physical resources like device buffers between virtual machines as needed to handle peripheral data transfers. The guest OSes only interface with virtual device models in normal memory with no direct access to real device memory.
Conclusion
Normal and device memory serve different but complementary purposes in ARM systems. Normal memory provides a generic pool of fast RAM for code and data. Device memory integrates tailored buffers into peripherals for efficient data transfers. Effective software on ARM must utilize both types appropriately for optimal performance.
Normal memory leverages the ARM memory management unit for simple virtual address access and abstraction from physical resources. Device memory integrates into the peripheral silicon itself with carefully designed interfaces back to the processors and system bus as needed. Hardware mechanisms like DMA controllers bridge between the memory domains.
ARM TrustZone and virtualization also factor into the system memory architecture. Access permissions and mapping techniques must be coordinated between ARM cores, memory controllers, peripherals, and hypervisor code. With thoughtful system design, ARM’s flexible approach to memory enables both security and virtualization without compromising performance for modern embedded applications.