The Cortex-M33 is an Arm processor core in the Cortex-M family, based on the Armv8-M Mainline architecture. Arm licenses the core to silicon vendors, who combine it with memories and peripherals to build microcontrollers and SoCs for embedded and Internet of Things (IoT) applications that need low power consumption and good performance. Because the integrator defines most of the memory system, memory types and sizes vary from device to device and can be matched to the application.
On-Chip SRAM
Microcontrollers built around the Cortex-M33 include on-chip static RAM (SRAM) for data and, where needed, instruction storage. The core does not fix the SRAM capacity: the Armv8-M memory map reserves a 512 MB SRAM region starting at 0x20000000, and the silicon vendor or design team implementing the Cortex-M33 in a system-on-chip (SoC) decides how much of it is populated. Shipping devices range from tens of kilobytes to a megabyte or more.
The SRAM is reached through the core's AHB bus interfaces and serves as fast memory for variables, stack data, and, optionally, executable code. Larger SRAM capacities keep more data and code on-chip, reducing the need to access slower flash or off-chip memory. However, larger SRAM consumes more silicon area and power.
Flash Memory
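A Cortex-M33 system also needs non-volatile memory to store firmware and application data. Most Cortex-M33 microcontrollers integrate flash on-chip, while some designs execute from external serial flash through a vendor-supplied memory interface. Flash retains its contents when power is removed. Its capacity is a vendor decision rather than a property of the core; the architectural Code region at the bottom of the address map spans 512 MB.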
Flash sizes used with the Cortex-M33 typically range from a few hundred kilobytes to several megabytes. The flash is usually accessed through the core's Code AHB (C-AHB) interface by way of a vendor-specific flash controller. Smaller flash capacities lower system cost, while larger capacities leave room for more firmware code and data.
Tightly Coupled Memory (TCM)
Tightly coupled memory (TCM) is RAM attached to a processor through dedicated low-latency interfaces that bypass the main bus. It is usually split into instruction TCM (ITCM) for time-critical program code and data TCM (DTCM) for low-latency data. Unlike the Cortex-M7, Cortex-M55, and Cortex-M85, the Cortex-M33 does not provide TCM interfaces. Designers obtain comparable behavior by attaching zero-wait-state SRAM to the Code AHB (C-AHB) or System AHB (S-AHB) interface and placing time-critical code and data there.
Running key functions from zero-wait-state SRAM gives them consistent, deterministic response times. This is particularly useful for real-time control, digital signal processing, and other applications where instruction or data access latency limits performance. However, SRAM costs far more silicon area per bit than flash, which makes it unsuitable for bulk code and data storage.
External RAM
For systems that need more memory than is available on-chip, a Cortex-M33 SoC can attach external RAM such as SRAM, PSRAM, or SDRAM through a vendor-provided memory controller. The Armv8-M memory map sets aside a 1 GB External RAM region (0x60000000 to 0x9FFFFFFF) for this purpose; the data bus width (commonly 8, 16, or 32 bits) and the maximum capacity depend on the controller. External RAM suits large data sets, buffers, and other data that is not performance-critical.
The Cortex-M33 memory architecture is highly flexible, allowing silicon partners and design teams to tailor memory configurations to their applications. Choosing the mix of on-chip and external memories is a trade-off between performance, silicon area, and system cost.
Memory Protection Unit
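The Cortex-M33 includes an optional memory protection unit (MPU) that improves software reliability and security. The MPU follows the Armv8-M protected memory system architecture (PMSAv8): it divides the memory map into regions, each defined by a base and a limit address with 32-byte granularity, and assigns access permissions and memory attributes to each region. An implementation can provide up to 16 regions, and when TrustZone is present the Secure and Non-secure states each have their own MPU. This keeps bugs or malicious code from corrupting protected memory areas.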
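For example, firmware code can be marked read-only so that stray writes raise a fault, and data regions such as the stack can be marked execute-never (XN) to block code injection. Similarly, OS kernel resources can be made inaccessible to unprivileged application code. The region layout is chosen by the system designer and can be reprogrammed at run time, for example on each context switch.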
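To make the region attributes concrete, the following is a minimal host-side sketch of how Armv8-M MPU region registers are encoded (MPU_RBAR holds the base address, shareability, access permissions, and XN; MPU_RLAR holds the limit address, attribute index, and enable bit). The function names and the flash and SRAM addresses are assumptions chosen for illustration, not taken from any particular device; on real hardware these values would be written to the MPU registers, typically through CMSIS.

```python
# Host-side model of Armv8-M (PMSAv8) MPU region register encoding.
# Function names and the example addresses are hypothetical.

MPU_REGION_ALIGN = 32  # region base and limit have 32-byte granularity


def mpu_rbar(base, shareability=0, read_only=False,
             unprivileged=False, execute_never=False):
    """Build an MPU_RBAR value: BASE[31:5], SH[4:3], AP[2:1], XN[0]."""
    if base % MPU_REGION_ALIGN:
        raise ValueError("region base must be 32-byte aligned")
    # AP[1] selects read-only, AP[0] allows unprivileged access.
    ap = (int(read_only) << 1) | int(unprivileged)
    return ((base & 0xFFFFFFE0) | ((shareability & 0x3) << 3)
            | (ap << 1) | int(execute_never))


def mpu_rlar(limit, attr_index=0, enable=True):
    """Build an MPU_RLAR value: LIMIT[31:5], AttrIndx[3:1], EN[0]."""
    if limit % MPU_REGION_ALIGN != MPU_REGION_ALIGN - 1:
        raise ValueError("region limit must be the last byte of a 32-byte block")
    return (limit & 0xFFFFFFE0) | ((attr_index & 0x7) << 1) | int(enable)


# Assumed layout: 256 KB of flash, read-only and executable.
flash_rbar = mpu_rbar(0x08000000, read_only=True, unprivileged=True)
flash_rlar = mpu_rlar(0x0803FFFF, attr_index=0)

# Assumed layout: 64 KB of SRAM, read/write and execute-never.
sram_rbar = mpu_rbar(0x20000000, unprivileged=True, execute_never=True)
sram_rlar = mpu_rlar(0x2000FFFF, attr_index=1)
```

The attribute index selects one of eight memory-attribute entries held in separate registers (MPU_MAIR0 and MPU_MAIR1), which this sketch does not model.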
Memory Access Methods
The Cortex-M33 core accesses memory through load/store operations and instruction fetches. Load and store instructions move data between the core registers and memory. Instruction fetches bring opcodes into the pipeline for execution.
The processor has a Harvard bus architecture with two AHB master interfaces: the Code AHB (C-AHB) for accesses to the Code region (0x00000000 to 0x1FFFFFFF) and the System AHB (S-AHB) for SRAM, peripherals, and external memory. An instruction fetch and a data access can proceed in the same cycle when they target different interfaces. The Cortex-M33 implements the Armv8-M Mainline instruction set (Thumb, with 16-bit and 32-bit encodings) on a three-stage in-order pipeline, and many instructions execute in a single cycle.
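As a rough illustration of how the target address decides which core interface carries an access, the sketch below maps an address to the Code AHB (C-AHB), the System AHB (S-AHB), or the Private Peripheral Bus (PPB). The function names are hypothetical, and the model ignores arbitration inside the vendor's bus matrix, which can still serialize accesses on a real device.

```python
# Simplified model: which Cortex-M33 interface carries an access.
# Function names are hypothetical; bus-matrix arbitration is ignored.

def bus_interface(address):
    """Return the core interface used for a given 32-bit address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if address <= 0x1FFFFFFF:
        return "C-AHB"   # Code region
    if 0xE0000000 <= address <= 0xE00FFFFF:
        return "PPB"     # core peripherals such as NVIC and SysTick
    return "S-AHB"       # SRAM, peripherals, external memory


def can_overlap(fetch_address, data_address):
    """True when a fetch and a data access use different interfaces."""
    return bus_interface(fetch_address) != bus_interface(data_address)
```

Under this model, code fetched from the Code region at 0x08000100 can overlap with a data access to SRAM at 0x20000010, whereas code running from SRAM competes with SRAM data accesses for the S-AHB.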
Memory Mapping
The Cortex-M33 uses a single, unified 32-bit address space for code, data, and peripherals, even though instruction and data accesses can travel over separate buses. Addresses range from 0x00000000 to 0xFFFFFFFF, giving 4 GB of address space. The architecture divides this space into fixed regions, and vendors place their memories and peripherals within them.
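The Code region (0x00000000 to 0x1FFFFFFF) sits at the bottom and normally holds flash and the vector table. It is followed by the SRAM region (from 0x20000000), the Peripheral region (from 0x40000000), the External RAM region (from 0x60000000), and the External Device region (from 0xA0000000). Core peripherals such as the SysTick timer and the nested vectored interrupt controller (NVIC) live in the Private Peripheral Bus region at 0xE0000000. Accesses to unpopulated addresses typically return a bus error and raise a fault, which safety-critical systems can use to catch stray pointers.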
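The architectural layout can be written down as a small lookup table. The sketch below lists the default Armv8-M regions and classifies an address; the table and function names are illustrative, and a real device populates only part of each region.

```python
# Default Armv8-M address map as (first address, last address, name).
# Names of the table and helpers are hypothetical.
MEMORY_MAP = [
    (0x00000000, 0x1FFFFFFF, "Code"),
    (0x20000000, 0x3FFFFFFF, "SRAM"),
    (0x40000000, 0x5FFFFFFF, "Peripheral"),
    (0x60000000, 0x9FFFFFFF, "External RAM"),
    (0xA0000000, 0xDFFFFFFF, "External Device"),
    (0xE0000000, 0xE00FFFFF, "Private Peripheral Bus"),
    (0xE0100000, 0xFFFFFFFF, "Vendor System"),
]


def region_of(address):
    """Return the architectural region that contains the address."""
    for first, last, name in MEMORY_MAP:
        if first <= address <= last:
            return name
    raise ValueError("address must fit in 32 bits")


def region_size(name):
    """Return the size in bytes of a named region."""
    for first, last, region_name in MEMORY_MAP:
        if region_name == name:
            return last - first + 1
    raise KeyError(name)
```

For example, the NVIC register at 0xE000E100 falls in the Private Peripheral Bus region rather than at the bottom of the map.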
Out-of-Order and Speculative Execution
To maximize performance, many application-class processors use out-of-order and speculative execution. The Cortex-M33 instead uses a short in-order pipeline and does not execute instructions speculatively in the way that Spectre-class attacks exploit. This limits peak performance, but it also avoids the side channels that leak sensitive data through speculative execution.
The absence of speculative execution benefits both security and determinism, which matter in embedded systems. Data accesses are issued in program order and are not exposed to speculation-based vulnerabilities. The simple pipeline also makes instruction timing easier to reason about, which makes the Cortex-M33 well suited to real-time applications.
Memory Consistency Model
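As a single-core 32-bit processor with an in-order pipeline, the Cortex-M33 is simpler to reason about than a multi-core or out-of-order design, but its memory model is not trivial. Instructions are issued in program order, yet the pipeline overlaps their execution, and bus bridges or buffers in the SoC can make the effect of an access visible later than the instruction that issued it.
The Armv8-M architecture defines a weakly ordered memory model rather than sequential consistency: accesses to Device memory are generally kept in program order, but accesses to Normal memory may in principle be reordered. In practice the Cortex-M33 performs most accesses in order, yet portable code should still use the DMB, DSB, and ISB barrier instructions where ordering matters, for example after reprogramming the MPU, before starting a DMA transfer that reads a buffer the core has just written, or when sharing memory with another bus master.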
Caches
To maximize performance, many processors contain caches that exploit locality and reduce the average latency of memory accesses. The Cortex-M33 core itself does not include instruction or data caches. Instead, it relies on zero-wait-state on-chip SRAM, and some vendors add a system-level cache or flash accelerator outside the core to hide flash latency.
Without core caches, software has no cache coherence or invalidation operations to perform for core accesses, and there are no unpredictable cache-miss stalls. Caches improve average memory latency but make response time less deterministic, so the cache-less core is advantageous for real-time control and event-driven processing. On devices with a vendor-added cache, those considerations return for the cached memory.
Wait States
When the Cortex-M33 core performs a memory access, it drives address and control signals on its AHB interface to indicate a valid request. If the target memory cannot respond within one clock cycle, which is common for flash and external memories, wait states stall the access until the data is available.
The number of wait states follows from the memory's access time and the core clock frequency, and on many devices software must program it, typically before raising the clock. Faster memories need fewer wait states, which reduces the performance cost of each access. Typical values range from zero for on-chip SRAM to several cycles for flash at high clock speeds or for external memories.
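The relationship between access time, clock frequency, and wait states can be sketched as a short calculation. The function name and the 40 ns access time used in the usage note are assumptions for illustration; real values come from the device datasheet, which often also ties the setting to supply voltage.

```python
# Wait states needed so that (1 + wait_states) clock cycles cover the
# memory access time. Function name and example figures are hypothetical.

def wait_states_needed(access_time_ns, clock_mhz):
    """Return the minimum wait states for integer ns and MHz inputs."""
    if access_time_ns <= 0 or clock_mhz <= 0:
        raise ValueError("access time and clock must be positive")
    # ns * MHz gives the access time in thousandths of a clock cycle.
    milli_cycles = access_time_ns * clock_mhz
    cycles = max(1, (milli_cycles + 999) // 1000)  # round up to whole cycles
    return cycles - 1
```

With an assumed 40 ns flash access time, this yields 0 wait states at 25 MHz, 2 at 64 MHz, and 3 at 100 MHz.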
Memory Bandwidth
The Cortex-M33 bus interfaces are 32 bits wide, so each can transfer up to 4 bytes per clock cycle. The clock frequency is set by the implementation rather than by the core; commercial devices run from a few tens of megahertz to 200 MHz or more. At 100 MHz, for example, one interface can perform at most 100 million 32-bit transfers per second.
That equates to a theoretical peak of 400 MB/s per interface, or 800 MB/s combined when instruction fetches on the C-AHB overlap with data accesses on the S-AHB. In practice, application bandwidth is reduced by access patterns, wait states, and time spent executing non-memory instructions. Real-world bandwidth depends heavily on the target application.
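The peak figure is simple arithmetic: a 32-bit bus moves at most 4 bytes per clock cycle, so the upper bound is the clock frequency times 4 bytes. The sketch below computes that bound and a rough sustained figure when every access incurs a fixed number of wait states. The function names are hypothetical, and the model deliberately ignores prefetching, bursts, and bus contention.

```python
# Upper-bound bandwidth of one bus interface, in bytes per second.
# Function names are hypothetical; the model is intentionally simple.

def peak_bandwidth(clock_hz, bus_width_bits=32):
    """One full-width transfer per clock cycle."""
    return clock_hz * bus_width_bits // 8


def sustained_bandwidth(clock_hz, wait_states, bus_width_bits=32):
    """Every transfer takes (1 + wait_states) cycles."""
    return peak_bandwidth(clock_hz, bus_width_bits) // (1 + wait_states)
```

At 100 MHz, `peak_bandwidth(100_000_000)` gives 400,000,000 bytes per second, and three wait states on every access cut that to 100,000,000.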
Code Density
Code density measures how compactly a program is encoded, an important metric for cost-sensitive embedded systems. The Cortex-M33 executes the Thumb instruction set with Thumb-2 technology, which mixes 16-bit and 32-bit encodings and produces noticeably smaller code than the classic 32-bit ARM instruction set while retaining most of its performance.
A 16-bit Thumb instruction occupies 2 bytes and a 32-bit one occupies 4 bytes, so the average instruction size of real code falls between 2 and 4 bytes, depending on the compiler, optimization settings, and application. The more of a program the compiler can express in 16-bit encodings, the smaller the firmware image, which reduces flash memory requirements and cost.
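Since a 16-bit Thumb encoding occupies 2 bytes and a 32-bit encoding occupies 4 bytes, the average instruction size of a program follows directly from its instruction mix. The sketch below works through that calculation; the function names and the 700/300 split in the usage note are made-up illustrations, not measurements.

```python
# Average instruction size for a mix of 16-bit and 32-bit Thumb encodings.
# Function names and example counts are hypothetical.

def code_size_bytes(count_16bit, count_32bit):
    """Total bytes occupied by the instructions."""
    return 2 * count_16bit + 4 * count_32bit


def average_instruction_bytes(count_16bit, count_32bit):
    """Mean bytes per instruction; always between 2 and 4."""
    total = count_16bit + count_32bit
    if total == 0:
        raise ValueError("need at least one instruction")
    return code_size_bytes(count_16bit, count_32bit) / total
```

A routine with 700 narrow and 300 wide instructions occupies 2,600 bytes, an average of 2.6 bytes per instruction, compared with 4,000 bytes if every instruction used a 32-bit encoding.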
Memory Access Performance
The Cortex-M33 accesses data and instructions efficiently when the memory system keeps up with the core. For data loads and stores, on-chip SRAM typically provides zero-wait-state access. Access times for flash and external memories depend on the wait states configured for the target memory speed.
Instruction fetches are 32 bits wide, so each fetch can deliver one 32-bit instruction or two 16-bit Thumb instructions, and instruction prefetching helps keep the pipeline fed. Actual execution rates are ultimately limited by flash wait states, data accesses, and processing time in real applications.
Memory-Mapped Registers
In addition to SRAM and peripheral memory regions, the Cortex-M33 memory map includes memory-mapped registers that control core and system functions. These sit in the System Control Space (0xE000E000 to 0xE000EFFF) and include the NVIC interrupt set-enable registers, the system handler priority registers, the SysTick system timer, and the MPU registers.
Memory-mapped registers let software change configuration settings and control peripherals with ordinary load and store instructions. This reduces hardware complexity, because one unified address map serves both memory and registers and no special I/O instructions are needed. Peripherals look like additional memory to software but are implemented separately from the main data memories.
TrustZone Memory Partitions
The Cortex-M33 optionally includes the TrustZone security extension for Armv8-M, which adds Secure and Non-secure processor states. This enables isolated Secure execution that protects sensitive code, data, and system resources from vulnerabilities in Non-secure applications.
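Every address is assigned a security attribute: Secure, Non-secure Callable (Secure memory that holds entry points for Non-secure code), or Non-secure. The assignment is made by the programmable Security Attribution Unit (SAU) together with an optional vendor-defined Implementation Defined Attribution Unit (IDAU). There is no virtual addressing; both worlds share one physical address space, and Non-secure accesses to Secure addresses are blocked. Because the checks are enforced in hardware, they resist software attacks.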
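The attribution step can be modeled as an address lookup. The sketch below is a simplified model of an enabled Security Attribution Unit (SAU): addresses inside a configured region are Non-secure or Secure Non-secure Callable, and everything else defaults to Secure. It assumes non-overlapping regions and ignores any Implementation Defined Attribution Unit (IDAU); the function name, constants, and region addresses are hypothetical.

```python
# Simplified SAU lookup. Each region is (base, limit, nsc), where nsc
# marks Secure, Non-secure Callable memory. Regions are assumed not to
# overlap, and any IDAU contribution is ignored.

SECURE = "Secure"
NSC = "Secure, Non-secure Callable"
NON_SECURE = "Non-secure"


def security_attribute(address, sau_regions):
    """Return the security attribute of an address."""
    for base, limit, nsc in sau_regions:
        if base <= address <= limit:
            return NSC if nsc else NON_SECURE
    return SECURE  # no match: Secure by default when the SAU is enabled


# Hypothetical partitioning of an assumed device memory layout.
EXAMPLE_REGIONS = [
    (0x0803FE00, 0x0803FFFF, True),    # veneer table for Secure entry points
    (0x08040000, 0x0807FFFF, False),   # Non-secure application flash
    (0x20018000, 0x2003FFFF, False),   # Non-secure SRAM
]
```

With this partitioning, an address in the lower flash such as 0x08001000 remains Secure, while 0x08040000 belongs to the Non-secure application.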
Memory Errors
Like all hardware, memory subsystems can experience faults and errors during operation. On-chip SRAM is susceptible to particle strikes or fabrication defects. External DRAM and flash are vulnerable to charge leakage and write/erase degradation over time.
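To detect and correct such errors, SoC designers can add parity or error correction codes (ECC) to on-chip and external memories; this protection belongs to the memory subsystem rather than to the Cortex-M33 core. Parity detects single-bit errors, while SECDED (single error correction, double error detection) ECC corrects single-bit errors and detects double-bit errors, transparently to software. Uncorrectable errors are typically reported through a fault or an interrupt so that software can respond, which helps ensure reliable system operation.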
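The simplest of these schemes, parity, is easy to sketch in software even though real devices implement it in memory hardware. The example below stores one even-parity bit alongside each 32-bit word and checks it on read; the function names are hypothetical. It shows that a single flipped bit is detected while a double flip passes unnoticed, which is the gap that SECDED ECC closes.

```python
# Even parity over a 32-bit word: a software model of what memory
# hardware would do. Function names are hypothetical.

def parity_bit(word):
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def store_word(word):
    """Return the (word, parity) pair that would be written to memory."""
    word &= 0xFFFFFFFF
    return word, parity_bit(word)


def read_ok(word, stored_parity):
    """True when the word read back still matches its parity bit."""
    return parity_bit(word) == stored_parity


word, parity = store_word(0xDEADBEEF)
single_flip = word ^ (1 << 7)   # one bit corrupted: detected
double_flip = word ^ 0b11       # two bits corrupted: not detected
```

Here `read_ok(single_flip, parity)` is false, so the error is caught, while `read_ok(double_flip, parity)` is true even though the data is wrong.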
Conclusion
In summary, the Cortex-M33 pairs a compact in-order core with a memory system that the SoC integrator can tailor to embedded applications. On-chip SRAM provides low-latency storage for data and time-critical code, and flash, usually on-chip, holds firmware and bulk data; the sizes of both are set by the vendor rather than by the core. Zero-wait-state SRAM access, the optional MPU, and TrustZone partitioning deliver the real-time determinism and isolation needed for intelligent edge devices and IoT endpoints.