Overview of Instruction Fetch in Cortex-M
The instruction fetch unit (IFU) fetches instructions from memory and feeds them to the execution pipeline. It contains a small prefetch buffer, a queue of instructions fetched ahead of the decoder, which reduces stalls when fetching from slower memories such as flash. The IFU also handles branches by redirecting fetch to the branch target address.
In Cortex-M processors, the IFU fetches instructions linearly from consecutive memory addresses until a branch or exception redirects it. Most Cortex-M cores have no dynamic branch predictor: fetch switches to a branch target or exception handler only once the target address is known. This keeps cost and power consumption low. The Cortex-M7 is the exception, pairing its fetch unit with a branch predictor.
Prefetch Buffer and Fetching
The prefetch buffer in the Cortex-M IFU queues instructions fetched from memory so that the decoder is not held up by every slow access. Its size varies across Cortex-M processors. Arm's Technical Reference Manuals describe a three-word buffer for the Cortex-M3 and M4 and a 4 x 64-bit prefetch queue for the Cortex-M7; check the manual for the specific core for exact figures.
At any time, the IFU tries to keep the prefetch buffer filled with instructions from sequential memory addresses. As the processor decodes instructions from the buffer, the IFU reads ahead and grabs the next set from memory. This helps avoid stalls while waiting for instructions.
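The read-ahead behavior described above can be sketched as a small FIFO model. This is a toy illustration, not a cycle-accurate description of any core: the class name `PrefetchBuffer`, its methods, and the default depth of three words are assumptions made for the example.

```python
from collections import deque


class PrefetchBuffer:
    """Toy model of a sequential prefetch FIFO (hypothetical, not cycle-accurate)."""

    def __init__(self, memory, start_addr, depth=3, word_bytes=4):
        self.memory = memory          # dict: word-aligned address -> 32-bit word
        self.depth = depth            # number of word entries (assumed value)
        self.word_bytes = word_bytes
        self.fetch_addr = start_addr  # next address the fetch side will read
        self.fifo = deque()

    def refill(self):
        """Fetch side: read ahead sequentially until the buffer is full."""
        while len(self.fifo) < self.depth:
            # Unmapped addresses read as 0 in this toy model.
            word = self.memory.get(self.fetch_addr, 0)
            self.fifo.append((self.fetch_addr, word))
            self.fetch_addr += self.word_bytes

    def pop(self):
        """Decode side: take the oldest word; None models a fetch stall."""
        return self.fifo.popleft() if self.fifo else None

    def redirect(self, target):
        """Branch or exception: drop sequential words, restart at the target."""
        self.fifo.clear()
        self.fetch_addr = target & ~(self.word_bytes - 1)
```

Calling `refill()` after every `pop()` keeps the decoder supplied for as long as memory keeps up; `redirect()` shows why prefetched words are wasted when control flow changes.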
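On most Cortex-M cores the IFU fetches 32-bit aligned words from memory (the M7 fetches 64 bits at a time). A 32-bit word can hold two 16-bit Thumb instructions, so a single fetch can supply up to two instructions. A 32-bit Thumb-2 instruction that is only halfword aligned straddles two words and needs both fetches before it can be decoded. The fetched instructions are buffered and decoded before execution.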
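To make the relationship between fetched words and instructions concrete, here is a minimal sketch of how a halfword stream is carved into instructions. It relies on the Thumb encoding rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. The function names are made up for this example.

```python
def is_32bit_thumb(halfword):
    """True if this halfword is the first half of a 32-bit Thumb instruction."""
    return (halfword >> 11) in (0b11101, 0b11110, 0b11111)


def split_word(word):
    """Split a little-endian 32-bit fetch word into (low, high) halfwords.

    The low halfword sits at the lower address, so it is decoded first.
    """
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def instruction_lengths(halfwords):
    """Walk a halfword stream and return each instruction's length in bytes.

    If the stream ends on the first half of a 32-bit instruction, that
    instruction is still reported as 4 bytes; the second half simply has
    not been fetched yet.
    """
    lengths = []
    i = 0
    while i < len(halfwords):
        size = 4 if is_32bit_thumb(halfwords[i]) else 2
        lengths.append(size)
        i += size // 2
    return lengths
```

For example, `instruction_lengths([0xBF00, 0xF000, 0xF800, 0xBF00])` walks a NOP, a 32-bit BL and another NOP. Those four halfwords occupy two fetch words, and the BL straddles the boundary between them.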
Branch Target Fetching
When a branch instruction is executed, the IFU must fetch instructions from the new target address rather than the next sequential address. Branch targets are handled by updating the address in the IFU’s instruction address register.
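An unconditional branch always redirects fetch to its target, which is either encoded in the instruction as a PC-relative offset (B, BL) or taken from a register (BX, BLX). A conditional branch redirects fetch only if its condition passes; otherwise fetch continues from the next sequential instruction.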
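The choice between the sequential address and the branch target can be sketched as a small selection function. This is an illustrative model only: the function name, the flag tuple order (N, Z, C, V) and the subset of condition codes shown are choices made for the example.

```python
# A few Thumb condition codes, evaluated on the (N, Z, C, V) flags.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,        # equal: Z set
    "NE": lambda n, z, c, v: not z,    # not equal: Z clear
    "GE": lambda n, z, c, v: n == v,   # signed greater than or equal
    "LT": lambda n, z, c, v: n != v,   # signed less than
    "AL": lambda n, z, c, v: True,     # always (unconditional)
}


def next_fetch_address(pc, instr_bytes, target=None, cond="AL",
                       flags=(False, False, False, False)):
    """Pick the next fetch address for the instruction at `pc`.

    `target` is None for non-branch instructions. For a branch, the target
    is used only if the condition passes; otherwise fetch stays sequential.
    """
    if target is not None and CONDITIONS[cond](*flags):
        # Bit 0 of a branch target marks Thumb state; it is not part of
        # the fetch address.
        return target & ~1
    return pc + instr_bytes
```

A `BEQ` at 0x100 with the Z flag clear yields 0x102, while the same instruction with Z set yields its target.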
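Branch instructions go through the normal decode and execute stages, and the target address is known by the time the branch resolves. The IFU then switches to fetching from the target and discards any instructions it had already prefetched from the sequential path, so a taken branch costs extra cycles while the pipeline refills.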
Most Cortex-M processors resolve branches this way rather than predicting them. Branch prediction can improve performance, but it also increases cost and power usage, so the smaller cores omit it. The Cortex-M7 takes the other side of the trade-off and includes a branch predictor with an optional branch target address cache (BTAC).
Interrupt and Exception Fetching
Interrupts and exceptions redirect fetch to a handler. When one is taken, the processor saves its current state by pushing R0-R3, R12, LR, the return address and xPSR onto the stack. It also reads the handler's address from the vector table, and the IFU begins fetching from that address.
The vector table is an array of 32-bit addresses, one per exception. It starts at address 0x00000000 by default, and cores that implement the Vector Table Offset Register (VTOR) can relocate it. The processor uses the exception number as an index: the handler address for exception n is the word at table base + 4 x n.
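The table lookup can be sketched as follows. The sketch assumes the architectural layout in which entry n sits at table base + 4 x n and bit 0 of each entry is set to indicate Thumb state. The function names and the `vtor` parameter (modeling the table base) are illustrative.

```python
def vector_address(exception_number, vtor=0x00000000):
    """Address of the vector table entry for a given exception number."""
    return vtor + 4 * exception_number


def handler_entry(memory, exception_number, vtor=0x00000000):
    """Read a vector and return the address where handler fetch begins.

    `memory` is a dict mapping word-aligned addresses to 32-bit words.
    """
    vector = memory[vector_address(exception_number, vtor)]
    if vector & 1 == 0:
        # Cortex-M only executes Thumb code, so a valid vector has bit 0 set.
        raise ValueError("vector does not have the Thumb bit set")
    return vector & ~1
```

For example, SysTick is exception number 15, so its vector sits at offset 0x3C. External interrupt 0 is exception number 16, at offset 0x40.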
When the handler finishes, it returns by loading a special EXC_RETURN value into the PC, typically with BX LR or a POP that includes the PC. The hardware then restores the stacked state, and the IFU resumes fetching from the stacked return address rather than from the next sequential address.
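As a rough sketch of where the resume address comes from, the model below unstacks the basic eight-word exception frame used by ARMv7-M cores without floating-point context. The three EXC_RETURN values shown are the architectural ones for that case. The function name and return shape are assumptions for the example, and the sketch ignores stack alignment padding.

```python
# Basic exception frame, in ascending address order from the stack pointer.
FRAME = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr")

# EXC_RETURN values for ARMv7-M without floating-point state.
EXC_RETURN = {
    0xFFFFFFF1: ("handler", "msp"),  # return to Handler mode, main stack
    0xFFFFFFF9: ("thread", "msp"),   # return to Thread mode, main stack
    0xFFFFFFFD: ("thread", "psp"),   # return to Thread mode, process stack
}


def exception_return(exc_return, memory, msp, psp):
    """Model exception return: pick the stack, unstack, find the resume PC.

    Returns (mode, resume_address, new_stack_pointer).
    """
    mode, stack = EXC_RETURN[exc_return]
    sp = msp if stack == "msp" else psp
    frame = {name: memory[sp + 4 * i] for i, name in enumerate(FRAME)}
    return mode, frame["pc"], sp + 4 * len(FRAME)
```

The stacked PC sits at offset 0x18 in the frame, and that is the address the fetch unit resumes from.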
Reset Behavior
On reset, a Cortex-M processor reads two words from the start of the vector table: the first is the initial value of the main stack pointer, and the second is the reset vector, the address of the startup code that configures the system. The IFU then starts fetching from the address held in the reset vector.
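The first accesses after reset can be sketched as below. The model assumes the architectural vector table layout, with the initial main stack pointer at offset 0 and the reset vector at offset 4. The function name and the example memory contents are hypothetical.

```python
def reset_fetch(memory, vector_table_base=0x00000000):
    """Model the two reads a Cortex-M core performs when it leaves reset.

    Returns (initial_msp, first_fetch_address). `memory` maps word-aligned
    addresses to 32-bit words.
    """
    initial_msp = memory[vector_table_base]       # word 0: initial stack pointer
    reset_vector = memory[vector_table_base + 4]  # word 1: reset handler address
    # Bit 0 of the vector flags Thumb state; fetch starts at the even address.
    return initial_msp, reset_vector & ~1


# Hypothetical image: stack top at 0x20001000, reset handler at 0xC0.
image = {0x00000000: 0x20001000, 0x00000004: 0x000000C1}
```

With this image, `reset_fetch(image)` yields a stack pointer of 0x20001000 and a first fetch address of 0xC0.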
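Immediately after reset, instruction fetch may be slow, because clocks, flash wait states and any vendor-specific prefetch or cache hardware are still at their reset defaults until the startup code configures them. Architecturally, reset is treated as an exception (exception number 1), which is why its handler address lives in the vector table with the others.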
Bus Interface Unit
While the IFU determines the sequence of instruction fetches, the Bus Interface Unit (BIU) handles the actual reading from memory. The BIU connects the core to the system bus and memory. It issues bus read requests to fetch instructions.
The BIU implements the bus protocol: AHB-Lite on cores such as the Cortex-M0, M3 and M4, and AXI on the Cortex-M7. On the 32-bit cores each fetch requests a 32-bit aligned word from the memory system, while the M7 fetches 64 bits at a time. The BIU also includes buffers to smooth transfers between the bus and the core.
If the bus is busy or the memory is slow, the memory system inserts wait states and the BIU holds the request until the data arrives. This stalls the IFU once the prefetch buffer has drained. The prefetch buffer helps minimize the performance impact of such stalls.
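A back-of-the-envelope model shows how wait states limit the rate at which fetch can supply instructions. It assumes each fetch takes one cycle plus the wait states and returns a full bus-width word of useful instructions. It ignores bus pipelining, flash accelerators and branches, so treat the numbers as rough bounds. The function names are made up for the example.

```python
import math


def fetch_cycles(code_bytes, wait_states, bus_bytes=4):
    """Cycles needed to fetch a run of sequential code over the bus."""
    fetches = math.ceil(code_bytes / bus_bytes)
    return fetches * (1 + wait_states)


def instructions_per_cycle(instr_bytes, wait_states, bus_bytes=4):
    """Average number of instructions the fetch side can supply per cycle."""
    return (bus_bytes / instr_bytes) / (1 + wait_states)
```

With 16-bit instructions, a 32-bit bus and one wait state, `instructions_per_cycle(2, 1)` is 1.0: each two-cycle fetch delivers two instructions, so a core that issues one instruction per cycle can still be kept fed. With 32-bit instructions the same memory supplies only 0.5 instructions per cycle, and the decoder stalls.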
Cortex M3/M4/M7 – Microarchitecture
As an example, the Cortex-M3, M4 and M7 all place a prefetch buffer between instruction fetch and decode so that fetching can run ahead, but they size it differently.
The M3 and M4 IFUs contain:
- A three-word prefetch buffer
- Incrementer to generate sequential fetch addresses
- MUX to select between branch target and incremented addresses
- Stall logic to handle bus wait states
The M7 IFU improves performance with:
- 64-bit instruction fetch feeding a 4 x 64-bit prefetch queue
- Branch predictor with an optional branch target address cache (BTAC) to reduce branch penalty
- Optional instruction cache and instruction TCM that shorten fetch latency
In all cases, the IFU controls fetching the instruction stream while the BIU physically reads the data over the bus. Prefetching smooths out execution. The M3 and M4 avoid complex prediction logic by waiting for branch targets to resolve, while the M7 spends extra logic on prediction to cut branch penalties.
Summary
In summary, instruction fetching in Cortex-M processors uses a streamlined approach focused on embedded applications:
- The IFU fetches instructions sequentially using a prefetch buffer
- Most cores redirect fetch only once a branch resolves; the M7 adds branch prediction
- Exceptions and interrupts redirect fetch to handler addresses read from the vector table
- The BIU interfaces the core to the system bus and memory
- A simplified microarchitecture reduces cost and power
Overall, the Cortex M instruction fetch architecture balances performance and efficiency for embedded systems.