The Cortex-M series of ARM processors features a simpler pipeline than the larger Cortex-A series processors. The Cortex-M pipeline aims to provide high performance with low power consumption for embedded and IoT applications.
Fetch Stage
The fetch stage is the first stage of the Cortex-M pipeline. In this stage, the processor fetches instructions from memory. The program counter (PC) holds the address of the current instruction being fetched. The instruction is read from the memory location pointed to by the PC and stored in the instruction register (IR).
Cortex-M processors execute the Thumb instruction set, a variable-length RISC encoding in which instructions are either 16 or 32 bits long. The processor determines the size of each fetched instruction from its first halfword and increments the PC by 2 or 4 bytes accordingly.
The simpler Cortex-M processors, such as the Cortex-M0 and Cortex-M3, do not implement dynamic branch prediction. A taken branch flushes the pipeline, and fetching restarts at the branch target address. This reduces complexity compared to processors like Cortex-A, which implement dynamic branch prediction.
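The size check described above can be sketched in a few lines. This is a minimal illustration, assuming the Thumb-2 rule that a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction; the function names are hypothetical and do not correspond to any hardware signal or library.

```python
def thumb_instruction_size(first_halfword: int) -> int:
    """Return the instruction length in bytes (2 or 4) from its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    # These three bit patterns mark the first half of a 32-bit encoding;
    # every other pattern is a complete 16-bit instruction.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def next_pc(pc: int, first_halfword: int) -> int:
    """Advance the fetch address past the instruction that starts at `pc`."""
    return pc + thumb_instruction_size(first_halfword)
```

For example, `thumb_instruction_size(0xBF00)` (a 16-bit NOP) gives 2, while `thumb_instruction_size(0xF000)` (the first halfword of a 32-bit BL) gives 4.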
Decode Stage
In the decode stage, the instruction register (IR) contents are decoded to determine the instruction type and operand register specifiers. Decoding the register specifiers identifies which registers need to be read for the current instruction.
The Cortex-M pipeline only has a single decode stage, unlike some other pipelines which may have multiple decode stages. The relative simplicity of the RISC Thumb instruction set allows single-cycle decode.
During decode, the registers named by the instruction's specifiers are read from the register file, and their values are latched into pipeline registers for use as operands. The register file access occurs in parallel with decode.
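As a rough illustration of decoding register specifiers, the sketch below handles a single 16-bit Thumb encoding, ADDS Rd, Rn, Rm, whose top seven bits are 0b0001100 and whose three register fields occupy bits 8:6, 5:3, and 2:0. The class and function names are hypothetical, and a real decoder handles every encoding in hardware rather than one at a time.

```python
from typing import NamedTuple, Optional


class AddsRegister(NamedTuple):
    rd: int  # destination register number
    rn: int  # first source register number
    rm: int  # second source register number


def decode_adds_register(halfword: int) -> Optional[AddsRegister]:
    """Decode the 16-bit ADDS Rd, Rn, Rm encoding, or return None."""
    if (halfword >> 9) & 0x7F != 0b0001100:
        return None  # some other instruction; not handled in this sketch
    return AddsRegister(
        rd=halfword & 0x7,
        rn=(halfword >> 3) & 0x7,
        rm=(halfword >> 6) & 0x7,
    )


def read_operands(decoded: AddsRegister, register_file: list) -> tuple:
    """Read both source operands, as the decode stage would latch them."""
    return register_file[decoded.rn], register_file[decoded.rm]
```

Decoding `0x1888` (ADDS r0, r1, r2) yields `rd=0, rn=1, rm=2`; `read_operands` then returns the values held in r1 and r2.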
Execute Stage
In the execute stage, the actual operation defined by the instruction is performed. This may include arithmetic or logical operations on register operands, memory load/store operations, or control flow operations like branches or function calls.
The functional unit that executes the operation depends on the instruction type. For example, integer arithmetic instructions use the ALU while memory access instructions use the data bus interface.
Results from the execution stage are written to the pipeline registers. Not all instructions produce a result – control flow instructions like branches simply redirect fetching to a new instruction stream.
Memory Access Stage
The memory access stage is where load and store instructions access data memory. Load instructions read data from memory into the pipeline registers. Store instructions write data from the pipeline registers back out to memory.
In the 3-stage Cortex-M cores, memory access is not a separate pipeline stage; it is carried out as part of the execute stage. A load or store typically occupies the execute stage for more than one cycle, since the address is issued first and the data is transferred afterwards.
For a load or store, the effective address is calculated in the execute stage. This address is driven onto the address bus, and the data bus then carries the loaded value for loads or the value to be stored for stores.
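The address calculation and data transfer for a load can be sketched as follows. This example assumes the 16-bit Thumb encoding LDR Rt, [Rn, #imm] (top five bits 0b01101, with a 5-bit immediate scaled by 4) and models memory as a plain dictionary keyed by word address; the function name and memory model are hypothetical simplifications, and bus timing is not modelled.

```python
def execute_ldr_immediate(halfword: int, regs: list, memory: dict) -> int:
    """Execute LDR Rt, [Rn, #imm5*4] and return the effective address used."""
    if (halfword >> 11) & 0x1F != 0b01101:
        raise ValueError("not a 16-bit LDR (immediate) encoding")
    imm5 = (halfword >> 6) & 0x1F  # word offset
    rn = (halfword >> 3) & 0x7     # base register
    rt = halfword & 0x7            # destination register
    # Address phase: base plus scaled offset, wrapped to 32 bits.
    address = (regs[rn] + imm5 * 4) & 0xFFFFFFFF
    # Data phase: the loaded word is written to the destination register.
    regs[rt] = memory[address]
    return address
```

With r1 holding `0x20000000`, executing `0x6848` (LDR r0, [r1, #4]) reads the word at `0x20000004` into r0.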
Writeback Stage
Writeback is the final step in executing an instruction. On the 3-stage Cortex-M cores it takes place at the end of the execute stage rather than in a separate pipeline stage. Here, results are written to the register file so they update the register state.
Not all instructions perform writeback. Control instructions like branches do not write results to the registers. Load instructions write the loaded value to the destination register once the memory access completes.
For arithmetic instructions like add or multiply, writeback updates the register file with the result value. This updates the architectural state of the processor, enabling dependent instructions to use this result going forward.
Pipeline Hazards
Like all pipelined processors, the Cortex-M pipeline must handle hazards where pipeline behavior does not follow the ideal sequential flow. These include:
- Data hazards occur when an instruction depends on the result of a preceding instruction which has not yet completed. The pipeline may stall execution until the required data is available.
- Control hazards occur when a branch or other change in control flow causes instructions already in the pipeline to be discarded. The pipeline must flush and fetch the new instruction stream from the branch target.
- Structural hazards occur when resource conflicts prevent required pipeline operations from occurring, for example two instructions needing a single ALU at the same time. Because the simpler Cortex-M cores issue one instruction at a time and in order, such conflicts are rare inside the core, although instruction fetches and data accesses can still contend for a shared bus.
Cortex-M processors rely on techniques such as operand forwarding and, on some cores, speculative fetching of branch targets to limit the cost of data and control hazards. They do not use register renaming or branch delay slots, which belong to other processor designs. The simple RISC design of the Thumb ISA also limits pipeline stalls compared to more complex instruction sets.
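The cost of the stalls and flushes described above can be estimated with a toy cycle counter. This is a simplified model under stated assumptions, not the timing of any real Cortex-M core: one instruction completes per cycle once the pipeline is full, each data hazard adds a given number of stall cycles, and each taken branch costs `depth - 1` refill cycles. The `Instr` class and `count_cycles` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    taken_branch: bool = False  # True if this instruction redirects fetch
    stall_cycles: int = 0       # extra cycles spent waiting on a data hazard


def count_cycles(program, depth=3):
    """Estimate total cycles for an in-order pipeline of the given depth."""
    cycles = depth - 1  # cycles needed to fill the pipeline initially
    for instr in program:
        cycles += 1 + instr.stall_cycles
        if instr.taken_branch:
            # Assumed penalty: the younger instructions behind the branch
            # are discarded and the pipeline must be refilled.
            cycles += depth - 1
    return cycles
```

Calling `count_cycles` on the same program with `depth=3` and `depth=6` shows how, in this model, the flush penalty of each taken branch grows with pipeline depth.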
Exceptions and Interrupts
The Cortex-M pipeline can handle exceptions and interrupts, which cause context switches to handler code. When an exception or interrupt occurs:
- The pipeline is flushed, discarding uncompleted instructions.
- Architectural state is saved to the stack: the hardware pushes R0-R3, R12, LR, the return address (PC), and the program status register (xPSR).
- The PC is loaded with the handler address and execution begins at the handler.
While the handler runs, the fetch and decode stages are fed from the handler code. When the handler finishes, the saved state is restored from the stack and execution continues where the original program left off.
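The entry and return sequence can be sketched as below. The sketch assumes the basic 8-word Cortex-M stack frame (R0-R3, R12, LR, return address, xPSR, from the lowest address upward) and a vector table with one 4-byte entry per exception number. It deliberately ignores stack alignment padding, the EXC_RETURN value placed in LR, floating-point context, and the choice between main and process stacks. The register dictionary, memory dictionary, and function names are hypothetical.

```python
# Order of the stacked words, from the lowest address upward.
FRAME_ORDER = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr")


def exception_entry(regs, memory, vector_table_base, exception_number):
    """Push the 8-word frame, load the handler address, return the new SP."""
    sp = regs["sp"] - 4 * len(FRAME_ORDER)
    for i, name in enumerate(FRAME_ORDER):
        memory[sp + 4 * i] = regs[name]
    regs["sp"] = sp
    # Vector entries have bit 0 set to indicate Thumb state; clear it
    # to obtain the handler's fetch address.
    regs["pc"] = memory[vector_table_base + 4 * exception_number] & ~1
    return sp


def exception_return(regs, memory):
    """Pop the frame so execution resumes at the interrupted instruction."""
    sp = regs["sp"]
    for i, name in enumerate(FRAME_ORDER):
        regs[name] = memory[sp + 4 * i]
    regs["sp"] = sp + 4 * len(FRAME_ORDER)
```

After `exception_entry`, the stack pointer has moved down by 32 bytes and the PC points at the handler; `exception_return` undoes both, restoring the PC to the saved return address.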
Pipeline Depth
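Most Cortex-M processors, including the Cortex-M0, Cortex-M3, and Cortex-M4, use a 3-stage pipeline. This shallow pipeline saves power and chip area compared to the deeper pipelines in processors like Cortex-A. A shallow pipeline also keeps the cost of data and control hazards low, because fewer in-flight instructions must be stalled or discarded.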
Higher-end Cortex-M processors such as the Cortex-M7 achieve higher performance by moving beyond the 3-stage design. The Cortex-M7 uses a longer 6-stage pipeline with dual-issue superscalar execution and branch prediction. These more advanced pipelines raise throughput at some cost in area and power, while remaining far simpler than application processor pipelines.
Conclusion
The Cortex-M processor pipeline uses a simplified 3-stage fetch-decode-execute design. Optimized specifically for embedded applications, it delivers high performance with low power consumption. Streamlined compared to application processor pipelines, it fits the efficiency needs of microcontroller and IoT workloads.
Techniques like pipelining, RISC instructions, and microarchitecture optimizations enable Cortex-M processors to achieve impressive performance per watt. The responsive, real-time capabilities contribute to the popularity of Cortex-M processors for embedded systems requiring efficient CPU execution.