© S-O-C.ORG, All Rights Reserved.

What is a 3 stage pipeline in Arm cortex-m?

Eileen David
Last updated: October 5, 2023 9:40 am

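Several Arm Cortex-M processors, including the Cortex-M0, Cortex-M3, and Cortex-M4, use a 3-stage instruction pipeline. This gives higher performance than a design that finishes each instruction before starting the next. (Other family members differ: the Cortex-M0+ uses a 2-stage pipeline and the Cortex-M7 a 6-stage one.) The three stages are Fetch, Decode, and Execute. Each stage works on a different instruction at the same time, so the processor overlaps the steps of several instructions and increases instruction throughput.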

Contents
  • What is Pipelining?
  • The 3 Stage Pipeline in Cortex-M: Fetch, Decode, Execute
  • Pipelining Concepts: 1. Pipeline Depth, 2. Ideal Pipeline Speedup, 3. Pipeline Hazards, 4. Superscalar and Out-of-Order Execution
  • Pipelining in Cortex-M3: 1. Simple Increment Instruction, 2. Data Dependency Hazards, 3. Branch Prediction
  • Conclusion

What is Pipelining?

Pipelining is a technique used in modern processors to increase instruction execution performance. Without pipelining, a processor executes instructions strictly one after another. Each instruction goes through the same steps, such as Fetch, Decode, Execute, and Write Back, and the next instruction can begin only after the current one completes. (A non-pipelined design that squeezes all of these steps into one long clock cycle is called a single-cycle processor.)

Pipelining improves performance by allowing a new instruction to begin before the previous one has finished. The processor is divided into stages, and each stage performs one step of instruction execution. Instructions move through the stages like water through a pipe. At any given time, several instructions may be at different stages of completion. In a 3-stage pipeline, up to three instructions can be in flight at once.

For example, while one instruction is being executed, the next can be decoded and a third fetched from memory. Overlapping the steps of sequential instructions this way increases instruction throughput, which is the number of instructions completed per cycle. Pipelining does not make any single instruction finish sooner, because each instruction still passes through every stage.
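The Python sketch below makes this overlap concrete. It prints which instruction occupies each stage of an ideal three-stage pipeline on every clock cycle. It assumes one cycle per stage and no stalls. The helper name `stage_occupancy` and the labels I0, I1, ... are hypothetical.

```python
# Hypothetical helper: cycle-by-cycle occupancy of an ideal 3-stage pipeline.
STAGES = ("Fetch", "Decode", "Execute")


def stage_occupancy(n_instr, n_stages=len(STAGES)):
    """Return one row per clock cycle; row[s] is the index of the instruction
    in stage s during that cycle, or None if the stage is empty.

    Assumes an ideal pipeline (one cycle per stage, no stalls), so instruction
    i enters Fetch in cycle i and is in stage s during cycle i + s.
    """
    total = n_instr + n_stages - 1 if n_instr > 0 else 0
    rows = []
    for cycle in range(total):
        row = []
        for s in range(n_stages):
            i = cycle - s
            row.append(i if 0 <= i < n_instr else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    n = 4
    rows = stage_occupancy(n)
    print("cycle  " + "  ".join(f"{name:<8}" for name in STAGES))
    for cycle, row in enumerate(rows, start=1):
        cells = [f"I{i}" if i is not None else "-" for i in row]
        print(f"{cycle:>5}  " + "  ".join(f"{c:<8}" for c in cells))
    print(f"{n} instructions: {len(rows)} cycles pipelined, "
          f"{n * len(STAGES)} cycles if each finishes before the next starts")
```

In the printout, the first two cycles fill the pipeline. All three stages are busy in cycles 3 and 4. The four instructions finish in 6 cycles instead of 12.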

The 3 Stage Pipeline in Cortex-M

The Cortex-M processors use a 3-stage pipeline consisting of Fetch, Decode, and Execute stages. Let's examine each stage more closely:

Fetch

In the Fetch stage, the processor fetches the next instruction to execute from memory. This includes:

  • Calculating the address of the next instruction based on the program counter.
  • Reading the instruction from memory (or from a cache, on devices that have one).
  • Updating the program counter to point to the next instruction.

At the end of Fetch, the processor has the binary machine code for the next instruction to execute.
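The Python sketch below is a rough model of the Fetch steps. It assumes that every instruction is a 16-bit Thumb halfword stored in a byte array, and that instruction fetches are little-endian. It ignores 32-bit Thumb-2 instructions and any prefetch buffering a real core performs. The `fetch` function and the two-instruction program image are hypothetical.

```python
# Hypothetical Fetch model: read one 16-bit Thumb halfword at the PC.
def fetch(memory: bytes, pc: int):
    """Fetch the 16-bit instruction at address pc.

    Returns (instruction, next_pc). Instruction fetches are little-endian,
    so the low byte of the halfword is stored at the lower address.
    """
    if pc % 2:
        raise ValueError("Thumb instructions are halfword aligned")
    if pc + 2 > len(memory):
        raise IndexError("fetch past the end of the program image")
    instr = int.from_bytes(memory[pc:pc + 2], "little")
    return instr, pc + 2  # simplification: every instruction assumed 16 bits


if __name__ == "__main__":
    # Hypothetical image holding the halfwords 0x3101 and 0x3902.
    program = bytes([0x01, 0x31, 0x02, 0x39])
    pc = 0
    while pc < len(program):
        instr, pc_next = fetch(program, pc)
        print(f"pc=0x{pc:02X} instr=0x{instr:04X}")
        pc = pc_next
```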

Decode

In the Decode stage, the processor interprets and decodes the fetched instruction. This includes:

  • Identifying the type of instruction (add, load, branch, etc.).
  • Extracting the operand fields from the instruction.
  • Reading register operands and extracting any immediate value encoded in the instruction.
  • Determining which execution unit the instruction will use.

By the end of the Decode, the processor knows exactly what needs to be done to execute this instruction.
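The Python sketch below shows the field-extraction part of Decode. It handles only the 16-bit Thumb `ADDS Rdn, #imm8` and `SUBS Rdn, #imm8` encodings (encoding T2 in the Armv7-M Architecture Reference Manual). A real decoder covers many more formats and also reads the register file. The `decode` function and the `Decoded` record are hypothetical. The halfword 0x3101 is one valid 16-bit encoding of `ADDS R1, R1, #1`.

```python
from dataclasses import dataclass


@dataclass
class Decoded:
    """Hypothetical decoded-instruction record."""
    op: str   # "ADDS", "SUBS", or "UNKNOWN"
    rdn: int  # destination register, also the first source register
    imm: int  # 8-bit immediate operand


def decode(instr: int) -> Decoded:
    """Decode the 16-bit Thumb 'ADDS/SUBS Rdn, #imm8' (T2) encodings.

    Layout: bits[15:11] = opcode (0b00110 ADDS, 0b00111 SUBS),
            bits[10:8]  = Rdn, bits[7:0] = imm8.
    Every other encoding is reported as UNKNOWN in this sketch.
    """
    opcode = (instr >> 11) & 0x1F
    rdn = (instr >> 8) & 0x7
    imm8 = instr & 0xFF
    if opcode == 0b00110:
        return Decoded("ADDS", rdn, imm8)
    if opcode == 0b00111:
        return Decoded("SUBS", rdn, imm8)
    return Decoded("UNKNOWN", 0, 0)


if __name__ == "__main__":
    print(decode(0x3101))  # Decoded(op='ADDS', rdn=1, imm=1)
    print(decode(0x3902))  # Decoded(op='SUBS', rdn=1, imm=2)
```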

Execute

In the Execute stage, the actual operation of the instruction is performed. This may include:

  • Performing an arithmetic or logical operation on register values.
  • Calculating a memory address for a load or store.
  • Accessing data memory for loads and stores.
  • Updating the status flags, for instructions that set them.
  • Writing the result back to the destination register.

At the end of Execute, the functional operation of the instruction is complete.
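The Python sketch below covers the Execute step for flag-setting `ADDS` and `SUBS` with an immediate operand. It models the register file as a Python list of 32-bit values. The N, Z, C, and V flags follow the standard Arm rules for addition and subtraction, where C after a subtraction means "no borrow". The `execute` function is hypothetical.

```python
# Hypothetical Execute model for ADDS/SUBS Rdn, #imm with NZCV flag update.
MASK32 = 0xFFFFFFFF


def execute(op: str, regs: list, rdn: int, imm: int) -> dict:
    """Perform op on regs[rdn] and imm, write the 32-bit result back to
    regs[rdn], and return the new N, Z, C, V flags."""
    a = regs[rdn]
    if op == "ADDS":
        full = a + imm
        result = full & MASK32
        carry = full > MASK32  # unsigned carry out of bit 31
        # Signed overflow: operands have the same sign, result differs.
        overflow = ((~(a ^ imm) & (a ^ result)) >> 31) & 1
    elif op == "SUBS":
        result = (a - imm) & MASK32
        carry = a >= imm  # Arm C flag after subtraction = NOT borrow
        # Signed overflow: operands differ in sign, result sign differs from a.
        overflow = (((a ^ imm) & (a ^ result)) >> 31) & 1
    else:
        raise ValueError(f"unsupported op {op!r}")
    regs[rdn] = result  # write-back
    return {"N": result >> 31, "Z": int(result == 0),
            "C": int(carry), "V": overflow}


if __name__ == "__main__":
    regs = [0] * 16
    regs[1] = 5
    print(execute("ADDS", regs, 1, 1), regs[1])
```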

Pipelining Concepts

Some key concepts related to pipelining help illustrate how it improves performance:

1. Pipeline Depth

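The number of stages in the pipeline is called its depth. The Cortex-M cores described here use a 3-stage pipeline. More stages allow more instructions to be in flight at once, and each stage can do less work per cycle. However, extra stages add complexity and make stalls and branch flushes more expensive. High-performance desktop processors use much deeper pipelines; some Intel x86 designs, such as the Pentium 4, had more than 20 stages.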

2. Ideal Pipeline Speedup

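The ideal speedup from pipelining is equal to the number of stages, so a 3-stage pipeline has a maximum speedup of 3x. This does not mean it completes three instructions per cycle. A scalar pipeline like this one completes at most one instruction per cycle, and the gain comes from overlap. A non-pipelined design that spends one cycle on each of the three steps finishes one instruction every 3 cycles, while a full pipeline finishes one every cycle. The pipeline only approaches the 3x figure over long runs of instructions, because it first needs 2 extra cycles to fill. Stalls reduce the speedup further.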
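Under idealized assumptions (one cycle per step, no stalls), a k-stage pipeline runs n instructions in k + (n - 1) cycles, versus n * k cycles without pipelining. For k = 3, the speedup is 3n / (n + 2). The Python helper below, `pipeline_speedup` (a hypothetical name), computes this exactly using fractions.

```python
from fractions import Fraction


def pipeline_speedup(n_instr: int, n_stages: int = 3) -> Fraction:
    """Speedup of an ideal k-stage pipeline over a non-pipelined design that
    spends one cycle per stage, for n instructions and no stalls.

    Non-pipelined: n * k cycles.
    Pipelined: k cycles for the first instruction, then one more per
    instruction, i.e. k + (n - 1) cycles.
    """
    if n_instr < 1:
        raise ValueError("need at least one instruction")
    return Fraction(n_instr * n_stages, n_stages + n_instr - 1)


if __name__ == "__main__":
    for n in (1, 3, 10, 100, 1000):
        s = pipeline_speedup(n)
        print(f"n={n:>4}: speedup {s} = {float(s):.3f}")
```

For a single instruction, there is no speedup. For ten instructions, the speedup is 2.5. It approaches 3 as the run gets longer, but never reaches it, because the two fill cycles are always paid once.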

3. Pipeline Hazards

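Pipeline hazards occur when something prevents the next instruction from proceeding on schedule. The three main types are:

  • Structural: two instructions need the same hardware resource in the same cycle.
  • Data: an instruction needs a result that an earlier instruction has not produced yet.
  • Control: a branch changes the instruction flow, so the instructions already fetched behind it may be the wrong ones.

Processors handle hazards by stalling, by forwarding results between stages, and by flushing wrongly fetched instructions, all to minimize the performance impact.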

4. Superscalar and Out-of-Order Execution

Even with pipelining, stalls leave pipeline stages idle. Superscalar processors issue multiple instructions per cycle into parallel pipelines to increase instruction-level parallelism. Out-of-order execution dynamically rearranges the instruction order to avoid stalls. The Cortex-M0, M3, and M4 do neither: they issue at most one instruction per cycle, in program order.

Pipelining in Cortex-M3

Let’s look at a specific example of how the 3-stage pipeline works in the Cortex-M3 processor. The M3 implements the ARMv7-M architecture.

1. Simple Increment Instruction

Suppose we want to increment a register value using the instruction ADD R1, R1, #1. Here are the steps the M3 would take:

  • Fetch: Fetch ADD instruction from memory into pipeline.
  • Decode: Determine ADD needs to increment R1 register by 1.
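  • Execute: Add 1 to R1 and write the result back. The status flags are updated only if the flag-setting form ADDS is used.

From fetch to completion, the increment takes 3 cycles because it passes through all three stages. But a new instruction can enter the pipeline right behind it on every cycle, so a steady stream of such instructions completes at a rate of one per cycle.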

2. Data Dependency Hazards

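Now consider this instruction sequence:

  ADD R1, R2, #5
  SUB R3, R1, #2

The SUB reads R1, which the ADD produces, so the SUB has a read-after-write data dependency on the ADD. Without interruption, the pair needs 4 cycles: 3 cycles for the ADD, then 1 more for the SUB. In a simple pipeline, the SUB would have to wait until the ADD's result is written back. Each stall cycle this causes adds one cycle to that total. Pipelines usually avoid this particular stall by forwarding the ALU result from the Execute stage directly to the next instruction. The Cortex-M3 is designed so that simple dependent ALU instructions like this pair can normally execute back-to-back.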
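The Python sketch below is a simple cycle-count model for this case. Instructions are written as hypothetical `(dest, [sources])` tuples. The `raw_stall` parameter is the number of extra cycles charged when an instruction reads the register written by the instruction immediately before it. Setting `raw_stall` to 0 models full forwarding. The 2-cycle value in the usage line is purely illustrative, not a measured Cortex-M3 figure. The `count_cycles` function is hypothetical.

```python
# Hypothetical in-order pipeline cycle model with read-after-write stalls.
def count_cycles(program, n_stages=3, raw_stall=0):
    """Cycle count for a straight-line program in an in-order pipeline.

    program: list of (dest_reg, [source_regs]) tuples.
    raw_stall: extra cycles when an instruction reads the register written
    by the instruction immediately before it (0 = full forwarding).
    Only adjacent pairs are checked, which is a simplification.
    """
    if not program:
        return 0
    stalls = 0
    for (prev_dest, _), (_, cur_srcs) in zip(program, program[1:]):
        if prev_dest in cur_srcs:
            stalls += raw_stall
    return n_stages + (len(program) - 1) + stalls


if __name__ == "__main__":
    # ADD R1, R2, #5 ; SUB R3, R1, #2  (the SUB reads R1 written by the ADD)
    seq = [("R1", ["R2"]), ("R3", ["R1"])]
    print("with forwarding:", count_cycles(seq))
    print("with a 2-cycle stall:", count_cycles(seq, raw_stall=2))
```

With forwarding, the pair takes 4 cycles. With the illustrative 2-cycle stall, it takes 6.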

3. Branch Prediction

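A conditional branch is a control hazard. By the time the branch condition is resolved in Execute, the pipeline has already fetched and decoded the instructions that follow the branch. If the branch is taken, those instructions are the wrong ones. The processor discards (flushes) them and refills the pipeline from the branch target, which costs extra cycles. The Cortex-M3 does not have a dynamic branch predictor. Instead, Arm describes it as a 3-stage pipeline with branch speculation, which starts fetching a branch target early to shorten this refill. Handling branches well is therefore important for performance.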
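The cost of these flushes can be estimated with a simple model, sketched in Python below. It models a counted loop closed by a backward conditional branch that is taken on every iteration except the last. The default penalty of 2 bubble cycles per taken branch assumes that the branch resolves in Execute and that both younger instructions (in Fetch and Decode) are thrown away. Real cores with branch speculation may do better. The `loop_cycles` function and its default penalty are hypothetical.

```python
# Hypothetical cost model for a loop closed by a backward conditional branch.
def loop_cycles(body_len, iterations, taken_penalty=2, n_stages=3):
    """Estimate cycles for a counted loop in an in-order pipeline.

    body_len: instructions in the loop body, excluding the branch.
    taken_penalty: bubble cycles per taken branch (assumed, not measured).
    The branch is taken on every iteration except the last.
    """
    if iterations < 1:
        raise ValueError("need at least one iteration")
    executed = iterations * (body_len + 1)  # body plus the branch, each pass
    taken = iterations - 1
    return n_stages + executed - 1 + taken * taken_penalty


if __name__ == "__main__":
    print(loop_cycles(3, 10))                   # with flush bubbles
    print(loop_cycles(3, 10, taken_penalty=0))  # if taken branches were free
```

Consider a 3-instruction body that runs 10 times. The model gives 60 cycles, 18 of which are branch bubbles. With no taken-branch penalty, the same loop would take 42 cycles.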

Conclusion

The 3-stage pipeline in Arm Cortex-M processors provides significant performance gains over non-pipelined execution. It can deliver up to three times the throughput of a design that spends one cycle on each step, although the real gain depends on how often hazards and branches interrupt the flow. Techniques such as hazard detection, result forwarding, and branch speculation help minimize stalls and keep the pipeline busy. Multiple pipelines and out-of-order execution provide further gains in more advanced processors.

Understanding pipelining is key to designing software optimized for Cortex-M and exploring the capabilities of these ubiquitous processors.
