How Instructions are Fetched in Cortex M Processors?

Javier Massey
Last updated: September 14, 2023 8:07 am

Overview of Instruction Fetch in Cortex M

The instruction fetch unit (IFU) fetches instructions from memory and feeds them to the execution pipeline. It contains a prefetch buffer that holds instructions fetched ahead of execution, which reduces stalls when code runs from slower memories such as flash. The IFU also handles branches by redirecting fetch to the branch target address.

Contents
  • Overview of Instruction Fetch in Cortex M
  • Prefetch Buffer and Fetching
  • Branch Target Fetching
  • Interrupt and Exception Fetching
  • Reset Behavior
  • Bus Interface Unit
  • Cortex M3/M4/M7 – Microarchitecture
  • Summary

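In Cortex-M processors, the IFU fetches instructions sequentially from consecutive memory addresses. Most Cortex-M cores, including the M0, M0+, M3, and M4, have no dynamic branch predictor: a branch or exception entry redirects fetch only once its target address is known. The higher-performance Cortex-M7 is an exception and adds branch prediction. Omitting prediction on the smaller cores reduces cost and power consumption.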

Prefetch Buffer and Fetching

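The prefetch buffer in the Cortex-M IFU holds instructions that have been fetched from memory but not yet decoded. It smooths out instruction execution when fetching from slower memories. The buffer depth varies across Cortex-M processors: higher-end cores such as the M7 and M33 buffer more instructions (on the order of 8 to 12) than lower-end cores such as the M0+, M3, and M4 (around 2 to 3). The exact depth is implementation-specific and is documented in each core's Technical Reference Manual.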

At any time, the IFU tries to keep the prefetch buffer filled with instructions from sequential memory addresses. As the processor decodes instructions from the buffer, the IFU reads ahead and grabs the next set from memory. This helps avoid stalls while waiting for instructions.
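The read-ahead behaviour can be sketched with a small cycle-by-cycle model. This is a simplified illustration, not a description of any real core: the function name `simulate_fetch`, the single outstanding fetch, the two instructions delivered per fetch, and the one-instruction-per-cycle decoder are all assumptions chosen to keep the model small.

```python
def simulate_fetch(num_instructions, depth, wait_states, per_fetch=2):
    """Return (cycles, stalls) for a hypothetical prefetch buffer model.

    Assumptions: one fetch may be outstanding at a time, each fetch takes
    1 + wait_states cycles and delivers per_fetch instructions, and the
    decoder consumes at most one buffered instruction per cycle.
    """
    buffered = 0    # instructions currently held in the prefetch buffer
    requested = 0   # instructions already requested from memory
    issued = 0      # instructions handed to the decoder
    stalls = 0      # cycles in which the decoder found the buffer empty
    countdown = 0   # cycles until the outstanding fetch returns (0 = bus idle)
    cycles = 0

    while issued < num_instructions:
        cycles += 1

        # 1. The decoder takes one instruction, or stalls on an empty buffer.
        if buffered > 0:
            buffered -= 1
            issued += 1
        else:
            stalls += 1

        # 2. The outstanding fetch advances and may deliver its instructions.
        if countdown > 0:
            countdown -= 1
            if countdown == 0:
                buffered += per_fetch

        # 3. Read ahead: start the next sequential fetch if the bus is idle
        #    and the buffer has room for the returned instructions.
        if (countdown == 0 and requested < num_instructions
                and buffered + per_fetch <= depth):
            requested += per_fetch
            countdown = 1 + wait_states

    return cycles, stalls


if __name__ == "__main__":
    for ws in (0, 1, 2):
        print(ws, simulate_fetch(8, depth=4, wait_states=ws))
```

In this model the stall count grows with the number of wait states, and the only stalls with zero wait states are the ones needed to fill the empty buffer at the start.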

The IFU fetches instructions from memory as aligned 32-bit words, packaging instructions from consecutive addresses into a single fetch. Cortex-M cores execute only Thumb instructions, which are either 16 or 32 bits long, so one fetched word can contain two 16-bit instructions, and a 32-bit instruction can straddle two fetched words. The fetched instructions are buffered and decoded before execution.
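How 16-bit and 32-bit Thumb encodings map onto aligned 32-bit fetches can be sketched as follows. The length rule is the standard Thumb one (a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 starts a 32-bit encoding); the helper names `split_thumb` and `fetch_words` are made up for this illustration.

```python
import struct


def split_thumb(code, base=0):
    """Return (address, size_in_bytes) for each instruction in a
    little-endian Thumb byte stream that starts at address `base`."""
    instructions = []
    offset = 0
    while offset + 2 <= len(code):
        (halfword,) = struct.unpack_from("<H", code, offset)
        # These three prefixes mark the first halfword of a 32-bit encoding.
        is_wide = (halfword >> 11) in (0b11101, 0b11110, 0b11111)
        size = 4 if is_wide else 2
        if offset + size > len(code):
            break  # truncated instruction at the end of the stream
        instructions.append((base + offset, size))
        offset += size
    return instructions


def fetch_words(address, size):
    """Aligned 32-bit word addresses that must be fetched to obtain an
    instruction of `size` bytes located at `address`."""
    first = address & ~3
    last = (address + size - 1) & ~3
    return [first] if first == last else [first, last]


if __name__ == "__main__":
    # NOP (0xBF00), BL with zero offset (0xF000 0xF800), MOVS r0, #1 (0x2001)
    code = struct.pack("<4H", 0xBF00, 0xF000, 0xF800, 0x2001)
    for address, size in split_thumb(code):
        print(hex(address), size, [hex(w) for w in fetch_words(address, size)])
```

In the sample stream the 32-bit BL starts at address 2, so it needs both the word at 0 and the word at 4 before it can be decoded.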

Branch Target Fetching

When a branch instruction is executed, the IFU must fetch instructions from the new target address rather than the next sequential address. Branch targets are handled by updating the address in the IFU’s instruction address register.

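Unconditional branches always redirect fetch to the target address, which is either computed from a PC-relative offset encoded in the instruction or taken from a register. Conditional branches compute the target in the same way, but it takes effect only if the condition passes; otherwise fetch continues with the next sequential instruction.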
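As a rough sketch of this address selection, the function below works out the next fetch address for the two 16-bit Thumb branch encodings, B (encoding T2) and B<cond> (encoding T1). The function names and the `condition_passed` flag are assumptions made for the example; a real core evaluates the condition from the APSR flags, and every other instruction is treated here as a plain 16-bit non-branch.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def next_fetch_address(addr, halfword, condition_passed=True):
    """Next instruction address after the 16-bit instruction at `addr`."""
    pc = addr + 4  # in Thumb state the PC reads as instruction address + 4

    # B <label>, encoding T2: 11100 imm11, offset = SignExtend(imm11:'0')
    if (halfword >> 11) == 0b11100:
        return pc + sign_extend((halfword & 0x7FF) << 1, 12)

    # B<cond> <label>, encoding T1: 1101 cond imm8, offset = SignExtend(imm8:'0')
    # cond values 0b1110 and 0b1111 are UDF and SVC, not branches.
    cond = (halfword >> 8) & 0xF
    if (halfword >> 12) == 0b1101 and cond < 0b1110 and condition_passed:
        return pc + sign_extend((halfword & 0xFF) << 1, 9)

    # Not a taken branch: continue with the next sequential instruction.
    return addr + 2


if __name__ == "__main__":
    print(hex(next_fetch_address(0x100, 0xE7FE)))          # B . (branch to self)
    print(hex(next_fetch_address(0x100, 0xD001, True)))    # BEQ, condition passes
    print(hex(next_fetch_address(0x100, 0xD001, False)))   # BEQ, condition fails
```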

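Branch instructions go through the normal decode and execute stages, and the target address is known by the time the branch completes. Any instructions that were already prefetched from the sequential path are then discarded and the pipeline refills from the target, so a taken branch costs a few extra cycles on cores without branch prediction.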

Most Cortex-M processors resolve branch targets at execution time rather than predicting them. Branch prediction can improve performance, and the Cortex-M7 includes it, but it also increases cost and power usage. Resolving the target without prediction is simpler and more power efficient, which suits the smaller cores.

Interrupt and Exception Fetching

Interrupts and exceptions redirect fetch to a handler. When an interrupt or exception is taken, the processor saves the current context by pushing registers onto the stack and reads the handler's address from the vector table. The IFU then begins fetching from that address.

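The vector table starts at address 0x00000000 by default and can be relocated through the Vector Table Offset Register (VTOR) on cores that implement it. Each entry is a 32-bit word: the first holds the initial main stack pointer value, and the rest hold the entry points for each type of exception. The processor uses the exception number as an index, so the entry for exception n sits at offset 4 × n from the table base.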
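The table lookup can be sketched in a few lines. The sketch assumes a flat `bytes` object standing in for memory from address 0 and a hypothetical helper named `handler_address`; the table contents are invented for the example. The 4-byte little-endian entries and the cleared bit 0 (which marks Thumb state in the stored value rather than being part of the fetch address) follow the architecture.

```python
import struct


def handler_address(memory, exception_number, vtor=0):
    """Return the address the IFU would start fetching from for the given
    exception, reading a vector table located at address `vtor`."""
    if exception_number < 1:
        # Entry 0 holds the initial stack pointer, not a handler.
        raise ValueError("exception numbers start at 1 (Reset)")
    entry_address = vtor + 4 * exception_number
    (entry,) = struct.unpack_from("<I", memory, entry_address)
    # Bit 0 of a vector is the Thumb bit; the fetch address has it cleared.
    return entry & ~1


if __name__ == "__main__":
    # Made-up table: initial SP, then Reset, NMI and HardFault vectors.
    table = struct.pack("<4I", 0x20002000, 0x00000101, 0x00000121, 0x00000141)
    print(hex(handler_address(table, 3)))  # HardFault is exception number 3
```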

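After the handler finishes, execution returns to the interrupted code. Cortex-M has no dedicated return-from-exception instruction: the handler returns with an ordinary branch or pop that loads a special EXC_RETURN value into the PC, and the hardware then restores the saved context from the stack. The IFU resumes fetching from the restored return address rather than the next sequential address.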
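A minimal sketch of where the resume address comes from, assuming an ARMv7-M core without the floating-point extension. On such a core the handler returns by loading an EXC_RETURN value into the PC, and the hardware pops an eight-word frame (R0, R1, R2, R3, R12, LR, PC, xPSR) from the selected stack. The function name `resume_address` and the dictionary of stack contents are hypothetical.

```python
import struct

# EXC_RETURN values for a core without the floating-point extension:
# which mode to return to and which stack holds the saved frame.
EXC_RETURN_MODES = {
    0xFFFFFFF1: ("handler", "MSP"),
    0xFFFFFFF9: ("thread", "MSP"),
    0xFFFFFFFD: ("thread", "PSP"),
}


def resume_address(exc_return, stacks):
    """Return (mode, address) at which fetch resumes after exception return.

    `stacks` maps "MSP"/"PSP" to the bytes found at that stack pointer.
    """
    mode, stack_name = EXC_RETURN_MODES[exc_return]
    frame = stacks[stack_name]
    # Basic stack frame, lowest address first.
    r0, r1, r2, r3, r12, lr, pc, xpsr = struct.unpack_from("<8I", frame, 0)
    return mode, pc


if __name__ == "__main__":
    frame = struct.pack("<8I", 0, 1, 2, 3, 12, 0xFFFFFFFF, 0x08000124, 0x01000000)
    print(resume_address(0xFFFFFFFD, {"PSP": frame}))
```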

Reset Behavior

On reset, Cortex-M processors read two words from the start of the vector table: the initial main stack pointer value and the reset vector. The reset vector holds the address of the startup code that configures the system, and the IFU begins fetching from that address.

During the reset sequence, the IFU may only be able to perform a single fetch at a time until the bus and memory are initialized. Reset is treated much like any other exception (it is exception number 1), so leaving reset starts execution at the address held in the reset vector.
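The first reads after reset can be illustrated by parsing the start of a firmware image. This is a sketch under the assumption that the image is linked to start at the vector table base; the function name `reset_state` and the sample values are invented for the example.

```python
import struct


def reset_state(image):
    """Return (initial_sp, first_fetch_address) from the first two words
    of a little-endian Cortex-M firmware image."""
    initial_sp, reset_vector = struct.unpack_from("<II", image, 0)
    if not reset_vector & 1:
        # Cortex-M executes only Thumb code, so bit 0 of the vector must be set.
        raise ValueError("reset vector must have bit 0 set (Thumb state)")
    return initial_sp, reset_vector & ~1


if __name__ == "__main__":
    image = struct.pack("<II", 0x20008000, 0x080001C1)
    sp, pc = reset_state(image)
    print(hex(sp), hex(pc))
```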

Bus Interface Unit

While the IFU determines the sequence of instruction fetches, the Bus Interface Unit (BIU) handles the actual reading from memory. The BIU connects the core to the system bus and memory. It issues bus read requests to fetch instructions.

The BIU contains the hardware that implements the bus protocol, such as AHB-Lite or AXI. For each fetch, it requests an aligned word from the memory system, 32 bits wide on the smaller cores. The BIU also includes buffers to smooth transfers between the bus and the core.

If the bus or memory cannot respond immediately to a fetch, wait states are inserted into the transfer and the IFU stalls until the data arrives. The prefetch buffer helps minimize the performance impact of such stalls.

Cortex M3/M4/M7 – Microarchitecture

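As an example, the Cortex-M3, M4, and M7 decouple instruction fetch from decode with prefetch buffers of differing sizes, so the IFU can read ahead while earlier instructions are still being decoded. The exact buffer organization is implementation-specific and is documented in each core's Technical Reference Manual.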

The M3 and M4 IFUs contain:

  • 2 x 16-byte prefetch buffers
  • Incrementer to generate sequential fetch addresses
  • MUX to select between branch target and incremented addresses
  • Stall logic to handle bus wait states
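The way these blocks combine can be sketched as a single next-address function. It is a behavioural illustration only: the function name, the argument names, and the priority of the stall over a pending branch are assumptions, not details taken from the cores' documentation.

```python
def next_fetch_address(current, branch_taken, branch_target,
                       bus_stalled, step=4):
    """Select the address of the next fetch.

    Stall logic: hold the current address while the bus inserts wait states.
    MUX: choose the branch target when a branch is taken.
    Incrementer: otherwise advance by one fetch width (`step` bytes).
    """
    if bus_stalled:
        return current
    if branch_taken:
        return branch_target
    return current + step


if __name__ == "__main__":
    addr = 0x08000000
    addr = next_fetch_address(addr, False, 0, False)          # sequential
    addr = next_fetch_address(addr, False, 0, True)           # stalled, holds
    addr = next_fetch_address(addr, True, 0x08000100, False)  # taken branch
    print(hex(addr))
```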

The M7 IFU improves performance with:

  • 2 x 64-bit prefetch buffers (128 bits in total, enough for up to eight 16-bit Thumb instructions)
  • Branch target address cache (BTAC) to reduce branch penalty
  • Instruction stream signature analyzer for optimizations

In all cases, the IFU controls the order of instruction fetches while the BIU performs the reads over the bus. Prefetching smooths out execution, and the smaller cores avoid complex prediction logic by redirecting fetch only once a branch target is known.

Summary

In summary, instruction fetching in Cortex M processors uses a streamlined approach focused on embedded applications:

  • The IFU fetches instructions sequentially utilizing a prefetch buffer
  • Most cores resolve branch targets at execution time rather than predicting them
  • Exceptions and interrupts redirect fetch to handler addresses read from the vector table
  • The BIU interfaces the core to the system bus and memory
  • A simplified microarchitecture reduces cost and power

Overall, the Cortex-M instruction fetch architecture balances performance and efficiency for embedded systems.
