© S-O-C.ORG, All Rights Reserved.

What is Instruction TCM (ITCM) Memory in Arm Cortex-M series?

Scott Allen
Last updated: September 17, 2023 1:08 pm

Instruction TCM (ITCM) is a small, fast memory region that is coupled directly to the Cortex-M processor core. It allows frequently used code to be held close to the core for quick access, improving performance by reducing the need to fetch instructions from slower flash or external memory.

Contents

  • Overview of ITCM
  • ITCM Implementation
  • Using ITCM
    1. Reserve ITCM Memory
    2. Place Code in ITCM Memory
    3. Initialize ITCM
  • Performance Impact
  • Tradeoffs of Using ITCM
  • Alternatives to ITCM
  • Conclusion

Overview of ITCM

Arm's Cortex-M processors are 32-bit RISC cores designed for microcontroller applications. They feature a simplified architecture optimized for low cost and power efficiency.

One key performance optimization in some Cortex-M processors is the use of tightly coupled memories (TCMs). TCMs are small RAM blocks that sit next to the CPU core on dedicated interfaces, separate from the caches and the main system bus. There are two types of TCM:

  • Instruction TCM (ITCM) – stores executable code
  • Data TCM (DTCM) – stores data

Because TCMs are coupled directly to the core, access latency is much lower than for flash or for memory reached over the system bus. This allows frequently used code and data to be accessed very quickly.

ITCM specifically is used to hold time-critical program code, such as interrupt handlers, inner loops, and performance-sensitive algorithms. Storing this code in fast tightly coupled RAM improves execution speed by reducing stalls associated with fetching instructions from slower flash or external RAM.

ITCM Implementation

The implementation of ITCM differs across Cortex-M variants:

  • Cortex-M0/M0+ – No ITCM
  • Cortex-M1 – Optional ITCM, sized when the FPGA design is configured
  • Cortex-M3 – No ITCM
  • Cortex-M4 – No ITCM (some vendors add their own core-coupled RAM instead)
  • Cortex-M7 – Optional ITCM, up to 16MB
  • Cortex-M23/M33 – No ITCM

As shown above, ITCM is an optional feature found only on some Cortex-M cores. Where the core supports it, the silicon vendor decides whether to include ITCM, and how large to make it, when designing a Cortex-M based microcontroller.

For example, the STM32H743 MCU (Cortex-M7) has 64KB of ITCM, while the NXP LPC1769 MCU (Cortex-M3) has none. On processors with no ITCM, all code must be fetched from flash or from RAM over the system bus, which is typically slower.
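Because the ITCM size is a per-device choice, firmware sometimes checks it at runtime. The sketch below decodes a raw value of the Cortex-M7 ITCM control register (ITCMCR). The field layout (EN in bit 0, SZ in bits [6:3], with SZ = 3 meaning 4KB and each step doubling) is stated here from memory of the Cortex-M7 documentation and should be checked against the reference manual for your core. The function names are illustrative.

```c
#include <stdint.h>

/* Assumed ITCMCR layout (verify against your core's reference manual):
   bit 0 = EN (TCM enabled), bits [6:3] = SZ (size encoding). */
#define ITCMCR_EN_MASK  (1u << 0)
#define ITCMCR_SZ_SHIFT 3u
#define ITCMCR_SZ_MASK  (0xFu << ITCMCR_SZ_SHIFT)

/* Decode the ITCM size in bytes from a raw ITCMCR value.
   SZ = 0 means no ITCM; SZ = 3 is 4KB and each further step doubles it.
   Encodings 1 and 2 are treated as "no ITCM" here, which is an assumption. */
uint32_t itcm_size_bytes(uint32_t itcmcr)
{
    uint32_t sz = (itcmcr & ITCMCR_SZ_MASK) >> ITCMCR_SZ_SHIFT;
    if (sz < 3u) {
        return 0u;
    }
    return 1024u << (sz - 1u);
}

/* Return 1 if the EN bit is set, 0 otherwise. */
int itcm_enabled(uint32_t itcmcr)
{
    return (itcmcr & ITCMCR_EN_MASK) != 0u;
}
```

On a target with CMSIS headers for Cortex-M7, the raw value would normally come from SCB->ITCMCR; here it is passed in as a plain integer so the decoding can be tested on a host.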

Using ITCM

To make the most of the ITCM’s speed, time-critical code needs to be specifically placed into the ITCM memory region. The steps to utilize ITCM are:

  1. Reserve ITCM memory – When configuring the linker script, reserve a block of memory for the ITCM.
  2. Place code in ITCM memory – Use directives to assign specific functions/code to the ITCM memory region.
  3. Initialize ITCM – Code must be copied from flash into ITCM at runtime, before execution.

Let’s look at each of these steps in more detail:

1. Reserve ITCM Memory

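In the linker script, a region of memory needs to be reserved for the ITCM. For example:

    MEMORY
    {
      /* ITCM RAM */
      ITCM_RAM (rwx) : ORIGIN = 0x00000000, LENGTH = 32K
    }

This reserves the memory region 0x00000000 to 0x00007FFF for ITCM usage. The base address and length shown here are examples; the actual values come from the memory map in the device's reference manual.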
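The address range quoted above follows directly from the origin and length. As a small sanity check, the arithmetic can be sketched in C; the macro and function names are made up for this example, and the values mirror the 32K region used in the linker fragment.

```c
#include <stdint.h>

/* Example values matching the linker region above; adjust per device. */
#define ITCM_ORIGIN 0x00000000u
#define ITCM_LENGTH (32u * 1024u)

/* Last valid byte address of a region: origin + length - 1. */
uint32_t region_last_address(uint32_t origin, uint32_t length)
{
    return origin + (length - 1u);
}

/* Return 1 if addr lies inside [origin, origin + length), else 0. */
int region_contains(uint32_t origin, uint32_t length, uint32_t addr)
{
    return addr >= origin && (addr - origin) < length;
}
```

A check like region_contains() can be handy in a debug build to confirm that a function pointer really resolves to an ITCM address.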

2. Place Code in ITCM Memory

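Compiler directives are used to assign specific functions to a named section, which the linker script then places in the ITCM memory region. For example:

    __attribute__((section(".itcm")))
    void foo(void)
    {
        // Function code
    }

This GCC attribute places the function foo() in the .itcm section. The function ends up in ITCM only if the linker script assigns the .itcm input section to an output section located in the ITCM memory region. Note that the section name must be written with plain ASCII quotes; typographic quotes will not compile.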
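In practice the attribute is often wrapped in a macro so the same source also builds for host-side unit tests. The following is a minimal sketch of that idea: ITCM_CODE is a hypothetical macro name, and dot_product() is just a stand-in for a time-critical inner loop. The noinline attribute is added on the assumption that you want to stop the compiler from inlining the function into a flash-resident caller, which would defeat the placement.

```c
#include <stddef.h>
#include <stdint.h>

/* ITCM_CODE is a hypothetical helper macro. On GCC-compatible ARM builds it
   places the function in the .itcm section; elsewhere it expands to nothing. */
#if defined(__GNUC__) && defined(__arm__)
#define ITCM_CODE __attribute__((section(".itcm"), noinline))
#else
#define ITCM_CODE
#endif

/* Example of a hot inner loop that is a candidate for ITCM placement. */
ITCM_CODE int32_t dot_product(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

After linking for the target, the map file is the simplest place to confirm that dot_product was actually assigned an address inside the ITCM region.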

3. Initialize ITCM

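At runtime, code needs to be copied from flash into the ITCM before it can be executed there. This is done using a function like:

    #include <string.h>
    /* Linker-defined symbols; the names here are illustrative */
    extern char itcm_start[], itcm_end[], itcm_load[];
    void init_itcm(void)
    {
        /* Copy functions from their flash load address into ITCM */
        memcpy(itcm_start, itcm_load, (size_t)(itcm_end - itcm_start));
    }

This copies code from flash into the ITCM region. The third argument to memcpy() is a byte count, not an address. The init_itcm() function must itself run from flash, and it must be called once at startup before any ITCM resident code is executed.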
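Startup code often performs this copy with a plain word loop rather than a library call, since it may run before the C runtime is fully set up. The sketch below shows such a loop under the assumption that both addresses are 4-byte aligned. copy_words() is an illustrative name, and the linker symbols mentioned in the comment are hypothetical.

```c
#include <stdint.h>

/* Copy a code image word by word from its load address (in flash) to its
   run address (in ITCM). Both pointers are assumed to be 4-byte aligned.
   On a target, the arguments would come from linker-defined symbols, for
   example copy_words(itcm_start, itcm_load, itcm_load_end); those symbol
   names are hypothetical and depend on your linker script. */
void copy_words(uint32_t *dst, const uint32_t *src, const uint32_t *src_end)
{
    while (src < src_end) {
        *dst++ = *src++;
    }
}
```

On real hardware it is common to follow the copy with barrier instructions (DSB and ISB) before jumping to the copied code; whether they are required depends on the core and memory configuration.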

Performance Impact

Placing key routines like interrupt handlers and inner loops into ITCM can provide significant performance improvements. Some examples:

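  • Interrupt latency becomes shorter and more predictable, because fetching the handler's first instructions is not delayed by flash wait states or cache misses.
  • An instruction fetch from ITCM typically completes with no wait states, whereas on-chip flash often needs several wait states at high clock speeds and external flash can take far longer.
  • Loop execution time can be reduced by 15% or more by placing the loop in ITCM, depending on how much of the loop's time is spent waiting on instruction fetches.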
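A rough way to reason about figures like the loop example above: the best case saving from ITCM is the share of execution time currently lost to instruction-fetch stalls. The sketch below encodes that simple model. It assumes ITCM removes all fetch stalls and changes nothing else, and the cycle counts in the usage note are hypothetical, not measurements.

```c
/* Upper bound on the percentage of cycles saved by moving code to ITCM,
   assuming every instruction-fetch stall disappears and nothing else changes.
   Returns 0 for inconsistent inputs. */
double percent_saved(unsigned long total_cycles, unsigned long stall_cycles)
{
    if (total_cycles == 0ul || stall_cycles > total_cycles) {
        return 0.0;
    }
    return 100.0 * (double)stall_cycles / (double)total_cycles;
}
```

For instance, a loop that takes 1000 cycles per pass, of which 150 are fetch stalls, could save at most 15% under this model.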

However, the performance gains depend on a number of factors:

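  • Code size vs ITCM size – ITCM is small, so only the hottest code fits; filling it with rarely executed code yields little gain.
  • Memory system – Faster flash, a flash accelerator, or an instruction cache reduces the gains from using ITCM.
  • Instruction mix – Code with frequent branches benefits more, since branches defeat sequential prefetching from flash.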

So while ITCM can improve performance, overuse can have negative effects. Profile your application code carefully to choose what code to place in ITCM.
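One way to turn profiling data into a placement decision is a simple greedy pass: rank candidate functions by stall cycles saved per byte of code and fill the ITCM in that order. The sketch below illustrates this. The struct fields, function names, and numbers are all hypothetical, and the greedy heuristic is not guaranteed to find the optimal selection.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    uint32_t size_bytes;    /* code size, e.g. taken from the map file */
    uint32_t stall_cycles;  /* fetch-stall cycles attributed by profiling */
    int in_itcm;            /* output: 1 if selected for ITCM */
} candidate_t;

/* Sort by stall cycles per byte, highest first. Cross-multiplication
   avoids division and floating point. */
static int by_density_desc(const void *pa, const void *pb)
{
    const candidate_t *a = pa;
    const candidate_t *b = pb;
    uint64_t lhs = (uint64_t)a->stall_cycles * b->size_bytes;
    uint64_t rhs = (uint64_t)b->stall_cycles * a->size_bytes;
    return (lhs < rhs) - (lhs > rhs);
}

/* Mark candidates for ITCM placement; returns the number of bytes used. */
uint32_t select_for_itcm(candidate_t *c, size_t n, uint32_t capacity)
{
    uint32_t used = 0;
    qsort(c, n, sizeof c[0], by_density_desc);
    for (size_t i = 0; i < n; i++) {
        c[i].in_itcm = (c[i].size_bytes <= capacity - used);
        if (c[i].in_itcm) {
            used += c[i].size_bytes;
        }
    }
    return used;
}
```

With a 3072-byte budget and candidates of 512, 2048, and 4096 bytes, the two smaller, hotter functions would be selected and the large, rarely stalled one left in flash.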

Tradeoffs of Using ITCM

There are some downsides to using ITCM that must be considered:

  • Complexity – Managing an additional memory region complicates development.
  • Runtime initialization – Code must be copied into ITCM, increasing startup time.
  • Memory fragmentation – ITCM adds another memory region, increasing fragmentation.
  • Cost – Adding ITCM increases processor silicon area and cost.

Therefore, ITCM should only be used for selective performance-critical routines. Entire applications should not be placed in ITCM.

Alternatives to ITCM

Other options for improving performance besides ITCM include:

  • Faster external memory like RAM or burst flash
  • Code compression/decompression to reduce instruction fetches
  • Caching flash content in faster memory
  • Prefetching flash content into cache

Cortex-M microcontrollers that provide ITCM typically also have data TCM (DTCM) for data storage, and some add instruction and data caches. These also help accelerate memory access.

Conclusion

To summarize, Instruction TCM provides fast on-chip storage for time-critical code on Cortex-M processors. Careful use of ITCM can reduce access latency and improve performance. However, the gains depend on the application and added complexity must be managed. In many cases, faster external memories or caches may be a better solution than ITCM.
