What is Instruction TCM (ITCM) Memory in Arm Cortex-M series?

Instruction TCM (ITCM) is a small, fast memory region that is tightly coupled to the Cortex-M processor core. It lets frequently used code be fetched with low, predictable latency, improving performance by reducing the need to fetch instructions from slower flash or off-chip memory.

Contents

Overview of ITCM
ITCM Implementation
Using ITCM
1. Reserve ITCM Memory
2. Place Code in ITCM Memory
3. Initialize ITCM
Performance Impact
Tradeoffs of Using ITCM
Alternatives to ITCM
Conclusion

Overview of ITCM

The Cortex-M series from Arm is a family of 32-bit RISC processors designed for microcontroller applications. They feature a simplified architecture optimized for low cost and power efficiency.

One key performance optimization in some Cortex-M processors is the use of tightly coupled memories (TCMs). TCMs are small RAM blocks that connect directly to the processor core through dedicated interfaces, alongside any caches. There are two types of TCM:

Instruction TCM (ITCM) – stores executable code
Data TCM (DTCM) – stores data

Because TCMs are connected directly to the core rather than reached over the system bus, access latency is much lower than for flash or external memories. This allows frequently used code and data to be accessed very quickly.

ITCM specifically is used to hold time-critical program code, such as interrupt handlers, inner loops, and performance-sensitive algorithms. Storing this code in fast on-chip RAM improves execution speed by reducing stalls associated with fetching instructions from slower off-chip flash or RAM.

ITCM Implementation

The availability of ITCM differs across Cortex-M variants:

Cortex-M0/M0+ – No ITCM
Cortex-M1 – Configurable ITCM, up to 1MB
Cortex-M3 – No ITCM
Cortex-M4 – No ITCM
Cortex-M7 – Configurable ITCM, up to 16MB
Cortex-M23/M33 – No ITCM

As shown above, only some Cortex-M variants support ITCM. On those that do, the presence and size of the ITCM are chosen by the silicon vendor when designing a Cortex-M based microcontroller.

For example, the Cortex-M7 based STM32F746 MCU has 16KB of ITCM RAM, while the Cortex-M3 based NXP LPC1769 MCU has no ITCM. On processors with no ITCM, all code must be fetched over the system bus from flash or RAM, which can be slower.
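Because the ITCM size is a vendor choice, software sometimes needs to discover it at runtime. The sketch below decodes a raw value of the Cortex-M7 ITCM control register (ITCMCR). The bit positions and the size encoding are written from memory of the Cortex-M7 documentation and should be checked against the reference manual for your device; the function and macro names are made up for this example.

```c
#include <stdint.h>

/* Assumed ITCMCR layout (verify against the Cortex-M7 documentation):
 * bit 0      = EN, set when the ITCM is enabled
 * bits [6:3] = SZ, encoded size (0 = no ITCM, 3 = 4KB, each step doubles) */
#define ITCMCR_EN_MASK  0x1u
#define ITCMCR_SZ_SHIFT 3u
#define ITCMCR_SZ_MASK  0xFu

/* Returns the ITCM size in bytes, or 0 if no ITCM is reported. */
uint32_t itcm_size_bytes(uint32_t itcmcr)
{
    uint32_t sz = (itcmcr >> ITCMCR_SZ_SHIFT) & ITCMCR_SZ_MASK;

    if (sz < 3u) {
        return 0u; /* 0 means no ITCM; 1 and 2 are treated as reserved here */
    }
    return 4096u << (sz - 3u); /* SZ = 3 is 4KB, SZ = 15 is 16MB */
}

/* Returns 1 if the EN bit is set, 0 otherwise. */
int itcm_is_enabled(uint32_t itcmcr)
{
    return (itcmcr & ITCMCR_EN_MASK) != 0u;
}
```

On a target using the CMSIS headers, the raw value would typically come from `SCB->ITCMCR`; here it is passed in as a plain integer so the decoding logic can be tried on any machine.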

Using ITCM

To make the most of the ITCM’s speed, time-critical code needs to be specifically placed into the ITCM memory region. The steps to utilize ITCM are:

1. Reserve ITCM memory – When configuring the linker script, reserve a block of memory for the ITCM.
2. Place code in ITCM memory – Use directives to assign specific functions/code to the ITCM memory region.
3. Initialize ITCM – Code must be copied from flash into ITCM at runtime, before execution.

Let’s look at each of these steps in more detail:

1. Reserve ITCM Memory

In the linker script, a region of memory needs to be reserved for the ITCM. For example:

    MEMORY
    {
      /* ITCM RAM */
      ITCM_RAM : ORIGIN = 0x00000000, LENGTH = 32K
    }

This reserves the memory region 0x00000000 to 0x00007FFF for ITCM usage.
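The end address follows directly from the origin and length: 0x00000000 + 32 × 1024 − 1 = 0x00007FFF. A small sketch of that arithmetic, using hypothetical helper names, can also be used to check whether a given address falls inside the region:

```c
#include <stdint.h>

/* Last valid address of a region that starts at origin and spans length bytes.
 * Assumes length is at least 1. */
uint32_t region_last_address(uint32_t origin, uint32_t length)
{
    return origin + length - 1u;
}

/* Returns 1 if addr lies inside the region, 0 otherwise. */
int region_contains(uint32_t origin, uint32_t length, uint32_t addr)
{
    return addr >= origin && (addr - origin) < length;
}
```

With the 32K region above, `region_last_address(0x00000000, 32 * 1024)` evaluates to 0x00007FFF, and 0x00008000 is the first address outside the region.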

2. Place Code in ITCM Memory

Compiler attributes are used to place specific functions into a named section, which the linker script then maps to the ITCM memory region. For example:

    __attribute__((section(".itcm")))
    void foo(void)
    {
        /* Function code */
    }

This GCC attribute places the function foo() in the .itcm section. The function only ends up in ITCM if the SECTIONS block of the linker script assigns the .itcm section to the ITCM_RAM region.
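In practice the attribute is often wrapped in a macro so that the same source also builds for host-side unit tests. The following is a minimal sketch under that assumption: `ITCM_CODE` and `dot_product` are hypothetical names, and the `noinline` attribute is added so the compiler does not inline the routine into a caller that lives in flash.

```c
#include <stddef.h>
#include <stdint.h>

/* On Arm GCC builds, place the function in .itcm; elsewhere expand to nothing. */
#if defined(__GNUC__) && defined(__arm__)
#define ITCM_CODE __attribute__((section(".itcm"), noinline))
#else
#define ITCM_CODE
#endif

/* A typical inner-loop routine that might be worth placing in ITCM. */
ITCM_CODE
int32_t dot_product(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;

    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

Callers use the function normally; only its placement in memory changes.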

3. Initialize ITCM

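ITCM is volatile RAM, so at runtime the code needs to be copied from flash into the ITCM before it can be executed there. This is done using a function like:

    #include <string.h>

    /* itcm, flash and size are assumed to be symbols defined by the linker script */
    extern char itcm, flash, size;

    void init_itcm(void)
    {
        /* Copy functions into ITCM; the value of a linker symbol is its address */
        memcpy(&itcm, &flash, (size_t)&size);
    }

This copies code from flash into the ITCM region. The init_itcm() function must be called once at startup before any ITCM-resident code is executed.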
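Startup code often performs this copy with an explicit word loop rather than a library call, since it may run before the C runtime is fully set up. Here is a minimal sketch of such a loop; `copy_words` is a hypothetical helper, and the linker symbol names in the comment are assumptions that depend entirely on your linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies the words in [src, src_end) to dst and returns the number copied.
 *
 * Hypothetical on-target use, with symbols exported by the linker script:
 *   extern uint32_t __itcm_load_start[], __itcm_load_end[], __itcm_run_start[];
 *   copy_words(__itcm_run_start, __itcm_load_start, __itcm_load_end);
 * On a Cortex-M7 this would normally be followed by DSB and ISB barriers
 * before branching into the copied code. */
size_t copy_words(uint32_t *dst, const uint32_t *src, const uint32_t *src_end)
{
    size_t copied = 0;

    while (src < src_end) {
        *dst++ = *src++;
        copied++;
    }
    return copied;
}
```

The loop assumes both regions are word-aligned and that the section size is a multiple of four bytes, which the linker script can enforce with alignment directives.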

Performance Impact

Placing key routines like interrupt handlers and inner loops into ITCM can provide significant performance improvements. Some examples:

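Interrupt handlers start executing with lower and more predictable delay, because their instruction fetches do not wait on flash wait states or cache misses.
Instruction fetches from ITCM typically complete without wait states, whereas fetches from flash can stall for several cycles, and for much longer when code runs from external serial flash and misses the cache.
Loop execution time can be reduced by 15% or more by placing the loop in ITCM.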

However, the performance gains depend on a number of factors:

Code size vs ITCM size – Gains diminish if too much code is placed in ITCM.
Memory system – Faster external memory reduces gains from using ITCM.
Instruction mix – Code with more loads/branches benefits more.

So while ITCM can improve performance, overuse can have negative effects. Profile your application code carefully to choose what code to place in ITCM.
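One common way to profile a routine on Cortex-M parts that include the DWT unit is to read its free-running 32-bit cycle counter before and after the code under test. The sketch below shows only the arithmetic on two such readings, so it can be tried on any machine; the helper names are hypothetical.

```c
#include <stdint.h>

/* Cycles elapsed between two reads of a free-running 32-bit counter.
 * Unsigned subtraction gives the right answer across one wraparound. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Percentage of cycles saved, rounded down; 0 if there is no improvement. */
uint32_t percent_saved(uint32_t before, uint32_t after)
{
    if (before == 0u || after >= before) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)(before - after) * 100u) / before);
}
```

Measuring the same routine once from flash and once from ITCM, then comparing the two cycle counts, gives a concrete basis for deciding whether it deserves the ITCM space.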

Tradeoffs of Using ITCM

There are some downsides to using ITCM that must be considered:

Complexity – Managing an additional memory region complicates development.
Runtime initialization – Code must be copied into ITCM, increasing startup time.
Memory fragmentation – ITCM adds another memory region, increasing fragmentation.
Cost – Adding ITCM increases processor silicon area and cost.

Therefore, ITCM should only be used for selective performance-critical routines. Entire applications should not be placed in ITCM.

Alternatives to ITCM

Other options for improving performance besides ITCM include:

Faster external memory like RAM or burst flash
Code compression/decompression to reduce instruction fetches
Caching flash content in faster memory
Prefetching flash content into cache

Cortex-M microcontrollers that provide ITCM typically also have data TCM (DTCM) for data storage, and often instruction and data caches as well. These also help accelerate memory access.

Conclusion

To summarize, Instruction TCM provides fast on-chip storage for time-critical code on Cortex-M processors. Careful use of ITCM can reduce access latency and improve performance. However, the gains depend on the application and added complexity must be managed. In many cases, faster external memories or caches may be a better solution than ITCM.
