Instruction TCM (ITCM) is a small, fast memory region connected directly to the processor core on some Cortex-M processors. It lets frequently used code run from on-chip RAM with no wait states, improving performance by avoiding instruction fetches from slower flash or external memory.
Overview of ITCM
The Cortex-M series of processors from Arm are 32-bit RISC processors designed for microcontroller applications. They feature a simplified architecture optimized for low cost and power efficiency.
One key performance optimization in some Cortex-M processors is the use of tightly coupled memories (TCMs). TCMs are small RAM blocks attached directly to the CPU core through dedicated interfaces, separate from the system bus and any caches. There are two types of TCM:
- Instruction TCM (ITCM) – stores executable code
- Data TCM (DTCM) – stores data
Because TCMs are integrated into the processor, access latency is much lower compared to external memories. This allows frequently used code and data to be accessed very quickly.
ITCM specifically is used to hold time-critical program code, such as interrupt handlers, inner loops, and performance-sensitive algorithms. Storing this code in fast on-chip RAM improves execution speed by reducing stalls associated with fetching instructions from slower flash or external RAM.
ITCM Implementation
The implementation of ITCM differs across Cortex-M variants:
- Cortex-M0/M0+ – No ITCM
- Cortex-M1 – Configurable ITCM (the core targets FPGAs), up to 1MB
- Cortex-M3 – No ITCM
- Cortex-M4 – No ITCM
- Cortex-M7 – Optional ITCM, up to 16MB
- Cortex-M23/M33 – No ITCM
As shown above, ITCM is available only on some Cortex-M variants, and its capacity varies widely. Where the core supports it, the presence and size of ITCM are chosen by the silicon vendor when designing a Cortex-M based microcontroller, and shipping parts usually provide far less than the architectural maximum.
For example, the STM32F746 MCU (Cortex-M7) has 16KB of ITCM RAM, while the NXP LPC1769 MCU (Cortex-M3) has no ITCM. On processors with no ITCM, all code is fetched over the system buses from flash or ordinary SRAM, which is usually slower.
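Where the core provides TCM control registers (the Cortex-M7 is the case assumed here), software can check whether ITCM is present and how large it is. The sketch below only decodes a register value, so it compiles anywhere. The field layout (EN in bit 0, SZ in bits 6:3, size of 2^(SZ-1) KB) is as understood from the Cortex-M7 documentation and should be confirmed against the reference manual for your part; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed ITCMCR layout (Cortex-M7): bit 0 = EN, bits [6:3] = SZ. */
#define ITCMCR_EN_MASK  (1u << 0)
#define ITCMCR_SZ_SHIFT 3u
#define ITCMCR_SZ_MASK  (0xFu << ITCMCR_SZ_SHIFT)

/* Hypothetical helper: decode the ITCM size in bytes from an ITCMCR value.
 * SZ = 0 means no ITCM is implemented; otherwise the size is taken as
 * 2^(SZ-1) KB (for example SZ = 3 -> 4 KB). Reserved encodings are not
 * validated here. */
uint32_t itcm_size_bytes(uint32_t itcmcr)
{
    uint32_t sz = (itcmcr & ITCMCR_SZ_MASK) >> ITCMCR_SZ_SHIFT;
    if (sz == 0u) {
        return 0u;
    }
    return 1024u << (sz - 1u);
}

/* Hypothetical helper: nonzero if the ITCM interface is enabled. */
int itcm_enabled(uint32_t itcmcr)
{
    return (itcmcr & ITCMCR_EN_MASK) != 0u;
}
```

On a target built with the CMSIS headers for a Cortex-M7, the input would typically be read from SCB->ITCMCR.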
Using ITCM
To make the most of the ITCM’s speed, time-critical code needs to be specifically placed into the ITCM memory region. The steps to utilize ITCM are:
- Reserve ITCM memory – When configuring the linker script, reserve a block of memory for the ITCM.
- Place code in ITCM memory – Use directives to assign specific functions/code to the ITCM memory region.
- Initialize ITCM – Code must be copied from flash into ITCM at runtime, before execution.
Let’s look at each of these steps in more detail:
1. Reserve ITCM Memory
In the linker script, a region of memory needs to be reserved for the ITCM. For example:

    MEMORY
    {
      /* ITCM RAM */
      ITCM_RAM : ORIGIN = 0x00000000, LENGTH = 32K
    }

This reserves the memory region 0x00000000 to 0x00007FFF for ITCM usage. The origin and length are device-specific and must match the memory map in the MCU's reference manual.
2. Place Code in ITCM Memory
A compiler attribute is used to place specific functions into a named section, which the linker script then maps to the ITCM memory region. For example:

    __attribute__((section(".itcm")))
    void foo(void)
    {
        /* Function code */
    }

This GCC attribute places the function foo() in the .itcm section. The section ends up in ITCM only if the linker script's SECTIONS block assigns .itcm to the ITCM memory region, with its load address in flash.
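In practice the attribute is often wrapped in a macro so that hot functions can be tagged in one word. The following is a minimal sketch, assuming GCC or Clang and a section named .itcm; the macro name ITCM_CODE and the example function are hypothetical. The attribute is applied only when building for an Arm target, so the same file still compiles on a host for unit testing.

```c
#include <stdint.h>

/* Hypothetical macro: place a function in the .itcm section on Arm targets.
 * noinline keeps the compiler from inlining the body into a caller that
 * lives in flash, which would defeat the placement. */
#if defined(__GNUC__) && defined(__arm__)
#define ITCM_CODE __attribute__((section(".itcm"), noinline))
#else
#define ITCM_CODE
#endif

/* Example hot routine: sum of squares over a buffer of 16-bit samples. */
ITCM_CODE uint32_t sum_of_squares(const int16_t *samples, uint32_t count)
{
    uint32_t acc = 0u;
    for (uint32_t i = 0u; i < count; i++) {
        int32_t s = samples[i];
        acc += (uint32_t)(s * s);
    }
    return acc;
}
```

After linking for the target, the map file is the place to confirm that sum_of_squares received an address inside the ITCM region.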
3. Initialize ITCM
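At runtime, code needs to be copied from flash into the ITCM before it can be executed there. This is done using a function like:

    #include <string.h>

    /* Linker-defined symbols; these names are assumptions and must match the linker script. */
    extern char _sitcm[], _eitcm[], _siitcm[]; /* run start, run end, load start in flash */

    void init_itcm(void)
    {
        /* Copy functions into ITCM; the size is a byte count, not a pointer */
        memcpy(_sitcm, _siitcm, (size_t)(_eitcm - _sitcm));
    }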
This copies code from flash into the ITCM region. The init_itcm() function must be called once at startup before any ITCM resident code is executed.
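Startup files often perform this copy with a plain word loop rather than a library call. The sketch below shows that variant as a generic helper, so it can be exercised on a host with ordinary arrays. The function name copy_section and the linker symbol names in the trailing comment are assumptions, not part of any particular vendor's startup code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a section word by word from its load address
 * (flash) to its run address (ITCM). Returns the number of bytes copied.
 * Both ranges are assumed to be 4-byte aligned by the linker script. */
size_t copy_section(uint32_t *dst_start, const uint32_t *dst_end,
                    const uint32_t *src)
{
    uint32_t *dst = dst_start;
    while (dst < dst_end) {
        *dst++ = *src++;
    }
    return (size_t)((const char *)dst - (const char *)dst_start);
}

/* On a target, the arguments would come from linker-defined symbols.
 * The names below are assumptions and must match your linker script:
 *
 *   extern uint32_t _sitcm[], _eitcm[], _siitcm[];
 *   copy_section(_sitcm, _eitcm, _siitcm);
 */
```

Passing start and end pointers instead of a size avoids the common mistake of handing the address of a size symbol to the copy routine.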
Performance Impact
Placing key routines like interrupt handlers and inner loops into ITCM can provide significant performance improvements. Some examples:
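- Interrupt latency becomes more predictable and moves closer to the core's minimum (around 12 cycles on a Cortex-M7 with zero-wait-state memory), because the handler's first instructions no longer wait on flash. For the full benefit, the vector table should also sit in fast memory.
- An instruction fetch from ITCM normally completes with no wait states, whereas on-chip flash at high clock speeds typically adds several wait states per access, and external serial flash can cost far more.
- Loop execution time can drop by 15% or more when the loop is placed in ITCM, although the actual figure depends on the flash wait states and on whether a cache or flash accelerator is already hiding them.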
However, the performance gains depend on a number of factors:
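- Code size vs ITCM size – ITCM is small, so only the hottest code fits; filling it with rarely executed code yields little benefit.
- Memory system – A cache, flash accelerator, or faster code memory reduces the gains from using ITCM.
- Instruction mix – Branch-heavy code tends to benefit more, because prefetch buffers cope poorly with non-sequential fetches.
So while ITCM can improve performance, it is a limited resource, and spending it on code that rarely runs leaves no room for the code that matters. Profile your application code carefully to choose what code to place in ITCM.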
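To see how these factors interact before measuring on hardware, a first-order estimate can help. The sketch below is a deliberately crude model, not a measurement: it ignores pipeline overlap and data accesses, and the function name and all inputs are assumptions that you would replace with numbers from your own profiling and datasheet.

```c
#include <stdint.h>

/* Hypothetical first-order model of cycles spent on instruction fetch.
 *   fetches:       number of fetch accesses issued by the code under study
 *   wait_states:   extra cycles paid by an access that is not hidden by a
 *                  cache or prefetcher (0 for ITCM)
 *   miss_permille: share of accesses, in 1/1000, that pay the wait states
 * Returns fetches plus the estimated stall cycles. */
uint64_t fetch_cycles(uint64_t fetches, uint32_t wait_states,
                      uint32_t miss_permille)
{
    uint64_t stall = (fetches * wait_states * miss_permille) / 1000u;
    return fetches + stall;
}
```

With purely illustrative inputs, 1000 fetches at 5 wait states and a 10% miss share come to 1500 cycles, against 1000 cycles for the zero-wait-state case. A higher hit rate in a cache or flash accelerator shrinks that gap, which is why the memory system matters as much as the code itself.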
Tradeoffs of Using ITCM
There are some downsides to using ITCM that must be considered:
- Complexity – Managing an additional memory region complicates development.
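- Runtime initialization – Code must be copied into ITCM, increasing startup time, and the copied code occupies space in both flash and ITCM.
- Memory fragmentation – ITCM is a separate region with a fixed size, so space left unused there cannot simply be merged with main RAM.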
- Cost – Adding ITCM increases processor silicon area and cost.
Therefore, ITCM should only be used for selective performance-critical routines. Entire applications should not be placed in ITCM.
Alternatives to ITCM
Other options for improving performance besides ITCM include:
- Running code from faster memory, such as on-chip SRAM or burst flash
- Code compression/decompression to reduce instruction fetches
- Caching flash content in faster memory
- Prefetching flash content into cache
Microcontrollers that offer ITCM usually also have data TCM (DTCM) for data storage, and many include instruction and data caches as well. These also help accelerate memory access.
Conclusion
To summarize, Instruction TCM provides fast on-chip storage for time-critical code on Cortex-M processors. Careful use of ITCM can reduce access latency and improve performance. However, the gains depend on the application and added complexity must be managed. In many cases, faster external memories or caches may be a better solution than ITCM.