Data TCM (DTCM) is a small, fast memory attached directly to the processor core in some Cortex-M designs, such as the Cortex-M7. It keeps frequently accessed, time-critical data close to the processor, which reduces access latency and makes access times more predictable.
Overview of Tightly Coupled Memories in Cortex-M
Some Cortex-M processors, notably the Cortex-M7 and newer cores such as the Cortex-M55, provide interfaces for small memories called Tightly Coupled Memories (TCMs). These connect directly to the core rather than through the system bus. There are two types of TCM:
- Instruction TCM (ITCM) – Stores program instructions
- Data TCM (DTCM) – Stores data
The key features of TCMs are:
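- Small size – The size is fixed by the chip designer; the Cortex-M7 allows up to 16MB per TCM, but shipping devices typically provide tens to a few hundred kilobytes
- Very low access latency – Typically single cycle access with no wait states, compared to tens of cycles for external memory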
- Dedicated bus – TCMs have a dedicated bus to the Cortex-M core for high bandwidth
- Optional – A chip may omit TCMs entirely; where present, they supplement the main memories rather than replace them
Because TCMs are so fast, time critical code and data can be placed in them to improve real-time performance. However, their small size means they can’t hold an entire program or dataset, so they are used judiciously.
DTCM Use Cases
Here are some common uses of DTCM in Cortex-M designs:
Storing Time Critical Data
DTCM is ideal for data that needs very fast access times. Examples include:
- Real-time control loop data
- Sensor data that is sampled very frequently
- Buffers for time critical peripheral data like ADC samples or radio packets
- Critical system data structures
Storing this kind of data in DTCM instead of external RAM can improve performance and responsiveness.
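As a rough illustration of how such data ends up in DTCM, the sketch below tags a sample buffer with a GCC section attribute. The section name `.dtcm_data`, the macro `DTCM_DATA`, and the buffer and function names are all hypothetical. It assumes the project's linker script maps that section onto the DTCM address range and that the startup code clears it. On a non-Arm host build the macro expands to nothing, so the buffer falls back to ordinary RAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the linker script defines an output section ".dtcm_data"
 * located in the DTCM address range. The name is hypothetical. */
#if defined(__GNUC__) && defined(__arm__)
#define DTCM_DATA __attribute__((section(".dtcm_data"), aligned(4)))
#else
#define DTCM_DATA /* host build: ordinary RAM */
#endif

#define ADC_BLOCK_LEN 64u

/* Sample buffer touched on every control-loop iteration. */
DTCM_DATA static volatile uint16_t adc_samples[ADC_BLOCK_LEN];

/* Store one sample and return the index for the next one, wrapping
 * around at the end of the block. */
size_t adc_store(size_t index, uint16_t sample)
{
    adc_samples[index % ADC_BLOCK_LEN] = sample;
    return (index + 1u) % ADC_BLOCK_LEN;
}

/* Average over the whole block, as a control loop might compute it. */
uint16_t adc_average(void)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < ADC_BLOCK_LEN; i++) {
        sum += adc_samples[i];
    }
    return (uint16_t)(sum / ADC_BLOCK_LEN);
}
```

The code that uses the buffer does not change; only the placement does, which makes it easy to move data in or out of DTCM while profiling.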
Sharing Data Between Interrupts and Tasks
DTCM can hold a shared buffer when different execution contexts need access to the same data. For example:
- A real-time interrupt stores sensor samples in a DTCM buffer
- A lower priority task periodically processes the samples from the same buffer
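This keeps the shared data out of slower external memory. It does not remove the need for synchronization: the interrupt and the task must still coordinate their accesses, although a simple single-producer, single-consumer buffer is often enough.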
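One common way to share samples between a single interrupt and a single task is a lock-free ring buffer. The sketch below uses C11 atomics and assumes exactly one producer (the ISR) and one consumer (the task). The type and function names are hypothetical, and on a real target the `sensor_ring` instance would be placed in DTCM through the linker.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u /* must be a power of two */

typedef struct {
    uint16_t data[RING_SIZE];
    atomic_uint head; /* written only by the producer (ISR) */
    atomic_uint tail; /* written only by the consumer (task) */
} sample_ring_t;

/* Hypothetical shared instance; a DTCM candidate on a real target. */
sample_ring_t sensor_ring;

/* Producer side, called from the ISR. Returns false if the ring is full. */
bool ring_push(sample_ring_t *r, uint16_t sample)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) {
        return false; /* full: the caller decides whether to drop */
    }
    r->data[head % RING_SIZE] = sample;
    /* Publish the sample only after it has been written. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side, called from the task. Returns false if the ring is empty. */
bool ring_pop(sample_ring_t *r, uint16_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false; /* empty */
    }
    *out = r->data[tail % RING_SIZE];
    /* Free the slot only after the sample has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index has a single writer, neither side has to disable interrupts. With more than one producer or consumer this scheme is no longer sufficient.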
Stack for Real-Time Interrupts
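The stack used by real-time interrupt service routines can be allocated in DTCM. On exception entry the processor pushes registers onto the stack and pops them again on exit, so a stack in fast memory shortens both ISR entry and exit.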
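A stack placed in DTCM is usually sized tightly, so it helps to measure how much of it is actually used. The sketch below shows stack painting, a common technique that is not specific to DTCM. The function names and the fill pattern are hypothetical, and the code assumes a descending stack, as on Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu /* arbitrary fill pattern */

/* Fill the stack region with the pattern before the stack is in use. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_PAINT;
    }
}

/* The stack grows downward, so words that were never touched stay
 * painted at the low end of the region. Returns the deepest usage
 * seen so far, in words. */
size_t stack_words_used(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_PAINT) {
        untouched++;
    }
    return words - untouched;
}
```

On a target, the painting would have to run before the region becomes the live stack, or stop below the current stack pointer. A usage figure close to the full size suggests the DTCM stack needs more room.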
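Holding Data for Time Critical Code
DTCM is intended for data; time critical code normally belongs in ITCM, which sits on the instruction fetch path. What DTCM suits well is the data that such code depends on. For example:
- Lookup tables and coefficients for performance critical math functions
- Tuning parameters for compute intensive algorithms
Whether a core can fetch instructions from DTCM at all, and at what cost, depends on the implementation, so check the processor documentation before placing code there.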
Configuring DTCM in Cortex-M Devices
Here are some key points about configuring DTCM in Cortex-M devices:
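- DTCM size is fixed in hardware – the chip vendor chooses it at design time; the Cortex-M7 allows up to 16MB, while the Cortex-M3 and Cortex-M4 have no TCM interface (some vendors offer a similar core-coupled RAM instead)
- DTCM is optional – a chip may omit it, and where it is present the Cortex-M7 lets software enable or disable it through a TCM control register (DTCMCR)
- DTCM can be divided into multiple smaller regions in the linker script, for example for stack, buffers, and control data
- The DTCM base address is fixed by the memory map (0x20000000 on the Cortex-M7); sections within it can be aligned as needed
- DTCM can serve as general purpose RAM, including as the home of the stack and default data sections if the linker script maps them there
- The Memory Protection Unit (MPU) can restrict which privilege levels may access DTCM; whether DMA masters can reach it depends on the chip's bus design
- The MPU controls access permissions only – enabling the TCM and its size are handled separately, through the TCM control registers and the hardware configuration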
The specific configuration options depend on the Cortex-M implementation in the chip. Configuration is done via vendor provided tools or initialization code.
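As a concrete example of initialization code, the sketch below decodes a Cortex-M7 DTCM control register value to report whether DTCM is enabled and how large it is. The field layout (EN in bit 0, SZ in bits 6:3, with SZ = 3 meaning 4KB and each step doubling the size) reflects my reading of the Cortex-M7 documentation and should be verified against the Technical Reference Manual for the core in use. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DTCMCR layout (verify against the TRM for your core):
 * bit 0 = EN, bits [6:3] = SZ. */
#define DTCMCR_EN_MASK 0x1u
#define DTCMCR_SZ_POS  3u
#define DTCMCR_SZ_MASK (0xFu << DTCMCR_SZ_POS)

/* True if the enable bit is set in the given register value. */
bool dtcm_is_enabled(uint32_t dtcmcr)
{
    return (dtcmcr & DTCMCR_EN_MASK) != 0u;
}

/* Decode the SZ field into a size in bytes. SZ = 0 means no DTCM is
 * implemented; SZ = 3 encodes 4KB and each increment doubles the size.
 * Values 1 and 2 are treated as "no usable size" here. */
uint32_t dtcm_size_bytes(uint32_t dtcmcr)
{
    uint32_t sz = (dtcmcr & DTCMCR_SZ_MASK) >> DTCMCR_SZ_POS;
    if (sz < 3u) {
        return 0u;
    }
    return 4096u << (sz - 3u);
}
```

On a target the argument would be read from the register itself, which CMSIS-Core for the Cortex-M7 exposes, to my knowledge, as `SCB->DTCMCR`. A startup check like this can catch a linker script that assumes more DTCM than the chip provides.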
DTCM vs External RAM
Compared to external RAM, DTCM provides:
- Faster access – single cycle vs tens of cycles latency
- More predictable access time – no cache misses and little or no contention with other bus masters
- Dedicated path – the core reaches DTCM over its own interface instead of sharing the system bus with DMA and peripherals
However, DTCM capacity is very small compared to external RAM. So it is used only for the most timing critical data. The majority of data still resides in slower external RAM.
DTCM vs ITCM
DTCM and ITCM are similar in nature – fast, tightly coupled processor memories. But there are some key differences:
- DTCM stores data, ITCM stores instructions
- Both are read/write memories – ITCM must be writable so that code can be loaded into it, although it is rarely written after startup
- DTCM can be used as general purpose RAM
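- The interfaces are organized differently – on the Cortex-M7, ITCM is a single 64-bit interface suited to sequential instruction fetch, while DTCM is split into two 32-bit interfaces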
In summary, DTCM is specialized for low latency data accesses, while ITCM is optimized for instruction fetch.
DTCM Performance Factors
There are several factors that determine how much performance benefit DTCM provides compared to external memory:
- Access frequency – DTCM helps most for data that is accessed many times
- Latency delta – Bigger improvement if external memory latency is high
- Bus contention – DTCM is isolated from a congested system bus
- Cache usage – The improvement is smaller if external RAM is cached, because cache hits are also fast
- Data locality – DTCM only helps data that fits in it; accesses to data placed elsewhere see no benefit
So DTCM gives the biggest boost for frequently accessed data that fits in it, when external memory latency and bus contention are high. Caching and data that cannot be localized reduce the impact.
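These factors can be combined into a rough back-of-the-envelope estimate. The sketch below is a simplified model, not a substitute for measurement. It assumes every access costs a fixed number of cycles and ignores pipelining and bus contention. The structure and function names are hypothetical, and the cycle counts are inputs to be taken from the device datasheet or from profiling.

```c
#include <stdint.h>

typedef struct {
    uint64_t accesses;    /* data accesses in the hot path */
    uint32_t tcm_cycles;  /* cycles per DTCM access */
    uint32_t hit_cycles;  /* cycles per access served by the data cache */
    uint32_t miss_cycles; /* cycles per access that goes out to RAM */
    uint32_t hit_percent; /* 0..100, share of accesses that hit the cache */
} access_profile_t;

/* Estimated cycles saved by moving the data from (possibly cached)
 * external RAM into DTCM. Returns 0 if DTCM would not be faster. */
uint64_t cycles_saved_by_dtcm(const access_profile_t *p)
{
    uint64_t hits = p->accesses * p->hit_percent / 100u;
    uint64_t misses = p->accesses - hits;
    uint64_t without_dtcm = hits * p->hit_cycles + misses * p->miss_cycles;
    uint64_t with_dtcm = p->accesses * p->tcm_cycles;
    return (without_dtcm > with_dtcm) ? (without_dtcm - with_dtcm) : 0u;
}
```

With illustrative inputs of 1000 accesses, 1 cycle per DTCM access, and 10 cycles per uncached external access, the model gives 9000 cycles saved with no cache. At a 90 percent hit rate and 1-cycle hits, the saving drops to 900 cycles, which is the effect described under cache usage.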
DTCM Limitations
Some limitations to keep in mind when using DTCM:
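- Very small capacity – typical sizes of tens to a few hundred kilobytes are tiny by modern standards
- Manual data management – the developer decides what goes into DTCM, usually through linker sections; the standard heap does not use it unless set up to do so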
- Data transfers – getting data in/out of DTCM has overhead
- Memory fragmentation concerns due to small size
- Limited or no DMA access – depending on the chip, some or all DMA controllers and peripherals may be unable to access DTCM
So DTCM improves performance but requires careful design to work within its limitations.
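One way to handle manual data management without risking fragmentation is a bump allocator over a fixed pool, where memory is handed out once at startup and never freed. The sketch below is a minimal version. The pool size and all names are hypothetical, and on a target the pool array would be placed in DTCM through the linker.

```c
#include <stddef.h>
#include <stdint.h>

#define DTCM_POOL_BYTES 4096u

/* Hypothetical pool; a DTCM section on a real target. */
static _Alignas(8) uint8_t dtcm_pool[DTCM_POOL_BYTES];
static size_t dtcm_used;

/* Allocate size bytes aligned to align, which must be a power of two
 * no larger than 8 (the alignment of the pool itself). Returns NULL if
 * the request is invalid or the pool is exhausted. Nothing is ever
 * freed, so the pool cannot fragment. */
void *dtcm_alloc(size_t size, size_t align)
{
    if (align == 0u || align > 8u || (align & (align - 1u)) != 0u) {
        return NULL;
    }
    size_t start = (dtcm_used + (align - 1u)) & ~(align - 1u);
    if (start > DTCM_POOL_BYTES || size > DTCM_POOL_BYTES - start) {
        return NULL;
    }
    dtcm_used = start + size;
    return &dtcm_pool[start];
}

/* Bytes still available, ignoring any future alignment padding. */
size_t dtcm_bytes_free(void)
{
    return DTCM_POOL_BYTES - dtcm_used;
}
```

The allocator is not interrupt-safe as written, so it suits allocation during initialization, before interrupts that might call it are enabled.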
Conclusion
In summary, Data TCM is a fast, tightly coupled memory in Cortex-M processors that allows time critical data to be accessed with low, consistent latency. It works best for frequently used data that fits in its relatively small capacity. With careful design, DTCM can greatly improve determinism and real-time performance in Cortex-M systems.