The Instrumentation Trace Macrocell (ITM) is a tracing and debugging feature of Arm Cortex-M processors such as the Cortex-M3, M4, M7, and M33 (Cortex-M0 and M0+ do not include it). It lets software running on the core emit profiling and diagnostic data, which leaves the chip through the trace port. That port is usually the single-pin Serial Wire Output (SWO). This allows real-time tracing of software execution without halting the processor.
Overview of ITM
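The ITM is not a port or bus of its own. Its stimulus ports are memory-mapped registers on the processor's Private Peripheral Bus, and software writes data to them with ordinary store instructions. The ITM packetizes that data and hands it to the Trace Port Interface Unit (TPIU). The TPIU drives the trace pins: SWO, or a parallel trace port on devices that have one. A debug probe connected to those pins receives and decodes the trace data. Each trace write costs a store instruction, plus an optional check that the ITM has room. The effect on execution is therefore small, but not zero.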
The key capabilities of ITM include:
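- printf-style debugging messages
- Custom instrumentation data, such as event IDs or variable values
- Function entry/exit markers written by instrumentation code
- Timestamps for profiling
- Forwarding hardware trace from the DWT unit: exception entry/exit, periodic PC samples, and data watchpoint matches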
This provides detailed visibility into software execution without stopping the processor and with little slowdown. The tracing data can be used to analyze and debug issues, and to profile and optimize software performance.
ITM Architecture
The ITM consists of the stimulus ports, a small output FIFO, packet formatting logic, and its connection to the TPIU. ARMv7-M defines 32 stimulus ports, numbered 0-31. ARMv8-M allows up to 256, though many devices implement 32. Each port is a 32-bit memory-mapped register, starting at address 0xE0000000, that software running on the Cortex-M can write to. A byte, halfword, or word write produces a 1-, 2-, or 4-byte payload.
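The struct below is a sketch of how the stimulus ports appear to software. It mirrors the ARMv7-M ITM register layout (32 ports) at base address 0xE0000000. The type and field names are illustrative and are not the CMSIS definitions (CMSIS provides its own `ITM_Type`). The `_Static_assert` checks pin each field to its architectural offset.

```c
#include <stddef.h>
#include <stdint.h>

/* One stimulus port. The access width of a write sets the payload size. */
typedef union {
    volatile uint8_t  u8;   /* byte write: 1-byte packet */
    volatile uint16_t u16;  /* halfword write: 2-byte packet */
    volatile uint32_t u32;  /* word write: 4-byte packet; a read returns FIFOREADY in bit 0 */
} itm_stim_t;

/* Illustrative ARMv7-M ITM register block (names are not from CMSIS). */
typedef struct {
    itm_stim_t        STIM[32];        /* 0x000-0x07C: stimulus ports 0-31 */
    uint32_t          reserved0[864];
    volatile uint32_t TER;             /* 0xE00: trace enable, one bit per port */
    uint32_t          reserved1[15];
    volatile uint32_t TPR;             /* 0xE40: privilege mask, one bit per 8 ports */
    uint32_t          reserved2[15];
    volatile uint32_t TCR;             /* 0xE80: trace control (ITMENA, TSENA, ...) */
    uint32_t          reserved3[75];
    volatile uint32_t LAR;             /* 0xFB0: lock access, write 0xC5ACCE55 to unlock */
    volatile uint32_t LSR;             /* 0xFB4: lock status */
} itm_regs_t;

/* Compile-time checks that the layout matches the documented offsets. */
_Static_assert(offsetof(itm_regs_t, TER) == 0xE00, "TER offset");
_Static_assert(offsetof(itm_regs_t, TPR) == 0xE40, "TPR offset");
_Static_assert(offsetof(itm_regs_t, TCR) == 0xE80, "TCR offset");
_Static_assert(offsetof(itm_regs_t, LAR) == 0xFB0, "LAR offset");

#define ITM_BASE_ADDR 0xE0000000u
#define ITM_REGS ((itm_regs_t *)ITM_BASE_ADDR)   /* dereference only on target hardware */
```

On target, `ITM_REGS->STIM[0].u8 = 'x';` would emit a one-byte packet on port 0. On a host, the struct is useful only for the offset checks.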
The ITM holds packets in a small output FIFO until the TPIU can send them. The FIFO depth is implementation defined. It absorbs short bursts but cannot hide a sustained data rate above the trace port bandwidth. Reading a stimulus port returns a FIFOREADY flag in bit 0, which software can check before writing.
The formatting logic turns stimulus port writes, timestamps, and hardware events from the DWT into packets and merges them into a single stream. The TPIU then serializes that stream onto the trace pins at a rate set by the trace clock and the TPIU's prescaler.
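The ITM is configured through memory-mapped control registers. A debugger writes them over the Debug Access Port (DAP), but firmware can also program them directly. Configuration includes enabling tracing, enabling timestamps, and choosing which stimulus ports are active.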
ITM Stimulus Ports
The stimulus ports provide the interface for software to generate tracing data. Code writes a byte, halfword, or word to a port, and the ITM emits a packet tagged with that port number. The architecture does not assign meanings to the ports, but two control registers govern their use:
- ITM_TER (Trace Enable Register) – One bit per port. Writes to a disabled port produce no trace.
- ITM_TPR (Trace Privilege Register) – One bit per group of eight ports. When the bit is set, only privileged code can write those ports.
Hardware trace from the DWT does not pass through the stimulus ports. The DWT generates its own hardware source packets, which the ITM merges into the same output stream.
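Port assignments are a convention shared by the firmware and the host tools. Common conventions are:
- Port 0 – printf-style text. CMSIS's ITM_SendChar() writes here, and most debuggers show port 0 as a text console. The ITM itself does no formatting; the host tool interprets the bytes as ASCII.
- Port 31 – Used by some RTOS-aware tools, for example Keil's RTX kernel awareness, for operating system events.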
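Other ports can carry custom debug events, function entry/exit markers, or instrumented data values. Timestamps do not come from any stimulus port. The ITM generates them from its own timestamp counter when the TSENA bit of ITM_TCR is set. The ports provide a simple way to get any data out of the chip from firmware.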
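Below is a minimal sketch of the port-0 text convention. It is modeled on the checks that CMSIS `ITM_SendChar()` performs: the ITM and the port must both be enabled, and bit 0 of a read (FIFOREADY) must be set before writing. The `itm_chan_t` struct, the `ITM_POLL_LIMIT` bound, and `itm_put_char()` are hypothetical. Passing register pointers in, rather than hard-coding addresses, lets the same code run against plain variables on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* A stimulus port: byte writes emit 1-byte packets; reads return FIFOREADY in bit 0. */
typedef union {
    volatile uint8_t  u8;
    volatile uint32_t u32;
} stim_port_t;

/* Hypothetical handle. On target, point these at ITM_STIMn (0xE0000000 + 4n),
 * ITM_TER (0xE0000E00) and ITM_TCR (0xE0000E80). */
typedef struct {
    stim_port_t       *port;
    volatile uint32_t *ter;
    volatile uint32_t *tcr;
    unsigned           index;   /* stimulus port number, 0-31 */
} itm_chan_t;

#define ITM_TCR_ITMENA (1u << 0)
#define ITM_POLL_LIMIT 10000u   /* hypothetical bound so a full FIFO cannot hang the caller */

/* Returns true if the character was written, false if tracing is off or the FIFO stayed full. */
static bool itm_put_char(const itm_chan_t *ch, char c)
{
    if ((*ch->tcr & ITM_TCR_ITMENA) == 0u || (*ch->ter & (1u << ch->index)) == 0u)
        return false;                   /* ITM or this port disabled: nothing is sent */
    for (uint32_t i = 0; i < ITM_POLL_LIMIT; ++i) {
        if (ch->port->u32 & 1u) {       /* FIFOREADY set: the port accepts a write */
            ch->port->u8 = (uint8_t)c;  /* byte write produces a 1-byte packet */
            return true;
        }
    }
    return false;                       /* gave up waiting: the character is dropped */
}
```

To retarget `printf`, call `itm_put_char()` from the C library's character output hook. The name of that hook depends on the toolchain.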
ITM Tracing Data Packets
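The ITM formatting logic takes data from the stimulus ports and other sources and formats it into packets. There are different types of trace packets:
- Instrumentation (software source) packets – Contain data from a stimulus port write.
- Hardware source packets – Carry DWT events such as exception trace, PC samples, data trace, and event counters.
- Timestamp packets – Local timestamps and, on devices that implement them, global timestamps.
- Synchronization packets – Let a decoder find packet boundaries in the byte stream.
- Overflow packets – Indicate that trace data was lost because the FIFO was full.
- Extension packets – Carry extra information, such as the stimulus port page when more than 32 ports are implemented.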
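An instrumentation packet is a one-byte header followed by a 1-, 2-, or 4-byte little-endian payload. The header holds the stimulus port number in bits [7:3] and a 0 in bit 2; a 1 in bit 2 marks a hardware source packet. Bits [1:0] hold a size code: 01, 10, or 11 for 1, 2, or 4 bytes. The port number allows tracing data to be correlated back to the correct source port.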
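An instrumentation packet header holds the port in bits [7:3], a source type flag in bit 2 (0 = software, 1 = hardware), and a size code in bits [1:0]. The sketch below encodes and decodes that byte for ports 0-31. The function names are hypothetical, and `payload_bytes` must be 1, 2, or 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Header byte of an instrumentation packet:
 *   bits[7:3] stimulus port (0-31), bit 2 = 0 (software source),
 *   bits[1:0] size code: 01 = 1 byte, 10 = 2 bytes, 11 = 4 bytes. */
static uint8_t itm_sw_header(unsigned port, unsigned payload_bytes)
{
    unsigned ss = (payload_bytes == 1u) ? 1u : (payload_bytes == 2u) ? 2u : 3u;
    return (uint8_t)(((port & 0x1Fu) << 3) | ss);
}

/* Parse a source-packet header (software or hardware). Returns false for
 * protocol packets (sync, overflow, timestamp, extension), whose size code
 * is always 00. */
static bool itm_parse_source_header(uint8_t h, unsigned *port,
                                    unsigned *payload_bytes, bool *is_hardware)
{
    unsigned ss = h & 0x3u;
    if (ss == 0u)
        return false;
    *port = h >> 3;                         /* for hardware packets this is a discriminator ID */
    *payload_bytes = (ss == 3u) ? 4u : ss;  /* size code 11 means 4 bytes */
    *is_hardware = (h & 0x4u) != 0u;
    return true;
}
```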
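Local timestamp packets do not carry an absolute 64-bit time. Each one holds the number of timestamp clock ticks since the previous local timestamp, in up to four 7-bit groups (28 bits). A short one-byte form covers values 1 to 6. The timestamp clock source and prescaler are configured in ITM_TCR. Devices that implement global timestamps also emit those, for correlating trace across sources. Together these provide timing information that can be used to analyze program execution performance.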
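Local timestamp packets come in two forms:
- a one-byte form (`0TTT0000`, with TTT from 1 to 6) that is emitted in sync with the data;
- a multi-byte form whose header (`11TC0000`) carries a TC (timestamp control) field and is followed by 7-bit groups, each with a continuation flag in bit 7.

The decoder below is a sketch with hypothetical names. It does not handle global timestamps.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a local timestamp packet at buf[0]. Returns the bytes consumed,
 * or 0 if buf does not start with a complete local timestamp packet.
 * *ticks is the count since the previous local timestamp; *tc is the
 * TC field describing any delay in emitting the timestamp or its data. */
static size_t itm_decode_local_ts(const uint8_t *buf, size_t len,
                                  uint32_t *ticks, unsigned *tc)
{
    if (len == 0u || (buf[0] & 0x0Fu) != 0u)
        return 0;
    uint8_t h = buf[0];
    if ((h & 0xC0u) == 0xC0u) {                  /* multi-byte form: 0b11TC0000, then 1-4 bytes */
        uint32_t value = 0;
        for (size_t i = 1; i <= 4u && i < len; ++i) {
            value |= (uint32_t)(buf[i] & 0x7Fu) << (7u * (i - 1u));
            if ((buf[i] & 0x80u) == 0u) {        /* continuation bit clear: last byte */
                *ticks = value;
                *tc = (h >> 4) & 0x3u;
                return i + 1u;
            }
        }
        return 0;                                /* truncated or malformed */
    }
    if ((h & 0x80u) == 0u && h != 0x00u && h != 0x70u) {  /* one-byte form: 0b0TTT0000 */
        *ticks = (h >> 4) & 0x7u;
        *tc = 0u;                                /* one-byte form is always emitted in sync */
        return 1;
    }
    return 0;                                    /* sync, overflow, or another packet type */
}
```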
ITM Trace Port
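The ITM does not drive any pins itself. Its packets go to the TPIU, which sends them off-chip. Some key characteristics:
- Serial Wire Output (SWO) – A single pin, encoded as Manchester or NRZ (UART-style) data. Most Cortex-M3/M4/M7/M33 devices provide it, and common debug probes can capture it.
- Parallel trace port – A trace clock plus 1 to 4 data pins (TRACECLK, TRACEDATA[3:0]) on devices that implement it. It offers much more bandwidth and is also what ETM instruction trace normally uses.
- Clocking – The TPIU runs from its trace clock input, which may be the core clock or a separate clock, so the trace rate need not limit the system clock.
- No flow control – The link is one-way; the probe cannot throttle the target.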
Bandwidth depends on the output form. SWO sends one bit at a time at TRACECLKIN / (ACPR + 1). ACPR is the TPIU's Asynchronous Clock Prescaler Register. In NRZ mode, every byte also carries a start bit and a stop bit. A parallel port sends several bits per trace clock cycle, at the cost of extra pins.
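For SWO in NRZ mode, the sketch below picks the ACPR value closest to a requested bit rate, using SWO bit rate = TRACECLKIN / (ACPR + 1). It then estimates the byte budget, assuming 10 bit times per byte (start, 8 data, stop). The function names are hypothetical, and the clock values in the usage note are examples only.

```c
#include <stdint.h>

/* Prescaler for SWO in NRZ mode: bit rate = TRACECLKIN / (ACPR + 1).
 * Chooses the divisor nearest the requested rate (swo_hz must be nonzero).
 * The SWOSCALER field is 16 bits, so the divisor is clamped to 1..65536. */
static uint32_t swo_acpr_for(uint32_t traceclk_hz, uint32_t swo_hz)
{
    uint64_t div = ((uint64_t)traceclk_hz + swo_hz / 2u) / swo_hz;
    if (div < 1u)
        div = 1u;
    if (div > 0x10000u)
        div = 0x10000u;
    return (uint32_t)(div - 1u);
}

/* Bit rate actually produced by a given ACPR value. */
static uint32_t swo_actual_hz(uint32_t traceclk_hz, uint32_t acpr)
{
    return traceclk_hz / (acpr + 1u);
}

/* NRZ sends 10 bit times per byte (start + 8 data + stop), so this is the
 * ceiling on ITM bytes per second, packet headers and timestamps included. */
static uint32_t swo_max_bytes_per_s(uint32_t swo_hz)
{
    return swo_hz / 10u;
}
```

For example, a 72 MHz trace clock and a 2 MHz SWO rate give ACPR = 35. That yields at most 200,000 bytes per second, shared between payloads, headers, and timestamps. The probe must be set to the actual rate, which can differ from the requested one when the clock does not divide evenly.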
Because the link is one-way, the probe cannot slow the ITM down. If firmware produces data faster than the port can send it, the FIFO fills. Further writes must then either wait for space or be dropped.
Connecting to the ITM Trace Port
To capture and decode ITM trace data, an external tool needs to connect to the Cortex-M trace port. There are a few options for this:
- Debug probe – A debug adapter such as J-Link or ST-Link can capture SWO and stream the data to a PC. Parallel trace needs a trace-capable probe such as J-Trace or ULINKpro.
- Logic analyzer – A logic analyzer or oscilloscope can directly capture trace port signals.
- Embedded trace buffer – On devices that include one (an ETB, or a TMC configured as a buffer), trace data can be captured on-chip for later offload.
A debug probe that supports SWO/SWV provides the easiest way to get trace data into debugging tools. The probe connects to the trace port and streams the data over USB or Ethernet to a debugger. This lets you view and analyze the ITM data on a PC.
A logic analyzer can also capture the trace output directly; SWO in NRZ mode can be captured with an ordinary UART decoder. You then have to decode the ITM packet protocol yourself. On-chip trace buffering records ITM data internally for later readout through the debug probe interface.
ITM Configuration
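To use the ITM, it must first be configured through its memory-mapped registers. The debugger can do this over the Debug Access Port (DAP), or the firmware can do it directly. Key configuration settings include:
- Enabling tracing – Set TRCENA in the Debug Exception and Monitor Control Register (DEMCR) to power up the trace blocks. Unlock the ITM through ITM_LAR where required, then set ITMENA in the ITM Trace Control Register (ITM_TCR).
- Trace clock – Set the SWO bit rate with the TPIU prescaler (TPIU_ACPR).
- Trace port protocol and width – Select SWO with Manchester or NRZ encoding (TPIU_SPPR), or a parallel port width of 1 to 4 bits (TPIU_CSPSR).
- Timestamps – Enable local timestamps with TSENA and set the prescaler with the TSPrescale field of ITM_TCR.
Each stimulus port must also be enabled in the Trace Enable Register (ITM_TER) before writes to it produce trace data. Do not assume any port is enabled after reset.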
Most debuggers, such as Keil MDK and Arm Development Studio, expose these settings in a trace configuration dialog. Configure the output mode, trace clock, SWO speed, and timestamp settings there. The debugger programs the registers when the debug session starts.
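The sketch below shows the register sequence a debugger (or the firmware) typically performs to bring up ITM over SWO, using the ARMv7-M addresses noted in the comments. The `trace_regs_t` pointer struct and `itm_swo_init()` are hypothetical. Many devices also need vendor-specific steps that are not shown, such as enabling the SWO pin or the trace clock.

```c
#include <stdint.h>

/* Hypothetical set of register pointers. On target, use the addresses in the
 * comments; on a host they can point at ordinary variables for testing. */
typedef struct {
    volatile uint32_t *demcr;      /* DEMCR,      0xE000EDFC */
    volatile uint32_t *tpiu_cspsr; /* TPIU_CSPSR, 0xE0040004 */
    volatile uint32_t *tpiu_acpr;  /* TPIU_ACPR,  0xE0040010 */
    volatile uint32_t *tpiu_sppr;  /* TPIU_SPPR,  0xE00400F0 */
    volatile uint32_t *tpiu_ffcr;  /* TPIU_FFCR,  0xE0040304 */
    volatile uint32_t *itm_lar;    /* ITM_LAR,    0xE0000FB0 */
    volatile uint32_t *itm_tcr;    /* ITM_TCR,    0xE0000E80 */
    volatile uint32_t *itm_tpr;    /* ITM_TPR,    0xE0000E40 */
    volatile uint32_t *itm_ter;    /* ITM_TER,    0xE0000E00 */
} trace_regs_t;

#define DEMCR_TRCENA    (1u << 24)
#define ITM_LAR_KEY     0xC5ACCE55u
#define ITM_TCR_ITMENA  (1u << 0)
#define ITM_TCR_TSENA   (1u << 1)
#define ITM_TCR_SYNCENA (1u << 2)
#define ITM_TCR_TXENA   (1u << 3)    /* forward DWT hardware packets */
#define TPIU_SPPR_NRZ   2u           /* 1 = Manchester, 2 = NRZ (UART) */

/* Bring up SWO (NRZ) with port 0 enabled and local timestamps on. */
static void itm_swo_init(const trace_regs_t *r, uint32_t acpr, uint32_t trace_bus_id)
{
    *r->demcr     |= DEMCR_TRCENA;   /* power up DWT, ITM and TPIU */
    *r->tpiu_cspsr = 1u;             /* port size 1 */
    *r->tpiu_sppr  = TPIU_SPPR_NRZ;
    *r->tpiu_acpr  = acpr & 0xFFFFu; /* SWO bit rate = TRACECLKIN / (acpr + 1) */
    *r->tpiu_ffcr  = 0x100u;         /* EnFCont (bit 1) off; many vendor SWO examples write 0x100 */
    *r->itm_lar    = ITM_LAR_KEY;    /* unlock ITM register writes */
    *r->itm_tcr    = ((trace_bus_id & 0x7Fu) << 16) | ITM_TCR_TXENA | ITM_TCR_SYNCENA
                   | ITM_TCR_TSENA | ITM_TCR_ITMENA;
    *r->itm_tpr    = 0u;             /* unprivileged code may write all ports */
    *r->itm_ter    = 1u;             /* enable stimulus port 0 only */
}
```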
ITM Trace Buffer Management
The ITM has no per-port buffers and no large shared trace buffer; its only storage is the small output FIFO. Whether data is lost therefore depends on two things: how fast software writes compared with the trace port rate, and how the firmware handles a full FIFO:
- Blocking writes – Poll FIFOREADY before each write, as CMSIS ITM_SendChar() does. No data is lost, but the core spins while the FIFO drains.
- Non-blocking writes – Write without waiting, or give up after a bounded wait. The core is never held up for long, but data can be dropped.
Unless an implementation provides an optional stall mode, the ITM does not hold up the processor on its own. Any waiting comes from the software polling loop.
When a write is lost, the ITM sends an overflow packet. The packet marks where in the stream data went missing, but not how much, so a decoder can only flag the gap.
ITM Trace Decoder
To make sense of the raw ITM trace port packets, a trace decoder is needed. The packets must be parsed to extract the trace source data. There are a few options for decoding ITM data:
- Debugger – Debugger software includes decoders that interpret ITM packets received from the probe.
- Parser Library – Use a parser library in your own code to decode packets.
- Decoder Hardware – Use a hardware module to decode and display traces.
Most full-featured debuggers like ARM Keil MDK and Eclipse plugins can receive and decode ITM data from a debug probe. This gives you a high level view of the trace data within the debugging environment.
For custom handling of trace data, you can use a packet decoder library or write your own. The packet format is documented in the Arm architecture reference manuals and is straightforward to parse. This allows creating your own trace viewing utilities.
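The function below is a sketch of what a packet decoder does. It walks a captured ITM byte stream, collects instrumentation payloads, counts overflow packets, and skips timestamp, extension, and DWT packets. It assumes the TPIU formatter was bypassed (as is usual for SWO), so the capture contains raw ITM packets starting on a packet boundary. The types and names are hypothetical, not any library's API.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned port;   /* stimulus port, header bits [7:3] */
    unsigned size;   /* payload bytes: 1, 2 or 4 */
    uint32_t value;  /* payload, assembled little-endian */
} itm_sw_packet_t;

typedef struct {
    size_t overflows;  /* overflow packets (0x70) seen */
    size_t skipped;    /* bytes of timestamp, extension and DWT packets skipped */
} itm_stats_t;

/* Stores at most `cap` instrumentation packets in `out` (extra ones are parsed
 * but dropped) and returns how many were stored. */
static size_t itm_decode_stream(const uint8_t *buf, size_t len,
                                itm_sw_packet_t *out, size_t cap, itm_stats_t *st)
{
    size_t i = 0, count = 0;
    st->overflows = 0;
    st->skipped = 0;
    while (i < len) {
        uint8_t h = buf[i];
        unsigned ss = h & 0x3u;
        if (ss != 0u) {                                  /* source packet */
            unsigned n = (ss == 3u) ? 4u : ss;
            if (i + 1u + n > len)
                break;                                   /* capture ends mid-packet */
            uint32_t v = 0;
            for (unsigned k = 0; k < n; ++k)
                v |= (uint32_t)buf[i + 1u + k] << (8u * k);
            if ((h & 0x4u) == 0u) {                      /* software source */
                if (count < cap)
                    out[count++] = (itm_sw_packet_t){ h >> 3, n, v };
            } else {
                st->skipped += 1u + n;                   /* hardware (DWT) source */
            }
            i += 1u + n;
        } else if (h == 0x70u) {                         /* overflow */
            st->overflows++;
            i += 1u;
        } else if (h == 0x00u) {                         /* synchronization: zeros, then 0x80 */
            while (i < len && buf[i] == 0x00u)
                ++i;
            if (i < len && buf[i] == 0x80u)
                ++i;
        } else {
            /* Timestamp or extension packet: bit 7 of each byte is a
             * continuation flag (capped at 7 bytes per packet). */
            size_t start = i;
            while (i < len && (buf[i] & 0x80u) != 0u && i - start < 6u)
                ++i;
            if (i < len)
                ++i;                                     /* final byte */
            st->skipped += i - start;
        }
    }
    return count;
}
```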
Some solutions also include external decoder hardware that can directly display ITM trace data. This simplifies viewing traces without needing a PC.
ITM Trace Viewer
The trace decoder software will usually include or interface with a trace viewer application. This displays the ITM data in a user-friendly format, similar to how a debugger displays program variables and registers.
A typical ITM trace viewer shows:
- Stimulus Port Output – Data logged from each stimulus port.
- Timestamps – The time of each event, reconstructed by accumulating local timestamp values.
- Program Flow – Exception timelines and statistical profiles from DWT PC samples. Full instruction-level flow requires ETM.
The trace viewer may also provide search, filtering, and data export capabilities to further analyze traces. This helps you zero in on interesting events and save them for reporting.
ITM Use Cases
Here are some common use cases for Instrumentation Trace Macrocell tracing:
- Timing analysis – Profile code execution using timestamps to identify optimizations.
- Program flow debugging – Trace function calls and returns with instrumentation markers to follow code execution flow.
- printf debugging – Insert trace print statements for debugging without halting the processor.
- Error diagnosis – Log error conditions to identify fault sources.
- Data monitoring – Trace instrumented variables over time to visualize behavior.
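For timing analysis, a host tool can reconstruct time by accumulating local timestamp deltas and pairing entry and exit markers. The sketch assumes a hypothetical convention:
- at region entry, firmware writes the region ID as a 32-bit word to a stimulus port;
- at region exit, it writes the same ID with bit 31 set;
- each event has already been decoded together with the local timestamp delta attributed to it.

`trace_event_t`, `MARK_EXIT`, and `region_ticks()` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker convention: bits [30:0] = region ID, bit 31 set on exit. */
#define MARK_EXIT (1u << 31)

typedef struct {
    uint32_t ts_delta;  /* local timestamp delta attributed to this event, in ticks */
    uint32_t word;      /* 32-bit marker written by the firmware */
} trace_event_t;

/* Total ticks spent inside region `id`: each entry marker is paired with the
 * next exit marker for the same ID (nested entries of one ID are not handled). */
static uint64_t region_ticks(const trace_event_t *ev, size_t n, uint32_t id)
{
    uint64_t now = 0, entered_at = 0, total = 0;
    int inside = 0;
    for (size_t i = 0; i < n; ++i) {
        now += ev[i].ts_delta;                 /* local timestamps are deltas */
        if ((ev[i].word & ~MARK_EXIT) != id)
            continue;
        if ((ev[i].word & MARK_EXIT) == 0u) {  /* entry marker */
            entered_at = now;
            inside = 1;
        } else if (inside) {                   /* exit marker */
            total += now - entered_at;
            inside = 0;
        }
    }
    return total;
}
```

Converting ticks to seconds depends on the timestamp clock source and prescaler configured in ITM_TCR.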
ITM tracing provides minimally intrusive visibility into software execution. It enables real-time debugging and profiling use cases. The amount of data it can capture is limited mainly by trace port bandwidth, which on most boards means the SWO speed.
ITM vs Other ARM Trace Options
In addition to ITM, there are a couple of other trace technologies used in Arm Cortex-M devices:
- ETM – Embedded Trace Macrocell, used for instruction and program flow tracing.
- DWT – Data Watchpoint and Trace, used for watchpoints, data trace, PC sampling, exception trace, and cycle and event counters.
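ETM provides comprehensive instruction tracing, recording each instruction executed and all branches. Its output needs far more bandwidth than SWO can provide, so it usually requires the parallel trace port (or an on-chip trace buffer) and a trace-capable probe. This makes ETM more complex and costly to use than ITM.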
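DWT does not use the ITM stimulus ports. Its comparators and samplers generate hardware source packets, which the ITM merges into its output stream. DWT data trace, PC sampling, and exception trace, combined with ITM instrumentation, give a good picture of code execution and data, though not the instruction-by-instruction detail of ETM.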
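Because DWT output shares the ITM stream, a decoder tells it apart by bit 2 of the source packet header. For hardware packets, bits [7:3] hold a discriminator ID rather than a port number. The sketch classifies the ARMv7-M discriminator IDs:
- 0: event counter
- 1: exception trace
- 2: PC sample
- 8-23: data trace

The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    DWT_PKT_NONE,          /* not a hardware source packet */
    DWT_PKT_EVENT_COUNTER, /* discriminator 0 */
    DWT_PKT_EXCEPTION,     /* discriminator 1 */
    DWT_PKT_PC_SAMPLE,     /* discriminator 2 */
    DWT_PKT_DATA_TRACE,    /* discriminators 8-23: comparator matches */
    DWT_PKT_OTHER          /* any other discriminator */
} dwt_pkt_kind_t;

static dwt_pkt_kind_t dwt_classify(uint8_t header)
{
    if ((header & 0x3u) == 0u || (header & 0x4u) == 0u)
        return DWT_PKT_NONE;             /* protocol packet or software source */
    unsigned id = header >> 3;           /* discriminator ID */
    switch (id) {
    case 0:  return DWT_PKT_EVENT_COUNTER;
    case 1:  return DWT_PKT_EXCEPTION;
    case 2:  return DWT_PKT_PC_SAMPLE;
    default: return (id >= 8u && id <= 23u) ? DWT_PKT_DATA_TRACE : DWT_PKT_OTHER;
    }
}
```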
ITM offers a simpler, lower-cost tracing option focused on instrumented software tracing. It is more flexible than ETM or DWT for tracing arbitrary data, with support for printf-style debugging. The main downside is that ITM cannot reconstruct full program flow without added instrumentation code.
Conclusion
In summary, the Instrumentation Trace Macrocell is a key debugging feature of Cortex-M processors. It provides real-time tracing of instrumented software data through the SWO pin or a parallel trace port. The ITM enables use cases like printf-style logging, execution profiling, data monitoring, and error diagnosis. It traces with minimal intrusion and without halting the processor. Its throughput is set by the trace port configuration, and SWO speed is usually the limiting factor.
ITM tracing is an invaluable capability for developing, debugging, and optimizing software on Arm Cortex-M based microcontrollers. The stimulus ports, trace packets, trace decoder, and viewer software together give detailed visibility into executing embedded software.
Hi David, do you know by any chance how the System Trace Macrocell (STM) and the Trace Memory Controller (TMC) compare with the ETM/ITM/DWT?
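Hello, thanks for the question. STM and TMC are CoreSight components that are mostly found in larger Arm systems rather than in typical Cortex-M microcontrollers, and they play different roles.
STM is a trace source, like the ITM. Software writes to memory-mapped stimulus channels and the STM emits packets using the MIPI STP protocol. It is designed for systems with many bus masters, offers far more channels than the ITM, and can also trace hardware events. The ITM is essentially the lightweight equivalent built into the Cortex-M core.
TMC is not a trace source at all. It stores or routes trace produced by other sources such as ETM, ITM, or STM. It can act as an on-chip buffer (ETB mode), as a FIFO (ETF mode), or as a router that writes trace to system memory (ETR mode).
So STM compares with ITM, while TMC is an alternative to streaming trace off-chip through the TPIU. On-chip capture avoids the need for trace pins and a trace probe, but storage is limited; streaming off-chip allows much longer captures. Each approach has tradeoffs to consider based on debugging needs and hardware cost.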