ETM FIFOFULL Overload in High-Speed Cortex-M7 and M4 Systems

The Embedded Trace Macrocell (ETM) is a critical component for real-time debugging and performance analysis in ARM Cortex-M processors. However, in high-speed systems, such as those with Cortex-M7 cores running at up to 1 GHz or Cortex-M4 cores at 400 MHz, the ETM can be overwhelmed, leading to FIFOFULL events. These events occur when the ETM’s internal FIFO fills faster than the trace port can drain it; packets are then dropped, and the resulting gaps can make the captured trace unusable. The issue is worse when the system runs code from instruction Tightly Coupled Memory (ITCM) and keeps data in data TCM (DTCM), because zero-wait-state memory lets the core sustain a higher instruction rate, and when DMA-driven peripherals and network stacks (e.g., lwIP) keep the core busy with interrupts and packet processing. The core then generates trace faster than it can be exported, and this bottleneck severely limits debugging.

The FIFOFULL issue is particularly problematic in systems running a real-time operating system (RTOS) such as FreeRTOS, where a spinning idle loop and frequent context switches generate a high volume of trace packets. Tracing specific data structures, such as a 16-byte structure updated every 125 microseconds, adds further load. Reducing the core clock to as low as 100 MHz, one-tenth of the nominal operating frequency, can mitigate the issue, but this is not practical for high-performance applications. The problem is compounded in multicore systems and with faster cores such as the Cortex-M85.

Core Speed, TCM Usage, and DMA Impact on ETM Performance

The primary cause of ETM FIFOFULL events in high-speed Cortex-M7 and M4 systems is the mismatch between the rate at which the core generates trace and the rate at which that trace can be exported. The Cortex-M7, with its high clock speeds and dual-issue pipeline, can generate trace data faster than the ETM’s FIFO can be drained through the trace port. This is especially true when the core executes code from ITCM and accesses data in DTCM: these memories add no wait states, so the core rarely stalls and the instruction rate, and with it the trace rate, stays close to its peak.

DMA also contributes, although indirectly. The ETM traces the core’s execution, not the DMA controller’s bus transfers, so a DMA transfer does not itself produce ETM packets. What it does produce is work for the core: completion interrupts, buffer handling, and protocol processing, all of which are traced. Combined with high-speed core operation and TCM usage, this can quickly fill the ETM’s FIFO and lead to FIFOFULL events. The trace port is a further limit. The “4-wire” arrangement common in these systems most likely refers to a trace port with four data pins plus a clock, and its bandwidth may not be sufficient for the volume of trace data being produced.
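The mismatch between trace generation and trace export can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes a parallel trace port with four data pins that transfers data on both edges of TRACECLK, and it assumes an average compressed trace cost per instruction (1.5 bits here, a hypothetical figure that in practice depends heavily on branch density and trace options). All function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Drain rate of a parallel trace port in bits per second.
 * Assumption: data is transferred on both edges of TRACECLK. */
static uint64_t trace_port_bps(uint32_t traceclk_hz, uint32_t data_pins)
{
    return (uint64_t)traceclk_hz * 2u * data_pins;
}

/* Rough ETM output rate in bits per second.
 * bits_per_instr_x10 is the ASSUMED average compressed trace size per
 * executed instruction, in tenths of a bit (15 means 1.5 bits). */
static uint64_t etm_output_bps(uint32_t instr_per_sec,
                               uint32_t bits_per_instr_x10)
{
    return (uint64_t)instr_per_sec * bits_per_instr_x10 / 10u;
}

/* True when the ETM produces trace faster than the port can drain it,
 * which is the condition under which the FIFO eventually fills. */
static bool trace_overruns_port(uint32_t instr_per_sec,
                                uint32_t bits_per_instr_x10,
                                uint32_t traceclk_hz,
                                uint32_t data_pins)
{
    return etm_output_bps(instr_per_sec, bits_per_instr_x10) >
           trace_port_bps(traceclk_hz, data_pins);
}
```

With these assumed figures, a core retiring 1,000,000,000 instructions per second would need about 1.5 Gbit/s, while four pins at a 100 MHz TRACECLK drain 800 Mbit/s, so `trace_overruns_port(1000000000u, 15u, 100000000u, 4u)` is true. At 100,000,000 instructions per second the same model needs about 150 Mbit/s and the port keeps up, which is consistent with the observation that slowing the core to 100 MHz makes the problem go away.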

Another contributing factor is the nature of the application being traced. Systems running an RTOS such as FreeRTOS generate a significant number of trace packets from context switches, tick interrupts, and an idle loop that spins rather than sleeps. Combined with high-speed core operation and TCM usage, this volume can quickly overwhelm the ETM. Similarly, tracing a 16-byte structure updated every 125 microseconds means 8,000 updates per second, each producing several data trace packets on top of the instruction trace.
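The load added by tracing the 16-byte structure can be put into numbers. This is a back-of-the-envelope sketch that assumes each traced 32-bit write is exported as one data value packet with a 1-byte header and a 4-byte payload; the real packet format depends on which trace unit produces the data trace on your device, and address, PC, or timestamp packets would add to the total. The helper name is hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: one traced 32-bit write = 1 header byte + 4 payload bytes. */
#define PACKET_BYTES_PER_WORD 5u

/* Bytes of data trace per second for a structure of struct_bytes bytes
 * (a multiple of 4) that is fully rewritten once every period_us
 * microseconds. */
static uint32_t data_trace_bytes_per_sec(uint32_t struct_bytes,
                                         uint32_t period_us)
{
    uint32_t words_per_update = struct_bytes / 4u;
    uint32_t updates_per_sec = 1000000u / period_us;
    return words_per_update * updates_per_sec * PACKET_BYTES_PER_WORD;
}
```

For the case described above, `data_trace_bytes_per_sec(16u, 125u)` gives 160,000 bytes per second (about 1.28 Mbit/s) under these assumptions. That stream shares the trace port with the instruction trace, so it matters most when the port is already close to saturation.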

Optimizing ETM Performance in High-Speed Cortex-M Systems

To address the ETM FIFOFULL issue in high-speed Cortex-M7 and M4 systems, several strategies can be employed. These strategies focus on reducing the volume of trace data generated, optimizing the use of TCM and DMA, and improving the ETM’s ability to handle high-speed event generation.

Reducing Trace Data Volume: One of the most effective ways to mitigate the FIFOFULL issue is to reduce the volume of trace data the ETM generates. Enable trace only for the parts of the application under investigation rather than for the entire system, for example by using the ETM’s address-range or start/stop filtering, where implemented, to cover a few functions, and by limiting data trace to the specific structures of interest instead of all memory accesses. You can also reduce the detail captured per instruction: if cycle-accurate tracing or timestamps are enabled and not needed, turning them off shrinks the packet stream, and tracing only data writes rather than both reads and writes reduces data trace volume.
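Limiting data trace to one structure usually comes down to programming an address comparator. The sketch below assumes an Armv7-M style comparator whose mask field holds the number of low address bits to ignore, so a single comparator can only cover a region whose size is a power of two and whose base is aligned to that size. `comparator_mask_bits` is a hypothetical helper for checking this on the host, not a CMSIS or vendor function; confirm the comparator behaviour in your device’s reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes the mask width (number of ignored low address bits) that
 * makes one masked comparator cover exactly [base, base + size).
 * Returns false when size is not a power of two or base is not aligned
 * to size; in that case one comparator cannot match the region exactly
 * and *mask_bits is left unchanged. */
static bool comparator_mask_bits(uint32_t base, uint32_t size,
                                 uint32_t *mask_bits)
{
    if (size == 0u || (size & (size - 1u)) != 0u) {
        return false;               /* size must be a power of two */
    }
    if ((base & (size - 1u)) != 0u) {
        return false;               /* base must be aligned to size */
    }
    uint32_t bits = 0u;
    while ((1u << bits) < size) {
        bits++;                     /* integer log2 of size */
    }
    *mask_bits = bits;
    return true;
}
```

A 16-byte structure placed at a 16-byte-aligned address, for example 0x20000010, yields a mask width of 4. The same structure at 0x20000008 is rejected, which suggests aligning the structure in the linker script or with an alignment attribute before setting up the comparator.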

Optimizing TCM Usage: TCM provides zero-wait-state access to instructions and data, which keeps the instruction rate, and therefore the trace rate, high. The trace generated per instruction is the same wherever the code runs, so moving less critical code and data to slower memory does not shrink the trace for a given piece of work; it lowers the rate at which that trace is produced, because wait states slow the core. If you take this route, keep the time-critical code in TCM and accept that the relocated code will run slower. Caching does not help here: a cache hit avoids a bus access, but the ETM traces what the core executes rather than bus traffic, so keeping data in the cache does not reduce the amount of trace.

Managing DMA Transfers: DMA transfers are not traced by the ETM directly, but the interrupts and processing they trigger are. Burst mode makes each transfer more efficient on the bus, but what reduces trace is fewer interrupts: configure larger block transfers so that the core is interrupted once per block rather than once per item. Double buffering lets the core process one full buffer while the DMA fills the other, which likewise cuts the number of interrupts and the traced code that runs for each one. Excluding the DMA interrupt handlers from the traced address range is another option when they are not the subject of the investigation. If the SoC provides a separate trace source for bus activity, it can be used instead of the ETM for that purpose; check the reference manual, since many Cortex-M devices do not.

Improving ETM Bandwidth: The four-data-pin trace port used in many Cortex-M systems limits how fast trace can leave the chip. If the SoC and the debug probe support it, use the widest port available and the highest TRACECLK frequency that the pads, board layout, and probe can sample reliably. The Trace Port Interface Unit (TPIU) is the block that drives these pins, so its port-size and clock settings are where this is configured. Where the device includes an on-chip trace buffer such as an Embedded Trace Buffer (ETB), capturing into it avoids the pin bottleneck altogether, at the cost of a much shorter capture window.

RTOS-Specific Optimizations: With an RTOS such as FreeRTOS, context switches, tick interrupts, and the idle loop can generate a large share of the trace. RTOS-aware trace tools can hide this activity in the viewer, but filtering after capture does not relieve the ETM; to reduce FIFOFULL events the filtering has to happen at the source, for example by excluding the idle task and scheduler code from the traced address range, or by tracing only specific tasks or interrupt handlers. You can also reduce the activity itself: fewer context switches mean fewer traced branches through the scheduler, and tickless idle mode stops the periodic tick interrupt while the system is idle and lets the core sleep instead of spinning.
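In FreeRTOS, tickless idle is enabled through `FreeRTOSConfig.h`. The fragment below is a minimal sketch: the two macro names are standard FreeRTOS configuration options, but the values are illustrative assumptions, and whether the built-in tickless implementation is available depends on the port in use.

```c
/* FreeRTOSConfig.h fragment (values are illustrative assumptions). */

/* 1 selects the port's built-in tickless idle implementation, which
 * suppresses the periodic tick interrupt while no task is ready. */
#define configUSE_TICKLESS_IDLE 1

/* Minimum number of idle tick periods expected before the kernel
 * bothers to enter tickless sleep; FreeRTOS requires at least 2. */
#define configEXPECTED_IDLE_TIME_BEFORE_SLEEP 2
```

A sleeping core executes no instructions, so the idle periods stop contributing trace. Some devices need extra debug configuration to keep the trace and debug logic clocked during sleep, so check the behaviour on your target before relying on this.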

Hardware-Based Solutions: In some cases hardware changes are necessary. A larger or faster trace buffer, whether on chip or in the trace probe, lets the system absorb longer bursts of trace, although it cannot help if the average trace rate exceeds the drain rate. A probe with a faster capture front end may allow a higher trace clock. The ETM is itself part of Arm’s CoreSight architecture, so it is also worth checking which other CoreSight components the device implements, such as an on-chip trace buffer, and selecting a device with a richer trace subsystem when high-speed tracing is a requirement.
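How much a larger buffer buys can be estimated from the surplus between the fill rate and the drain rate. The sketch below is a simplified model that treats both rates as constant; the buffer sizes and bit rates in the usage note are hypothetical figures chosen for illustration, not values from any particular device.

```c
#include <stdint.h>

/* Microseconds until a buffer of buffer_bytes overflows when trace
 * arrives at fill_bps and leaves at drain_bps (both in bits per
 * second, both assumed constant). Returns UINT64_MAX when the drain
 * keeps up and the buffer never overflows. */
static uint64_t usec_until_overflow(uint64_t buffer_bytes,
                                    uint64_t fill_bps,
                                    uint64_t drain_bps)
{
    if (fill_bps <= drain_bps) {
        return UINT64_MAX;
    }
    uint64_t surplus_bps = fill_bps - drain_bps;
    return buffer_bytes * 8u * 1000000u / surplus_bps;
}
```

With an assumed fill rate of 1.5 Gbit/s and a drain rate of 800 Mbit/s, a 4096-byte buffer lasts about 46 microseconds and an 8192-byte buffer about 93 microseconds. Doubling the buffer only doubles the time to overflow, so extra buffering helps with short bursts but not with a sustained overload.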

Conclusion: The ETM FIFOFULL issue in high-speed Cortex-M7 and M4 systems has no single fix. Reducing the volume of trace generated, choosing carefully what runs from TCM, limiting DMA-driven interrupt load, increasing trace port bandwidth, and applying RTOS-specific optimizations each reduce the likelihood of FIFOFULL events. Where these are not enough, hardware measures such as larger trace buffers or a richer trace subsystem can help further. Together, these strategies make real-time debugging and performance analysis workable even in demanding Cortex-M systems.
