ARM Cortex-M7 Usage Fault Due to Illegal Unaligned Memory Access

The ARM Cortex-M7 is a high-performance embedded processor designed for real-time applications. One of the challenges developers face when working with it is diagnosing and resolving usage faults, particularly those caused by illegal unaligned memory accesses. An unaligned memory access occurs when a load or store targets an address that is not a multiple of the size of the data being accessed. For example, accessing a 32-bit word at an address that is not a multiple of 4 is unaligned, and depending on the instruction, the memory type, and the processor configuration, it can raise a fault.

In this guide, we look at how to diagnose and resolve an illegal unaligned load/store usage fault on the ARM Cortex-M7 processor. The fault manifests as a Usage Fault with the UNALIGNED bit (bit 24) set in the CFSR (Configurable Fault Status Register). The fault is non-deterministic, occurring approximately once every 500 hours of runtime, which makes it particularly challenging to debug. We will explore the root causes, diagnostic techniques, and potential solutions.
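
As a starting point, the fault handler can confirm which usage fault was raised by decoding the CFSR value it captured. The sketch below shows one way to do that. The macro and function names are hypothetical helpers for this guide rather than CMSIS definitions, and the bit positions are the UsageFault bits of the ARMv7-M CFSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* UsageFault status bits as they appear in the 32-bit CFSR
 * (the UsageFault Status Register occupies CFSR bits [31:16]).
 * Names are hypothetical; the bit positions follow ARMv7-M. */
#define CFSR_UNDEFINSTR  (1UL << 16)
#define CFSR_INVSTATE    (1UL << 17)
#define CFSR_INVPC       (1UL << 18)
#define CFSR_NOCP        (1UL << 19)
#define CFSR_UNALIGNED   (1UL << 24)
#define CFSR_DIVBYZERO   (1UL << 25)

/* Returns true when a captured CFSR value reports an unaligned access. */
bool cfsr_is_unaligned_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_UNALIGNED) != 0U;
}

/* Maps the UsageFault bits of a CFSR value to a short name.
 * If several bits are set, the first match in this order wins. */
const char *cfsr_usage_fault_name(uint32_t cfsr)
{
    if (cfsr & CFSR_UNALIGNED)  { return "UNALIGNED"; }
    if (cfsr & CFSR_DIVBYZERO)  { return "DIVBYZERO"; }
    if (cfsr & CFSR_NOCP)       { return "NOCP"; }
    if (cfsr & CFSR_INVPC)      { return "INVPC"; }
    if (cfsr & CFSR_INVSTATE)   { return "INVSTATE"; }
    if (cfsr & CFSR_UNDEFINSTR) { return "UNDEFINSTR"; }
    return "none";
}
```

On the target, the value would typically be read in the UsageFault or HardFault handler (with CMSIS-Core, `SCB->CFSR`) and stored somewhere that survives a reset, since a fault that appears once in 500 hours is unlikely to occur with a debugger attached.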

Memory Barrier Omission and Cache Invalidation Timing

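The ARM Cortex-M7 processor, like other ARMv7-M processors, has strict alignment requirements for certain memory operations. The processor handles unaligned single word and halfword loads and stores (LDR, LDRH, STR, STRH) to Normal memory in hardware. Multi-word instructions such as LDM, STM, LDRD, and STRD must always be word-aligned, accesses to Device or Strongly-ordered memory regions must be aligned, and setting the UNALIGN_TRP bit in the CCR (Configuration and Control Register) makes every unaligned word or halfword access trap. When an unaligned access occurs in a context where alignment is enforced, a Usage Fault is triggered.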
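
The alignment rules can be summarized in a small model, which is handy when reasoning about a faulting address taken from a crash log. This is a simplified sketch that assumes Normal memory and covers only plain and multi-word loads and stores (exclusive accesses and Device memory are left out). The type and function names are hypothetical, and `unalign_trp` stands for the UNALIGN_TRP bit (bit 3) of the CCR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ACCESS_SINGLE, /* LDR, LDRH, STR, STRH: unaligned allowed unless UNALIGN_TRP is set */
    ACCESS_MULTI   /* LDM, STM, LDRD, STRD: must always be word-aligned */
} access_kind_t;

/* True when addr is a multiple of size; size must be a power of two. */
bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (uintptr_t)(size - 1U)) == 0U;
}

/* Simplified model (Normal memory only) of whether an access raises
 * the UNALIGNED usage fault. */
bool raises_unaligned_fault(uintptr_t addr, size_t size,
                            access_kind_t kind, bool unalign_trp)
{
    if (kind == ACCESS_MULTI) {
        /* Multi-word instructions fault on any non-word-aligned address. */
        return !is_naturally_aligned(addr, 4U);
    }
    /* Single accesses fault only when unaligned trapping is enabled. */
    return unalign_trp && !is_naturally_aligned(addr, size);
}
```

One consequence worth noting: a word load from 0x20000002 is harmless as a plain LDR but faults if the compiler merges it with a neighboring load into an LDRD or LDM. Temporarily setting UNALIGN_TRP during testing may help turn such latent unaligned accesses into reproducible faults.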

One possible contributor to unaligned access faults is the omission of memory barriers or improper cache invalidation timing. The Cortex-M7 features an advanced memory system with caches and write buffers, which can lead to subtle timing issues if not managed correctly. A missing barrier does not make an access unaligned by itself, but it can let code read stale or partially updated data. For instance, if a pointer or index is read from a DMA buffer before the corresponding cache line has been invalidated, the stale value may later be used as an address and result in an unaligned access fault.

Another potential cause is stack overflow. When the stack overflows, it can corrupt adjacent memory regions, leading to unpredictable behavior, including unaligned memory accesses. The Cortex-M7’s stack pointer (SP) is always word-aligned in hardware, so the overflow does not misalign the SP itself. Instead, if the stack grows beyond its allocated space, it overwrites neighboring data such as pointers, buffer indices, or another thread’s saved context, and a fault is triggered when a corrupted value is later used as the address of an alignment-sensitive memory access.

Additionally, the use of a Real-Time Operating System (RTOS) such as Keil RTX can introduce complexities. Each RTOS thread has its own stack, and improper stack sizing or placement can lead to unaligned accesses. For example, if a thread’s stack is too small, the register context saved during a task switch may overwrite adjacent memory, or a corrupted saved context may be restored into the registers, so that a later memory access uses a bad address and faults.

Implementing Data Synchronization Barriers and Cache Management

To diagnose and resolve the illegal unaligned load/store usage fault, a systematic approach is required. The following steps outline the process:

Step 1: Enable Stack Overflow Checking and Watermarking

The first step is to enable stack overflow checking and watermarking in the RTOS configuration. In Keil RTX, this can be done by modifying the RTX_Config.h file. Enabling stack overflow checking will cause the RTOS to monitor stack usage and trigger an error if a stack overflow is detected. Watermarking allows developers to observe the maximum stack usage for each task, providing insights into potential stack overflows.

To enable stack overflow checking, set the OS_STACK_CHECK macro to 1 in RTX_Config.h. Additionally, enable stack watermarking by setting the OS_STACK_WATERMARK macro to 1. These settings will increase the context switching overhead but are essential for diagnosing stack-related issues.
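
To illustrate what watermarking does, the following sketch fills a stack region with a known pattern and later counts how much of it was never overwritten. It is a generic illustration of the technique, not the RTX implementation; the pattern value and the function names are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xCCCCCCCCu /* hypothetical fill value for this sketch */

/* Fill the whole stack region with the pattern before the thread first runs. */
void stack_fill(uint32_t *stack_base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack_base[i] = STACK_FILL_PATTERN;
    }
}

/* Count the bytes that were never overwritten. The stack grows downward,
 * so untouched words are found by scanning up from the lowest address. */
size_t stack_unused_bytes(const uint32_t *stack_base, size_t words)
{
    size_t unused = 0;
    while (unused < words && stack_base[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused * sizeof(uint32_t);
}
```

A result close to zero means the thread has come near its stack limit at some point. With watermarking enabled in RTX5, the CMSIS-RTOS2 function `osThreadGetStackSpace()` should report a comparable figure at runtime, which makes it possible to log stack headroom periodically during a long soak test.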

Step 2: Analyze Stack Usage with RTX RTOS Viewer

Once stack overflow checking and watermarking are enabled, use the RTX RTOS Viewer in Keil MDK to analyze stack usage. The RTX RTOS Viewer provides a graphical representation of stack usage for each task, allowing developers to identify tasks that are consuming excessive stack space.

To access the RTX RTOS Viewer, open the "Watch" window in Keil MDK and select "RTX RTOS." Run the application for a period of time, then halt the debugger. In the RTX RTOS Viewer, expand the thread information to view the stack usage details. Pay particular attention to tasks that are close to their stack limits, as these are potential candidates for stack overflow.

Step 3: Investigate Main Stack Overflow

In addition to RTOS task stacks, the main stack used by the application must also be examined for potential overflow. The main stack is typically defined in the device startup file (e.g., startup_stm32f746xx.s). Compare the allocated stack size with the maximum stack usage reported in the linker-generated HTML file (located in the Objects directory).

If the main stack is close to its limit, consider increasing the stack size or optimizing the application to reduce stack usage. To detect main stack overflow during runtime, place a data variable at the stack limit, that is, at the lowest address of the main stack region (the stack grows downward toward it), and set a data write watchpoint on this variable. If the watchpoint is triggered, it indicates that the main stack has reached its limit and is about to overflow, or already has.
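
When no debugger is attached, as is likely during a 500-hour run, a software guard word can complement the watchpoint. The sketch below is one possible approach with hypothetical names and a hypothetical guard value: the word at the stack limit is set once at startup and checked periodically, for example from an idle task or a timer callback.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_GUARD_VALUE 0xDEADBEEFu /* hypothetical guard value */

/* Write the guard word at the lowest address of the stack region. */
void stack_guard_arm(volatile uint32_t *stack_limit)
{
    *stack_limit = STACK_GUARD_VALUE;
}

/* Returns false once the guard word has been overwritten, which means
 * the stack has grown down to (or past) its limit at some point. */
bool stack_guard_intact(const volatile uint32_t *stack_limit)
{
    return *stack_limit == STACK_GUARD_VALUE;
}
```

Unlike a watchpoint, this check only detects the overflow after the fact, and only if the overflowing code actually wrote to the guard word, so it is a diagnostic aid rather than protection.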

Step 4: Enable Event Tracing

Event tracing can provide valuable insights into the sequence of events leading up to the fault. Enable event tracing in Keil MDK to capture exception events, interrupts, and task switches. The event trace can help identify the combination of events that precede the fault, providing clues about the root cause.

To enable event tracing, configure the Event Recorder in Keil MDK and enable tracing for exceptions, interrupts, and RTOS events. Run the application and capture the event trace when the fault occurs. Analyze the trace to identify any unusual patterns or sequences of events that may be contributing to the fault.

Step 5: Implement Data Synchronization Barriers

To address potential cache and memory synchronization issues, implement data synchronization barriers (DSB) and instruction synchronization barriers (ISB) at critical points in the code. A DSB ensures that all outstanding memory accesses and cache maintenance operations have completed before the next instruction executes, so that later code does not act on stale data left behind by cache or write buffer inconsistencies.

For example, when invalidating a cache line, use the DSB instruction to ensure that the invalidation is complete before performing subsequent memory operations. Similarly, use the ISB instruction to flush the processor pipeline so that the instructions following it are fetched and executed only after the preceding operations, such as cache maintenance or system register changes, have taken effect.
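
Cache maintenance on the Cortex-M7 operates on whole 32-byte data cache lines, so the address range passed to an invalidate operation should cover complete lines. The helper below is a minimal sketch (the type and function names are hypothetical) that rounds a buffer to cache-line boundaries. The comment shows how the result might be used on the target with CMSIS-Core; that part is an untested assumption about the surrounding application.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start; /* buffer start rounded down to a cache-line boundary */
    size_t    size;  /* multiple of the line size that covers the buffer */
} cache_range_t;

/* Compute the cache-line-aligned range that covers [addr, addr + len). */
cache_range_t cache_range_for(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    cache_range_t r;
    uintptr_t end = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;

    r.start = addr & mask;
    r.size  = (size_t)(end - r.start);
    return r;
}

/* Possible use on the target after a DMA transfer into a buffer completes
 * (CMSIS-Core for Cortex-M7, assumed application context):
 *
 *   cache_range_t r = cache_range_for((uintptr_t)rx_buf, rx_len);
 *   SCB_InvalidateDCache_by_Addr((void *)r.start, (int32_t)r.size);
 *   __DSB();  // invalidation complete before rx_buf is read
 *   __ISB();  // following instructions see the updated state
 */
```

If the rounded range is larger than the buffer, invalidating it also discards any cached writes to unrelated variables sharing those lines. For that reason DMA buffers are usually aligned to, and padded to a multiple of, the cache line size.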

Step 6: Use J-Trace for Instruction Tracing

For non-deterministic faults that occur infrequently, instruction tracing can be invaluable. Use J-Trace to capture the instruction trace in real time, allowing you to analyze the sequence of instructions leading up to the fault. The UNALIGNED usage fault is reported synchronously, so the stacked program counter identifies the faulting instruction, but the event that produced the bad address (for example, a corrupted pointer) may lie many instructions upstream, and the trace provides the context needed to find it.

To use J-Trace, configure the trace settings in Keil MDK and start the trace capture. Run the application until the fault occurs, then halt the debugger and analyze the trace. Look for any unusual instruction sequences or memory accesses that may be contributing to the fault.

Step 7: Review and Optimize Code for Alignment

Finally, review the application code to ensure that all memory accesses are properly aligned. Pay particular attention to structures, arrays, and pointers that may be accessed in a non-aligned manner. Use compiler directives or attributes to enforce alignment where necessary.

For example, in C/C++, use the __attribute__((aligned(n))) attribute to ensure that structures or variables are aligned to an n-byte boundary. In assembly code, use the .align directive to enforce alignment. Additionally, avoid casting byte pointers to wider types or performing unaligned memory accesses unless absolutely necessary.
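
The sketch below illustrates both ideas with GCC/Clang-style syntax, which Arm Compiler 6 also accepts. The type and function names are hypothetical. The first part forces 8-byte alignment on a structure and checks it at compile time; the second reads and writes a 32-bit value at an arbitrary byte offset with memcpy instead of a pointer cast.

```c
#include <stdint.h>
#include <string.h>

/* Force 8-byte alignment so the structure can be accessed safely with
 * multi-word instructions such as LDRD/STRD or LDM/STM. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} __attribute__((aligned(8))) sample_pair_t;

/* Compile-time check that the attribute took effect. */
_Static_assert(_Alignof(sample_pair_t) == 8, "sample_pair_t must be 8-byte aligned");

/* Read a 32-bit value from a possibly unaligned byte position.
 * memcpy tells the compiler nothing about alignment, so it must not
 * assume the address is word-aligned. */
uint32_t read_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to a possibly unaligned byte position. */
void write_u32_unaligned(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

By contrast, a cast such as `*(const uint32_t *)p` lets the compiler assume 4-byte alignment. It may then combine neighboring accesses into LDRD or LDM instructions, which fault on an unaligned address even though a single LDR would not. This is a plausible source of faults that appear only with certain optimization levels or data layouts.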

Conclusion

Diagnosing and resolving an illegal unaligned load/store usage fault on the ARM Cortex-M7 processor requires a systematic approach. By enabling stack overflow checking, analyzing stack usage, implementing data synchronization barriers, and using instruction tracing, developers can identify and address the root cause of the fault. Additionally, reviewing and optimizing code for alignment can prevent future occurrences of this issue.

The ARM Cortex-M7’s advanced memory system and real-time capabilities make it a powerful platform for embedded applications, but they also introduce complexities that require careful management. By following the steps outlined in this guide, developers can ensure reliable and efficient operation of their Cortex-M7-based systems.
