Issue Overview: Second Core Receives Only One PL Interrupt in AMP Configuration
In a Zynq 7020 Cortex-A9 Asymmetric Multiprocessing (AMP) setup, where each core runs a separate FreeRTOS instance, a critical issue arises with the second core’s ability to handle interrupts from Programmable Logic (PL)-based IPs. The second core initializes correctly, waits for the first core to complete Ethernet initialization, and then resumes execution. Both cores operate independently and can send Ethernet broadcast messages from different IP addresses when tested separately without the USE_AMP directive. However, in the AMP configuration, the second core only receives a single interrupt from PL-based IPs and fails to receive any subsequent interrupts. PS (Processing System) interrupts, such as EMAC, Global Timer, and TTC Timer, function normally.
The problem manifests specifically in the context of PL interrupts, suggesting a potential issue with interrupt routing, L2 cache management, or synchronization between the cores. Debugging reveals that modifying, skipping, or executing certain L2 cache-related instructions during the second core’s boot sequence causes the first core to crash, while the second core continues running. This behavior points to a possible misconfiguration in the L2 cache or interrupt handling mechanisms when USE_AMP is enabled. The challenge lies in identifying whether the root cause is related to L2 cache mismanagement, incorrect interrupt routing, or a missing synchronization step between the cores.
Possible Causes: L2 Cache Mismanagement, Interrupt Routing, or Synchronization Issues
Three candidate causes fit the symptoms: mismanagement of the shared L2 cache, incorrect routing of PL interrupts in the interrupt controller, and missing synchronization between the cores. Each is described below and needs to be checked separately.
L2 Cache Mismanagement: The L2 cache in the Zynq 7020 is shared between the two Cortex-A9 cores, so in an AMP setup only one core should initialize it, and the other must not disturb its state. Invalidating a write-back cache discards dirty lines without writing them to memory. If the second core invalidates or re-initializes the shared L2 cache during its boot sequence while the first core is already running, the first core can lose data it has written, which is consistent with the observation that changing the L2 cache-related instructions in the second core's boot sequence crashes the first core. Conversely, if buffers or descriptors shared with the PL are not cleaned or invalidated when needed, the second core may act on stale data, for example when checking or clearing the interrupt source, and that could prevent further PL interrupts after the first one.
Incorrect Interrupt Routing: The Zynq 7020 uses the Generic Interrupt Controller (GIC) to manage interrupts for both cores. PL interrupts reach the GIC as shared peripheral interrupts (the IRQ_F2P lines, IDs 61 to 68 and 84 to 91 on the Zynq-7000), and in an AMP setup each of them must be targeted at the core that services it. The fact that PS interrupts work normally while PL interrupts stop after the first one suggests that the GIC settings for the PL interrupt IDs differ from those of the working interrupts. The SetCPUID function in the ScuGic (System Control Unit Generic Interrupt Controller) driver is used to assign interrupts to specific cores. If the PL interrupts are not assigned to the second core, or if another GIC setting for those IDs is wrong, the second core might fail to receive subsequent PL interrupts.
Synchronization Issues: In an AMP setup, the two FreeRTOS instances share hardware that neither operating system arbitrates, including the L2 cache controller and the GIC distributor. If both cores access such a resource without coordination, race conditions or inconsistent states can result. For example, if the first core rewrites a shared interrupt controller register while the second core is configuring its own interrupts, one core's settings can be overwritten, and the second core may stop receiving PL interrupts. Semaphores, barriers, or a simple lock in shared memory might be necessary to serialize these accesses.
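The boot-time wait described earlier (the second core waiting for the first core to finish its initialization) can be sketched as a flag handshake. This is a minimal illustration using C11 atomics, not the code from the original project: the type boot_mailbox_t and both function names are hypothetical, and on real hardware the mailbox would have to sit in memory that both cores see consistently (for example on-chip memory mapped uncached, which is an assumption here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox shared by both cores. */
typedef struct {
    atomic_uint ready;   /* 0 = core 0 still initializing, 1 = done */
    uint32_t    payload; /* a value core 0 publishes for core 1 */
} boot_mailbox_t;

/* Core 0: store the payload, then set the flag with release ordering
 * so the payload is visible before the flag is observed as set. */
void mailbox_publish(boot_mailbox_t *mb, uint32_t payload)
{
    assert(mb != NULL);
    mb->payload = payload;
    atomic_store_explicit(&mb->ready, 1u, memory_order_release);
}

/* Core 1: return false while core 0 is not ready; otherwise copy the
 * payload out. The acquire load pairs with the release store above. */
bool mailbox_try_consume(boot_mailbox_t *mb, uint32_t *out)
{
    assert(mb != NULL && out != NULL);
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0u) {
        return false;
    }
    *out = mb->payload;
    return true;
}
```

The second core would call mailbox_try_consume in a loop (or with a FreeRTOS delay between attempts) and continue only once it returns true.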
Troubleshooting Steps, Solutions & Fixes: Diagnosing and Resolving the PL Interrupt Issue
The steps below check each candidate cause in turn, starting with the L2 cache because it is the one most likely to affect the first core as well.
Step 1: Verify L2 Cache Configuration and Management
The first step is to ensure that the L2 cache is configured and managed correctly for the AMP setup. This involves checking the L2 cache settings in the boot code and runtime configuration for both cores. On the Zynq-7000 the L2 cache size and associativity are fixed (512 KB, 8-way), so the settings worth checking are the ones the boot code actually programs, such as the tag and data RAM latencies and whether the cache is enabled. Confirm that only one core invalidates and enables the L2 cache, that it does so before the other core starts using cached memory, and that data shared with the PL is cleaned or invalidated at the points where it changes hands.
To diagnose L2 cache issues, use a debugger to inspect the L2 cache controller state during the second core's boot sequence and at runtime, and compare it with the state the first core established. Check for any point where the second core changes that state while the first core is running. If necessary, modify the L2 cache-related instructions in the boot.S file so that the second core leaves the already-initialized L2 cache intact and uses clean operations, not bare invalidation, on any line that may hold dirty data.
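The difference between cleaning and invalidating can be shown with a toy model of a single write-back cache line. This is only an illustration of the principle; it does not touch the real L2 cache controller, and the type line_model_t and its functions are hypothetical names introduced for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one write-back cache line in front of one memory word. */
typedef struct {
    uint32_t memory; /* value held in external memory */
    uint32_t cached; /* value held in the cache line */
    bool     valid;  /* line holds a copy of memory */
    bool     dirty;  /* line is newer than memory */
} line_model_t;

/* A CPU write lands in the cache only and marks the line dirty. */
void line_write(line_model_t *l, uint32_t value)
{
    assert(l != NULL);
    l->cached = value;
    l->valid = true;
    l->dirty = true;
}

/* A CPU read refills the line from memory on a miss. */
uint32_t line_read(line_model_t *l)
{
    assert(l != NULL);
    if (!l->valid) {
        l->cached = l->memory;
        l->valid = true;
        l->dirty = false;
    }
    return l->cached;
}

/* Clean: write a dirty line back to memory, keep it valid. */
void line_clean(line_model_t *l)
{
    assert(l != NULL);
    if (l->valid && l->dirty) {
        l->memory = l->cached;
        l->dirty = false;
    }
}

/* Invalidate: drop the line without writing it back. */
void line_invalidate(line_model_t *l)
{
    assert(l != NULL);
    l->valid = false;
    l->dirty = false;
}
```

In this model, a write followed directly by line_invalidate loses the written value, while a write followed by line_clean and then line_invalidate preserves it. The same reasoning applies when one core invalidates a shared cache that holds another core's dirty data.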
Step 2: Validate Interrupt Routing in the GIC
The next step is to validate the interrupt routing in the GIC to ensure that PL interrupts are correctly assigned to the second core. This involves checking the SetCPUID function in the ScuGic to confirm that PL interrupts are routed to the second core. Additionally, verify that the GIC is configured correctly for the AMP setup, with the appropriate priority and target settings for each interrupt.
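As a rough sketch of what "target settings" means at register level: the GIC distributor holds one target byte per interrupt ID in its ICDIPTR registers, where bit 0 selects CPU0 and bit 1 selects CPU1. The helpers below only compute addresses and values so they can be compared against what a debugger shows; the function names are hypothetical, and the distributor base address 0xF8F01000 is the Zynq-7000 value. In the Xilinx standalone driver, versions that provide XScuGic_InterruptMaptoCpu offer an equivalent operation.

```c
#include <assert.h>
#include <stdint.h>

/* Zynq-7000 GIC distributor base and CPU-targets register offset. */
#define GIC_DIST_BASE   0xF8F01000u
#define ICDIPTR_OFFSET  0x800u

/* Address of the 32-bit ICDIPTR word that holds the target byte for
 * int_id. Four interrupt IDs share one word. */
uint32_t icdiptr_addr(uint32_t int_id)
{
    assert(int_id >= 32u && int_id < 96u); /* shared peripheral interrupts */
    return GIC_DIST_BASE + ICDIPTR_OFFSET + (int_id / 4u) * 4u;
}

/* Return reg with the target byte for int_id replaced by a one-hot
 * mask for the given CPU (0 or 1). Other byte lanes are preserved. */
uint32_t icdiptr_set_target(uint32_t reg, uint32_t int_id, uint32_t cpu)
{
    assert(cpu < 2u);
    uint32_t shift = (int_id % 4u) * 8u;
    reg &= ~(0xFFu << shift);
    reg |= (1u << cpu) << shift;
    return reg;
}

/* CPU mask currently stored for int_id (bit 0 = CPU0, bit 1 = CPU1). */
uint32_t icdiptr_get_target(uint32_t reg, uint32_t int_id)
{
    return (reg >> ((int_id % 4u) * 8u)) & 0xFFu;
}
```

For the first PL interrupt (ID 61), icdiptr_addr(61) gives the word to read in the debugger, and icdiptr_get_target on that word should return 0x02 if the interrupt is routed to the second core only.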
To diagnose interrupt routing issues, use a debugger to read back the GIC distributor state for each PL interrupt ID after the first interrupt has been serviced: whether it is still enabled, whether it is pending or still active, which core it targets, and its priority. Compare these values with those of a PS interrupt that works. If necessary, modify the GIC configuration so that PL interrupts are assigned to the second core, and make sure the first core does not overwrite those settings afterwards.
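When reading the distributor in a debugger, it helps to know where the bits for a given interrupt ID live. The sketch below computes the register offset and bit position for the enable, pending, and active banks, and decodes the 2-bit configuration field (level or edge). Whether a stuck active bit or a trigger-type mismatch is involved in this particular case is an assumption to be verified, not a finding; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GIC distributor register bank offsets (relative to distributor base). */
#define ICDISER_OFFSET 0x100u /* set-enable, 1 bit per interrupt */
#define ICDISPR_OFFSET 0x200u /* set-pending, 1 bit per interrupt */
#define ICDABR_OFFSET  0x300u /* active status, 1 bit per interrupt */
#define ICDICFR_OFFSET 0xC00u /* configuration, 2 bits per interrupt */

/* Offset of the 32-bit word in a 1-bit-per-interrupt bank. */
uint32_t gic_bit_reg_offset(uint32_t bank_offset, uint32_t int_id)
{
    assert(int_id < 96u);
    return bank_offset + (int_id / 32u) * 4u;
}

/* Mask of the bit for int_id inside that word. */
uint32_t gic_bit_mask(uint32_t int_id)
{
    return 1u << (int_id % 32u);
}

/* Offset of the ICDICFR word holding the 2-bit field for int_id. */
uint32_t gic_cfg_reg_offset(uint32_t int_id)
{
    assert(int_id < 96u);
    return ICDICFR_OFFSET + (int_id / 16u) * 4u;
}

/* Extract the 2-bit configuration field for int_id from a register value. */
uint32_t gic_cfg_field(uint32_t reg, uint32_t int_id)
{
    return (reg >> ((int_id % 16u) * 2u)) & 0x3u;
}

/* Bit 1 of the field set means edge-triggered, clear means level-sensitive. */
bool gic_cfg_is_edge(uint32_t field)
{
    return (field & 0x2u) != 0u;
}
```

For ID 61 the enable bit is bit 29 of the word at offset 0x104. If the same bit in the active bank is still set long after the handler has returned, the interrupt was probably never marked as completed, and the GIC will not deliver it again.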
Step 3: Implement Proper Synchronization Mechanisms
The final step is to implement proper synchronization mechanisms between the cores to ensure that shared resources, such as the L2 cache and interrupt controllers, are accessed correctly. This involves using semaphores, barriers, or other synchronization mechanisms to prevent race conditions or inconsistent states that might affect interrupt handling.
To diagnose synchronization issues, use a debugger to observe the order in which the two cores touch shared resources during the boot sequence and at runtime. Look in particular for read-modify-write sequences on shared registers that both cores can execute, and for the second core proceeding before the first core has finished initialization. If necessary, add or correct the synchronization so that each shared resource is modified by one core at a time.
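One way to serialize a read-modify-write on a shared 32-bit word is a small spinlock. The sketch below uses C11 atomic_flag and is only an illustration: spinlock_t and the function names are hypothetical, and on the Zynq the lock variable would need to be placed in memory where both cores' atomic accesses work correctly, which depends on the memory attributes chosen for the AMP setup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spinlock shared by both cores. */
typedef struct {
    atomic_flag held;
} spinlock_t;

/* Spin until the flag was previously clear, then own the lock. */
void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait; another core holds the lock */
    }
}

/* Release the lock so the other core can take it. */
void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Replace one byte lane of a shared word while holding the lock, so a
 * concurrent update of another lane by the other core is not lost. */
void shared_word_set_byte(spinlock_t *l, volatile uint32_t *word,
                          uint32_t lane, uint8_t value)
{
    assert(l != NULL && word != NULL && lane < 4u);
    spin_lock(l);
    uint32_t v = *word;
    v &= ~(0xFFu << (lane * 8u));
    v |= (uint32_t)value << (lane * 8u);
    *word = v;
    spin_unlock(l);
}
```

Without the lock, two cores that each read the word, change a different lane, and write it back can silently undo each other's change, which is the kind of inconsistent state this step is meant to rule out.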
Potential Solutions and Fixes
Based on the troubleshooting steps outlined above, the following potential solutions and fixes can be implemented to resolve the issue of the second core receiving only one PL interrupt in the Zynq 7020 Cortex-A9 AMP setup:
- Correct L2 Cache Configuration: Ensure that the L2 cache is configured and managed correctly for the AMP setup, with the appropriate attributes and synchronization mechanisms. Modify the L2 cache-related instructions in the boot.S file to ensure proper initialization and synchronization between the cores.
- Proper Interrupt Routing: Validate the interrupt routing in the GIC to ensure that PL interrupts are correctly assigned to the second core. Modify the GIC configuration to ensure that PL interrupts are routed to the second core and that the GIC is properly synchronized between the cores.
- Synchronization Mechanisms: Implement proper synchronization mechanisms between the cores to ensure that shared resources are accessed correctly. Use semaphores, barriers, or other synchronization mechanisms to prevent race conditions or inconsistent states that might affect interrupt handling.
By following these troubleshooting steps and implementing the potential solutions and fixes, the issue of the second core receiving only one PL interrupt in the Zynq 7020 Cortex-A9 AMP setup can be effectively diagnosed and resolved.