Cortex-A57 L2 Subsystem Reset Challenges in Multi-Cluster Systems
The Cortex-A57 processor, part of ARM’s Cortex-A series, is widely used in high-performance embedded systems, particularly in multi-core and multi-cluster configurations. A critical component of the Cortex-A57 is its L2 cache subsystem, which is shared by the cores in a cluster and strongly affects data access latency and overall system performance. Resetting the L2 subsystem in a multi-cluster environment, especially while U-Boot SPL is running from static memory, presents unique challenges. The problem is compounded by dynamic power management: firmware that implements the ARM Power State Coordination Interface (PSCI) controls the power states of the CPUs and, with them, the L2 cache.
In a typical multi-cluster system, each cluster consists of multiple Cortex-A57 CPUs, and the clusters are connected through an interconnect such as the CCN-504. The L2 cache is shared among the CPUs within a cluster, and its state is tightly coupled to their power state. When the last CPU in a cluster is powered down, the L2 cache is powered down with it; when the first CPU in the cluster is powered up, the L2 cache is brought back online. However, when a single CPU (e.g., CPU0) is running U-Boot SPL from static memory, resetting the L2 subsystem without affecting the entire cluster becomes a non-trivial task.
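The coupling between CPU power state and L2 power state described above can be illustrated with a small host-side model. This is only a sketch: the `cluster` structure, the function names, and the four-core limit are assumptions made for illustration, not part of any real power controller interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one cluster: the L2 is powered exactly while
 * at least one of the cluster's CPUs is online. */
#define CPUS_PER_CLUSTER 4u   /* assumption: four cores per cluster */

struct cluster {
    unsigned online_mask;  /* bit n set => CPU n is powered up */
    bool     l2_powered;   /* modelled L2 power state */
};

static void cluster_cpu_on(struct cluster *c, unsigned cpu)
{
    assert(cpu < CPUS_PER_CLUSTER);
    if (c->online_mask == 0)
        c->l2_powered = true;       /* first CPU up brings the L2 online */
    c->online_mask |= 1u << cpu;
}

static void cluster_cpu_off(struct cluster *c, unsigned cpu)
{
    assert(cpu < CPUS_PER_CLUSTER);
    c->online_mask &= ~(1u << cpu);
    if (c->online_mask == 0)
        c->l2_powered = false;      /* last CPU down takes the L2 with it */
}
```

In this model there is no operation that cycles `l2_powered` while `online_mask` is non-zero, which is exactly the case that arises when CPU0 must keep running.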
The primary challenge lies in the fact that the L2 subsystem is not designed to be reset independently of the cluster. The L2 cache is deeply integrated with the CPU cores, and its reset mechanism is typically tied to the cluster’s reset signal. Attempting to reset the L2 subsystem while keeping CPU0 operational can lead to system instability, as observed in the case where issuing a reset to the cluster containing CPU0 resulted in a board hang. This behavior suggests that the L2 subsystem reset mechanism is not fully decoupled from the CPU reset logic, leading to potential conflicts when attempting to reset the L2 cache independently.
Hold L2 Register and nL2RESET Signal Misalignment
One of the key findings in addressing the L2 subsystem reset issue is the role of the "hold L2" register and the nL2RESET signal. The "hold L2" register is a control register that, when set, is supposed to assert the nL2RESET signal (active low, as the "n" prefix indicates), which in turn should reset the L2 subsystem. However, the interaction between the register and the signal is not straightforward, especially in a multi-cluster environment where each L2 cache is shared among the CPUs of its cluster.
When CPU0 is running U-Boot SPL from static memory, attempting to reset the L2 subsystem by setting the "hold L2" register, and thereby asserting nL2RESET, can lead to unexpected behavior. Specifically, when the reset is issued to the cluster containing CPU0, the system hangs, indicating that the L2 subsystem reset mechanism is not functioning as intended. This suggests that the "hold L2" register and the nL2RESET signal are not properly aligned with the power management logic of the cluster, leading to a conflict between the L2 reset and the CPU’s operational state.
The misalignment between the "hold L2" register and the nL2RESET signal can be attributed to several factors. First, the timing of the L2 reset signal may not be synchronized with the CPU’s power state transitions, so the L2 cache may be reset while the CPU is still accessing it. Second, the "hold L2" register may not fully isolate the L2 subsystem from the CPU, allowing residual transactions to interfere with the reset process. Finally, the interconnect (e.g., CCN-504) may not properly handle an L2 reset while transactions are outstanding, leading to incomplete or inconsistent reset behavior across different clusters.
Implementing L2 Subsystem Reset with CPU0 Isolation
To address the challenges of resetting the L2 subsystem in a multi-cluster Cortex-A57 system while keeping CPU0 operational, a systematic approach is required. This approach involves isolating CPU0 from the L2 reset process, ensuring proper synchronization between the L2 reset signal and the CPU’s power state, and verifying the behavior of the interconnect during the reset process.
The first step in implementing the L2 subsystem reset is to isolate CPU0 from the reset process, that is, to ensure that CPU0 is not actively accessing the L2 cache when the reset is initiated. One way to achieve this is to place CPU0 in a low-power state; another is to temporarily stop it from using the L2 cache. The latter can be done by changing the memory attributes so that the regions CPU0 uses are treated as non-cacheable, followed by memory barriers to ensure that all pending transactions have completed before the reset is initiated.
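As a minimal sketch of the "stop using the cache, then drain" idea, the helpers below clear the architectural cache-enable bits of an SCTLR value and issue a barrier. The bit positions (C at bit 2, I at bit 12) are those of the ARMv8 SCTLR_ELx registers; the function names are hypothetical, and the non-AArch64 branch is only a host stand-in so the code can be compiled off-target. On real hardware the value would be read and written with `mrs`/`msr` at the current exception level, and dirty lines would normally be cleaned before the reset; neither step is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_C (UINT64_C(1) << 2)   /* data and unified cache enable */
#define SCTLR_I (UINT64_C(1) << 12)  /* instruction cache enable */

/* Return the SCTLR value with cacheability disabled; other bits untouched. */
static uint64_t sctlr_caches_off(uint64_t sctlr)
{
    return sctlr & ~(SCTLR_C | SCTLR_I);
}

static bool sctlr_data_cache_enabled(uint64_t sctlr)
{
    return (sctlr & SCTLR_C) != 0;
}

/* Wait for outstanding memory accesses issued by this CPU to complete. */
static void drain_outstanding_accesses(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb sy\n\tisb" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* host stand-in only */
#endif
}
```

A caller would write the value returned by `sctlr_caches_off()` back to the system register, call `drain_outstanding_accesses()`, and only then proceed to the reset step.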
Once CPU0 is isolated, the next step is to assert the nL2RESET signal using the "hold L2" register. Before doing so, it is crucial to ensure that the L2 reset is properly synchronized with the CPU’s power state. One way to achieve this is to insert a delay between isolating CPU0 and setting the "hold L2" register. The delay should be long enough for any pending transactions to complete and for the CPU to reach a stable state before the L2 subsystem is actually reset.
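The ordering described above might look roughly like the following. Everything SoC-specific here is an assumption: the bit position `HOLD_L2_BIT`, the `fake_hold_l2_reg` variable standing in for the memory-mapped register, the crude busy-wait delay, and the idea of holding the bit for a fixed interval before releasing it. The actual register address, polarity, and required timing must come from the SoC documentation.

```c
#include <assert.h>
#include <stdint.h>

#define HOLD_L2_BIT (1u << 0)   /* hypothetical bit position */

/* Stand-in for the SoC's "hold L2" control register. On real hardware this
 * would be a pointer to a memory-mapped address from the SoC manual. */
static volatile uint32_t fake_hold_l2_reg;

/* Crude busy-wait; a real port would use a timer instead. */
static void delay_cycles(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++) {
        /* spin */
    }
}

/* Settle, set the hold bit, keep it set for a while, then release it.
 * Other bits in the register are preserved. */
static void l2_reset_pulse(volatile uint32_t *hold_l2,
                           unsigned settle, unsigned hold)
{
    assert(hold_l2 != 0);
    delay_cycles(settle);        /* let outstanding traffic drain */
    *hold_l2 |= HOLD_L2_BIT;     /* request nL2RESET assertion */
    delay_cycles(hold);          /* assumed minimum assertion time */
    *hold_l2 &= ~HOLD_L2_BIT;    /* release the reset */
}
```

For example, `l2_reset_pulse(&fake_hold_l2_reg, 1000, 1000);` exercises the sequence against the stand-in variable; the delay values are placeholders rather than measured figures.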
After the L2 subsystem reset is complete, the final step is to verify the behavior of the interconnect and ensure that the L2 cache is properly reinitialized. This can be done by performing a series of memory access tests to verify that the L2 cache is functioning correctly and that there are no residual issues from the reset process. Additionally, it is important to monitor the system for any signs of instability or unexpected behavior, as these could indicate that the L2 reset process was not fully successful.
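A simple write-then-verify pattern test is one possible form for the memory access tests mentioned above. The function name and the choice of patterns are illustrative assumptions; on the target, the buffer would need to be in cacheable memory and larger than the L1 data cache for the accesses to exercise the L2 at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write several patterns across the buffer and read them back.
 * Returns the number of words that did not read back as written. */
static size_t mem_pattern_test(volatile uint32_t *buf, size_t words)
{
    static const uint32_t patterns[] = {
        0x00000000u, 0xFFFFFFFFu, 0xAAAAAAAAu, 0x55555555u
    };
    size_t errors = 0;

    assert(buf != 0);
    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        /* XOR with the index so neighbouring words differ. */
        for (size_t i = 0; i < words; i++)
            buf[i] = patterns[p] ^ (uint32_t)i;
        for (size_t i = 0; i < words; i++)
            if (buf[i] != (patterns[p] ^ (uint32_t)i))
                errors++;
    }
    return errors;
}
```

A non-zero return value after the reset would be one of the signs of instability the text warns about; a zero return value does not by itself prove that the L2 was reinitialized correctly.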
In conclusion, resetting the L2 subsystem in a multi-cluster Cortex-A57 system while keeping CPU0 operational is a complex task that requires careful consideration of the system’s power management logic, the behavior of the interconnect, and the timing of the L2 reset signal. By isolating CPU0 from the reset process, ensuring proper synchronization between the L2 reset signal and the CPU’s power state, and verifying the behavior of the interconnect, it is possible to achieve a successful L2 subsystem reset without causing system instability. However, this process requires a deep understanding of the Cortex-A57 architecture and the specific implementation details of the system in question.