ARM Cortex-A Hypervisor Timer Interrupt Configuration and Debugging
When working with ARM Cortex-A processors in a hypervisor environment, configuring and debugging timer interrupts can be a complex task, especially when dealing with Exception Levels (ELs) and the Generic Interrupt Controller (GIC). The issue at hand involves a hypervisor running in EL2 (AArch64) attempting to use the physical EL2 timer (CNTHP_*_EL2) to signal an interrupt. The timer is configured correctly, and the interrupt status (ISTATUS) bit in CNTHP_CTL_EL2 is set as expected. However, the interrupt is not being signaled to the Processing Element (PE), despite the interrupt being pending in the GIC and the relevant registers indicating that the interrupt is enabled.
Understanding the ARM Cortex-A Timer and GIC Interaction
The ARM Cortex-A architecture provides a sophisticated mechanism for handling interrupts, which involves both the processor’s exception handling and the GIC. The GIC is responsible for managing interrupts from various sources and forwarding them to the appropriate PE. In this scenario, the hypervisor is attempting to use the physical EL2 timer to generate an interrupt. The timer's interrupt is INTID 26, a Private Peripheral Interrupt (PPI), which is configured as a Group 1 Non-Secure (NS) interrupt. The interrupt is enabled, and its priority is set to 0xa0. The ISTATUS bit in CNTHP_CTL_EL2 is set to 1, indicating that the timer condition has been met and the interrupt is pending.
The GIC registers, such as ICC_HPPIR1_EL1 and GICR_ICPENDR0, confirm that the interrupt is pending and has been forwarded to the CPU interface. However, the interrupt is not being signaled to the PE, which suggests that there is a misconfiguration or masking of the interrupt at either the GIC or PE level.
Investigating GIC and PE-Level Interrupt Masking
To diagnose why the interrupt is not being signaled to the PE, it is essential to examine both the GIC and the PE-level interrupt masking mechanisms. The GIC is responsible for asserting the interrupt signal to the PE, but this signal can be masked at various levels, preventing the PE from responding to the interrupt.
The first step in diagnosing this issue is to read the ISR_EL1 (Interrupt Status Register). This register provides information about the state of the interrupt inputs to the PE, effectively reflecting the output of the GIC CPU interface. If the ISR_EL1.{I,F} bits are both 0, it indicates that the CPU interface is not asserting an interrupt to the PE. This could be due to the Priority Mask (ICC_PMR_EL1) or the Running Priority (ICC_RPR_EL1), which determine the minimum priority required for an interrupt to be signaled to the PE.
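As a minimal sketch of this check, the helper below decodes a raw ISR_EL1 value. The bit positions (A at bit 8, I at bit 7, F at bit 6) follow the architectural register layout; the function name is hypothetical, and the value is assumed to have been read on the target beforehand (for example with an `mrs` instruction at EL2).

```c
#include <stdbool.h>
#include <stdint.h>

/* ISR_EL1 bit positions: A = SError, I = IRQ, F = FIQ pending. */
#define ISR_EL1_F (UINT64_C(1) << 6)
#define ISR_EL1_I (UINT64_C(1) << 7)
#define ISR_EL1_A (UINT64_C(1) << 8)

/* Hypothetical helper: returns true if ISR_EL1 shows an IRQ or FIQ
 * pending at the PE, i.e. the CPU interface is asserting one. */
bool isr_irq_or_fiq_pending(uint64_t isr_el1)
{
    return (isr_el1 & (ISR_EL1_I | ISR_EL1_F)) != 0;
}
```

If this returns false while the timer and GIC both report the interrupt as pending, the CPU interface is the likely place to look next (priority mask or running priority). If it returns true and no exception is taken, the masking is happening in the PE.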
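The Priority Mask (ICC_PMR_EL1) is set by software and masks interrupts whose priority is not high enough. The Running Priority (ICC_RPR_EL1) is dynamic: it changes as interrupts are acknowledged and as End of Interrupt (EOI) is signaled. In the GIC, a lower numerical value means a higher priority, so a pending interrupt is signaled only if its priority value is numerically lower than ICC_PMR_EL1 and it is higher in priority than the Running Priority. For the priority 0xa0 used here, ICC_PMR_EL1 must hold a value above 0xa0; a value of 0xa0 or below masks the interrupt, and a value of 0 masks every interrupt. Likewise, if an interrupt of equal or higher priority is still active (acknowledged but not yet given an EOI), the pending interrupt will not be signaled to the PE.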
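The priority rule can be illustrated with a small model. This is a simplified sketch, not the full GIC prioritization logic: it assumes all values are 8-bit priorities seen from the same view, it ignores the binary point (ICC_BPR1_EL1) used for preemption grouping, and the function name is hypothetical. Lower numerical values mean higher priority, and ICC_RPR_EL1 reads as 0xFF (idle priority) when no interrupt is active.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICC_RPR_EL1 value when no interrupt is active on the CPU interface. */
#define GIC_IDLE_PRIORITY 0xFFu

/* Simplified model (binary point ignored): an interrupt can be signaled
 * only if its priority value is numerically below both the priority mask
 * and the running priority. */
bool gic_priority_allows_signal(uint8_t intr_prio, uint8_t pmr, uint8_t rpr)
{
    return intr_prio < pmr && intr_prio < rpr;
}
```

With the priority 0xa0 from this scenario, `gic_priority_allows_signal(0xa0, 0xff, GIC_IDLE_PRIORITY)` is true, while a mask of 0x00 or 0xa0, or a running priority of 0x80, makes it false.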
Additionally, the PE itself can mask IRQs using the PSTATE.I bit. If PSTATE.I is set, IRQs that target the current Exception Level are masked at the PE. In this case, however, the user confirmed that PSTATE.I was not masking the interrupt, so the issue must lie elsewhere.
Exploring Exception Level Routing and Interrupt Masking
Another critical aspect to consider is the routing of interrupts between different Exception Levels (ELs). The ARM architecture allows interrupts to be routed to specific ELs based on the configuration of the Hypervisor Configuration Register (HCR_EL2) and the Secure Configuration Register (SCR_EL3). These registers control whether interrupts are routed to EL1, EL2, or EL3, and they can implicitly mask interrupts if they are not routed to the current EL.
For example, if HCR_EL2 and SCR_EL3 are configured to route IRQs to EL1, then any IRQs will be implicitly masked at EL2 and EL3. This means that even if the interrupt is pending and enabled in the GIC, it will not be signaled to the PE if the current EL is higher than the EL to which the interrupt is routed.
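The routing and implicit-masking rule can be sketched as a small decision function. This is a simplified model under stated assumptions: the PE is in Non-secure state, EL2 and EL3 are implemented and use AArch64, and only physical IRQs are considered. The bit positions (HCR_EL2.IMO at bit 4, HCR_EL2.TGE at bit 27, SCR_EL3.IRQ at bit 1) are architectural; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HCR_EL2_IMO (UINT64_C(1) << 4)   /* route physical IRQs to EL2 */
#define HCR_EL2_TGE (UINT64_C(1) << 27)  /* trap general exceptions to EL2 */
#define SCR_EL3_IRQ (UINT64_C(1) << 1)   /* route physical IRQs to EL3 */

/* Exception level a physical IRQ is routed to (simplified model). */
unsigned irq_target_el(uint64_t hcr_el2, uint64_t scr_el3)
{
    if (scr_el3 & SCR_EL3_IRQ)
        return 3;
    if (hcr_el2 & (HCR_EL2_IMO | HCR_EL2_TGE))
        return 2;
    return 1;
}

/* Would a pending physical IRQ be taken at current_el? */
bool irq_is_taken(unsigned current_el, bool pstate_i,
                  uint64_t hcr_el2, uint64_t scr_el3)
{
    unsigned target = irq_target_el(hcr_el2, scr_el3);

    if (target < current_el)
        return false;             /* implicitly masked at a higher EL */
    if (target > current_el && target >= 2)
        return true;              /* PSTATE.I cannot mask routing to EL2/EL3 */
    return !pstate_i;             /* otherwise PSTATE.I decides */
}
```

In this model, `irq_is_taken(2, false, 0, 0)` is false even though PSTATE.I is clear, which matches the symptom described here: an IRQ routed to EL1 is never taken while executing at EL2. Setting `HCR_EL2_IMO` makes the same call return true.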
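In this scenario, the user discovered that the issue was related to the configuration of HCR_EL2. The initial assumption was that HCR_EL2 only affects interrupts that arrive while a lower EL is executing. In fact, its routing controls also determine whether a physical interrupt can be taken while the PE is executing at EL2: architecturally, physical IRQs are routed to EL2 when HCR_EL2.IMO (or HCR_EL2.TGE) is set, and an IRQ routed to EL1 is never taken at EL2. Once HCR_EL2 was configured correctly, the interrupt handler was triggered as soon as the timer condition was met, resolving the issue.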
Detailed Troubleshooting Steps and Solutions
To systematically troubleshoot and resolve issues related to ARM Cortex-A hypervisor timer interrupts not being signaled to the PE, follow these detailed steps:
Step 1: Verify Timer Configuration and Interrupt Status
Ensure that the physical EL2 timer (CNTHP_*_EL2) is correctly configured. This includes setting the timer value, enabling the timer, and verifying that the ISTATUS bit in CNTHP_CTL_EL2 is set to 1 when the timer condition is met. Additionally, confirm that the IMASK bit is 0, indicating that the interrupt is not masked at the timer level.
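A minimal sketch of this check, assuming the raw CNTHP_CTL_EL2 value has already been read on the target. The bit layout (ENABLE at bit 0, IMASK at bit 1, ISTATUS at bit 2) is architectural; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CNTHP_CTL_ENABLE  (UINT64_C(1) << 0)  /* timer enabled */
#define CNTHP_CTL_IMASK   (UINT64_C(1) << 1)  /* interrupt masked at the timer */
#define CNTHP_CTL_ISTATUS (UINT64_C(1) << 2)  /* timer condition met */

/* The timer drives its interrupt output only when it is enabled,
 * the condition is met, and the interrupt is not masked. */
bool cnthp_irq_asserted(uint64_t cnthp_ctl_el2)
{
    uint64_t relevant = CNTHP_CTL_ENABLE | CNTHP_CTL_IMASK | CNTHP_CTL_ISTATUS;
    uint64_t expected = CNTHP_CTL_ENABLE | CNTHP_CTL_ISTATUS;

    return (cnthp_ctl_el2 & relevant) == expected;
}
```

A value of 0x5 (ENABLE and ISTATUS set, IMASK clear) is the state expected when the timer has fired and is asserting its interrupt; 0x7 means the condition is met but the interrupt is masked at the timer.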
Step 2: Check GIC Configuration and Interrupt Routing
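Verify that the interrupt is correctly configured in the GIC. INTID 26 is a Private Peripheral Interrupt (PPI), so in a GIC with Redistributors (as the use of GICR_ICPENDR0 implies) its enable, group, and priority settings live in the Redistributor (GICR) registers of the target PE rather than in the Distributor (GICD). Confirm that the interrupt is enabled, that it is assigned to the correct group (Group 1 NS in this case), and that its priority is set as intended. Then check the GIC registers, such as ICC_HPPIR1_EL1 and GICR_ICPENDR0, to confirm that the interrupt is pending and has been forwarded to the CPU interface.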
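The Redistributor checks can be sketched as follows, assuming a GICv3-style register layout in which the SGI/PPI frame sits 0x10000 above the Redistributor base. The offsets are those of the GICv3 register map; the function names are hypothetical, and the register values are assumed to have been read from the target PE's Redistributor beforehand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets relative to the Redistributor base (RD_base). */
#define GICR_SGI_BASE_OFFSET 0x10000u  /* start of the SGI/PPI frame */
#define GICR_IGROUPR0        0x0080u   /* group bit per INTID 0..31 */
#define GICR_ISENABLER0      0x0100u   /* enable bit per INTID 0..31 */
#define GICR_ISPENDR0        0x0200u   /* pending bit per INTID 0..31 */
#define GICR_ICPENDR0        0x0280u   /* reads back the same pending state */
#define GICR_IPRIORITYR      0x0400u   /* one priority byte per INTID */

/* Bit mask for an SGI/PPI (INTID 0..31) in the per-INTID bitmap registers. */
uint32_t ppi_bit(unsigned intid)
{
    return UINT32_C(1) << intid;
}

/* Byte offset from RD_base of the priority field for an SGI/PPI. */
uint32_t ppi_priority_offset(unsigned intid)
{
    return GICR_SGI_BASE_OFFSET + GICR_IPRIORITYR + intid;
}

/* True if the PPI is Group 1, enabled, and pending in the Redistributor. */
bool ppi_ready_in_redistributor(uint32_t igroupr0, uint32_t isenabler0,
                                uint32_t ispendr0, unsigned intid)
{
    uint32_t bit = ppi_bit(intid);

    return (igroupr0 & bit) && (isenabler0 & bit) && (ispendr0 & bit);
}
```

For INTID 26 the bitmap mask is 0x04000000 and the priority byte is at RD_base + 0x1041A, which is where the 0xa0 priority from this scenario would be found.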
Step 3: Examine ISR_EL1 and Interrupt Masking
Read the ISR_EL1 register to determine whether the interrupt is being signaled by the GIC to the PE. If ISR_EL1.{I,F} are both 0, the CPU interface is not asserting an interrupt to the PE. This could be due to the Priority Mask (ICC_PMR_EL1) or the Running Priority (ICC_RPR_EL1). Ensure that ICC_PMR_EL1 holds a value numerically greater than the priority of the pending interrupt (lower values mean higher priority), and check that the Running Priority is not equal to or higher in priority than the pending interrupt, which would happen if an earlier interrupt was acknowledged but never received an EOI.
Step 4: Verify PE-Level Interrupt Masking
Check the PSTATE.I bit to ensure that interrupts are not being masked at the PE level. If PSTATE.I is set, clear it to allow interrupts to be signaled to the PE. Additionally, verify that the current Exception Level (EL) is not implicitly masking the interrupt due to the configuration of HCR_EL2 and SCR_EL3. Ensure that the interrupt is routed to the current EL or a higher EL to prevent implicit masking.
Step 5: Review HCR_EL2 and SCR_EL3 Configuration
Examine the configuration of HCR_EL2 and SCR_EL3 to ensure that interrupts are correctly routed to the desired EL. If the interrupt is intended for EL2, ensure that HCR_EL2 routes physical IRQs to EL2 (the IMO bit, or TGE). Similarly, if the interrupt is intended for EL3, ensure that SCR_EL3 routes IRQs to EL3 (the IRQ bit). Adjust the routing configuration as necessary to ensure that the interrupt is not implicitly masked at the current EL.
Step 6: Test and Validate the Interrupt Handler
Once the configuration has been verified and adjusted as necessary, test the interrupt handler to ensure that it is triggered as expected when the timer condition is met. Use debugging tools to monitor the interrupt handling process and confirm that the interrupt is being signaled to the PE and handled correctly.
Conclusion
Debugging ARM Cortex-A hypervisor timer interrupts requires a thorough understanding of both the GIC and the PE-level interrupt handling mechanisms. By systematically verifying the timer configuration, GIC settings, and PE-level interrupt masking, it is possible to identify and resolve issues related to interrupts not being signaled to the PE. In this case, the issue was resolved by correctly configuring HCR_EL2 to ensure that the interrupt was routed to the appropriate Exception Level. Following the detailed troubleshooting steps outlined above will help ensure that hypervisor timer interrupts are correctly configured and handled in ARM Cortex-A systems.