PMU Event Counters Not Incrementing Despite Proper Configuration
The Performance Monitoring Unit (PMU) in ARM Cortex-A72 processors is a powerful tool for profiling and analyzing system performance. However, a common issue arises when the event counters (PMXEVCNTR_EL0) remain at zero despite seemingly correct configuration and initialization. This problem is particularly perplexing because the cycle counter (PMCCNTR_EL0) functions as expected, indicating that the PMU is partially operational. The issue is often encountered when developers attempt to profile specific events such as cache misses, branch mispredictions, or instruction executions, but the event counters fail to increment.
The root cause of this issue is often multifaceted, involving subtle misconfigurations, hardware-software interactions, or timing issues. In the case of the Cortex-A72, the PMU event counters are highly sensitive to the sequence of register writes, the state of the PMU control registers, and the specific events being monitored. Additionally, the interaction between the PMU and the operating system or real-time operating system (RTOS) can introduce complexities that are not immediately apparent.
To diagnose and resolve this issue, it is essential to understand the PMU architecture, the role of each register, and the sequence of operations required to enable and read event counters. The following sections will delve into the possible causes and provide a detailed troubleshooting guide to address the problem.
Misconfigured PMUSERENR_EL0 and PMCNTENSET_EL0 Registers
One of the primary reasons for the PMXEVCNTR_EL0 registers remaining at zero is the misconfiguration of the PMUSERENR_EL0 and PMCNTENSET_EL0 registers. The PMUSERENR_EL0 register controls user-level access to the PMU, while the PMCNTENSET_EL0 register enables specific event counters and the cycle counter. A common oversight is failing to set the appropriate bits in these registers, which prevents the event counters from being enabled or read.
The PMUSERENR_EL0 register has four control bits: EN (bit 0), SW (bit 1), CR (bit 2), and ER (bit 3). The EN bit enables EL0 (user-level) access to all PMU registers. The SW bit permits EL0 writes to PMSWINC_EL0, and the CR and ER bits grant EL0 read access to the cycle counter and the event counters respectively when EN is clear. These bits only decide whether an EL0 access traps; they do not decide whether a counter counts. In the provided code, the PMUSERENR_EL0 register is configured with only the EN and CR bits set. Because EN already covers the event counters, the missing ER bit on its own is unlikely to explain counters that read as zero, although setting it is harmless and makes the intent explicit.
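The trap decision for an EL0 read of an event counter can be written as a small predicate. This is a minimal sketch, not code from the original program: the helper name `el0_can_read_event_counters` is hypothetical, and it assumes the Armv8-A rule that either EN or ER is sufficient for such a read.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMUSERENR_EL0 bit positions (Armv8-A). */
#define PMUSERENR_EN (1u << 0) /* EL0 access to all PMU registers */
#define PMUSERENR_SW (1u << 1) /* EL0 writes to PMSWINC_EL0 */
#define PMUSERENR_CR (1u << 2) /* EL0 reads of the cycle counter */
#define PMUSERENR_ER (1u << 3) /* EL0 reads of the event counters */

/* Hypothetical helper: true if an EL0 read of an event counter is not
 * trapped. Either EN or ER is enough. */
static bool el0_can_read_event_counters(uint32_t pmuserenr)
{
    return (pmuserenr & (PMUSERENR_EN | PMUSERENR_ER)) != 0;
}
```

With the value used in the provided code (EN | CR, that is 0x5), the predicate is true, which is why a readable-but-zero counter points at the counting configuration rather than at access permissions.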
The PMCNTENSET_EL0 register enables individual counters. Bit 31 enables the cycle counter, and the low bits enable event counters, one bit per counter. The architecture allows up to 31 event counters (bits 0-30), but the Cortex-A72 implements six, so only bits 0-5 have an effect; the PMCR_EL0.N field reports the implemented count. A counter whose bit is clear does not increment. In the provided code, the PMCNTENSET_EL0 register is configured with bits 0-2 and bit 31 set, which enables the first three event counters and the cycle counter. If the event counters still do not increment, the enable mask is not the problem, and the event selection or the sequence of operations should be examined next.
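An enable mask can be derived from the number of counters the core reports rather than hard-coded. The sketch below assumes a PMCR_EL0 value has already been read at a suitable privilege level; the helper names `pmcr_num_counters` and `pmu_enable_mask` are hypothetical and do not appear in the original code.

```c
#include <stdint.h>

#define PMCR_N_SHIFT  11u        /* PMCR_EL0.N occupies bits [15:11] */
#define PMCR_N_MASK   0x1fu
#define PMCNTEN_CYCLE (1u << 31) /* bit 31: cycle counter */

/* Number of event counters implemented, decoded from a PMCR_EL0 value. */
static unsigned pmcr_num_counters(uint64_t pmcr)
{
    return (unsigned)((pmcr >> PMCR_N_SHIFT) & PMCR_N_MASK);
}

/* Hypothetical helper: enable mask for the first `wanted` event counters,
 * clamped to the implemented count, plus the cycle counter. */
static uint64_t pmu_enable_mask(uint64_t pmcr, unsigned wanted)
{
    unsigned n = pmcr_num_counters(pmcr);
    if (wanted > n)
        wanted = n;
    /* wanted is at most 31 here, so the 64-bit shift is well defined. */
    uint64_t events = (wanted == 0) ? 0 : ((UINT64_C(1) << wanted) - 1);
    return events | PMCNTEN_CYCLE;
}
```

For a core reporting N = 6, asking for three counters gives 0x80000007, the same value the provided code writes by hand.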
Another potential issue is the timing of the register writes. The ARM architecture requires specific synchronization instructions, such as the ISB (Instruction Synchronization Barrier), to ensure that register writes are completed before proceeding to the next operation. If these synchronization instructions are omitted or placed incorrectly, the PMU may not be properly configured, leading to non-functional event counters.
Incorrect Event Selection and PMSELR_EL0 Configuration
The PMSELR_EL0 register is used to select the event counter that will be configured or read. Each event counter can be programmed to monitor a specific event, such as cache misses or branch mispredictions, by writing the corresponding event number to the PMXEVTYPER_EL0 register. However, if the wrong event number is selected or the PMSELR_EL0 register is not properly configured, the event counters will not increment.
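In the provided code, the PMSELR_EL0 register is set to 0, 1, and 2 in turn, and the PMXEVTYPER_EL0 register is written with event numbers 0x2, 0x4, and 0x6. In the Armv8 common event list these are L1I_TLB_REFILL (instruction TLB refill), L1D_CACHE (L1 data cache access), and LD_RETIRED (load instruction architecturally executed). Not every common event is implemented on every core, so each number should be checked against the event list in the Cortex-A72 Technical Reference Manual or against PMCEID0_EL0, which has one bit per common event. A counter programmed with an unimplemented event does not increment, and a counter programmed with an implemented but unintended event counts something other than what was meant.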
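Whether a common event is implemented can be tested against a PMCEID0_EL0 value, where bit n corresponds to event number n for events 0x00 to 0x1F. The following is a minimal sketch under that assumption; the helper name `pmu_common_event_implemented` is hypothetical, and the register value is passed in rather than read so the logic can be checked off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bit n of PMCEID0_EL0 is set when common event n
 * (0x00-0x1F) is implemented. Events outside that range are not described
 * by these bits, so the helper reports false for them. */
static bool pmu_common_event_implemented(uint64_t pmceid0, unsigned event)
{
    if (event > 0x1f)
        return false;
    return ((pmceid0 >> event) & 1u) != 0;
}
```

On the target, the argument would come from an `mrs` read of PMCEID0_EL0, and a false result for an intended event is a strong hint that the corresponding counter will stay at zero.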
Additionally, the sequence of operations is critical when configuring the PMU. The PMSELR_EL0 register must be set before writing to the PMXEVTYPER_EL0 register, and the ISB instruction must be used to ensure that the write is completed before proceeding. If the sequence is incorrect or the ISB instruction is omitted, the PMU may not be properly configured, leading to non-functional event counters.
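The select-then-program ordering can be wrapped in one function so that the barrier is never forgotten. This is a sketch under stated assumptions, not the original author's code: `pmu_program_counter` and the build flag `PMU_BARE_METAL` are hypothetical names, the counter limit of six is the Cortex-A72 value, and the register accesses are compiled only when the flag says the code runs with permission to write the PMU.

```c
#include <stdint.h>

#define A72_NUM_EVENT_COUNTERS 6u     /* assumption: Cortex-A72; PMCR_EL0.N reports it */
#define PMU_EVTCOUNT_MAX       0x3ffu /* evtCount field is 10 bits wide in Armv8.0 */

/* Hypothetical helper. Returns 0 on success, -1 for a bad argument, and -2
 * when built without hardware access. PMU_BARE_METAL is an assumed build
 * flag for AArch64 code that is allowed to write the PMU registers. */
static int pmu_program_counter(unsigned idx, unsigned event)
{
    if (idx >= A72_NUM_EVENT_COUNTERS || event > PMU_EVTCOUNT_MAX)
        return -1; /* rejected before any register is touched */
#if defined(__aarch64__) && defined(PMU_BARE_METAL)
    __asm__ volatile("msr pmselr_el0, %0" : : "r"((uint64_t)idx));
    __asm__ volatile("isb"); /* selection must take effect before the next write */
    __asm__ volatile("msr pmxevtyper_el0, %0" : : "r"((uint64_t)event));
    __asm__ volatile("isb");
    return 0;
#else
    return -2;
#endif
}
```

Note that writing only the event number leaves the filter bits of the event type register at zero; code that needs to restrict counting to particular exception levels would have to set those bits as well.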
Another potential issue is the interaction between the PMU and the operating system or RTOS. In some cases, the operating system may disable or override the PMU configuration, preventing the event counters from incrementing. Multi-core systems add a further trap: each core has its own PMU with its own copy of these registers, so a configuration written on one core has no effect on the others. If the scheduler migrates the measuring thread to a different core, the thread reads counters that were never configured. In such cases, it is essential to apply the PMU configuration on every core that may run the measured code (or pin the thread to one core) and to confirm that the operating system is not reprogramming the PMU.
Implementing Correct PMU Configuration and Synchronization
To resolve the issue of non-incrementing PMU event counters, it is essential to implement the correct configuration and synchronization sequence. The following steps outline the necessary actions to ensure that the PMU is properly configured and that the event counters increment as expected.
First, ensure that the PMUSERENR_EL0 register is configured with the EN, CR, and ER bits set. This will enable user-level access to the PMU, the cycle counter, and the event counters. The following code snippet demonstrates the correct configuration:
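// EN | CR | ER; the operand is widened to 64 bits to match the register size (needs <stdint.h>)
__asm__ volatile("msr pmuserenr_el0, %0" :: "r" ((uint64_t)((1u << 0) | (1u << 2) | (1u << 3))));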
Next, configure the PMCNTENSET_EL0 register to enable the desired event counters and the cycle counter. Ensure that the appropriate bits are set for the event counters you wish to monitor. The following code snippet demonstrates the correct configuration:
__asm__ volatile("msr PMCNTENSET_EL0, %0" : : "r" ((uint64_t)(7u | (1u << 31)))); // event counters 0-2 plus the cycle counter; 1u avoids signed overflow
After enabling the counters, configure the PMSELR_EL0 and PMXEVTYPER_EL0 registers to select and configure the event counters. Ensure that the PMSELR_EL0 register is set before writing to the PMXEVTYPER_EL0 register, and use the ISB instruction to synchronize the writes. The following code snippet demonstrates the correct sequence:
__asm__ volatile("msr PMSELR_EL0, %0" : : "r" ((uint64_t)0)); // select event counter 0
__asm__ volatile("isb" : :);
__asm__ volatile("msr PMXEVTYPER_EL0, %0" : : "r" ((uint64_t)0x2)); // event number for the selected counter
__asm__ volatile("isb" : :);
Repeat this sequence for each event counter you wish to configure. Finally, enable the PMU by setting the appropriate bits in the PMCR_EL0 register. Ensure that the PMU is enabled, the cycle counter is reset, and the event counters are reset. The following code snippet demonstrates the correct configuration:
uint64_t val = 0; // system registers are read and written as 64-bit values (needs <stdint.h>)
__asm__ volatile("mrs %0, PMCR_EL0" : "=r" (val));
val |= 1u << 0; // E: enable all counters
val |= 1u << 1; // P: reset event counters
val |= 1u << 2; // C: reset cycle counter
__asm__ volatile("msr PMCR_EL0, %0" : : "r" (val));
__asm__ volatile("isb" : :);
After configuring the PMU, run the test code and stop the PMU to read the counters. Ensure that the PMSELR_EL0 register is set before reading the PMXEVCNTR_EL0 register, and use the ISB instruction so that the selection takes effect before the read. The following code snippet demonstrates the correct sequence:
__asm__ volatile("msr PMSELR_EL0, %0" : : "r" ((uint64_t)0)); // select event counter 0
__asm__ volatile("isb" : :);
__asm__ volatile("mrs %0, PMXEVCNTR_EL0" : "=r" (val));
printf("Perf: PMXEVCNTR_EL0 %llu\n", (unsigned long long)val);
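When a measurement is taken as the difference between two reads instead of after a reset, counter width matters. The sketch below assumes 32-bit event counters, as in Armv8.0 cores such as the Cortex-A72; the helper name `pmu_event_delta` is hypothetical.

```c
#include <stdint.h>

/* Events counted between two reads of a 32-bit event counter.
 * Unsigned subtraction wraps modulo 2^32, so a single overflow between the
 * two reads is handled; more than one cannot be detected this way. */
static uint32_t pmu_event_delta(uint32_t start, uint32_t end)
{
    return end - start;
}
```

The low 32 bits of each PMXEVCNTR_EL0 read would be passed in as `start` and `end`.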
By following these steps, you should be able to resolve the issue of non-incrementing PMU event counters and successfully profile your system using the ARM Cortex-A72 PMU. If the issue persists, consider checking the event numbers and ensuring that they are supported by the processor. Additionally, verify that the operating system or RTOS is not interfering with the PMU configuration.