ARM Cortex-A15 HYP Mode vs. SVC Mode Performance Discrepancy
The core issue revolves around a significant performance discrepancy observed when executing a simple delay loop in Hypervisor (HYP) mode compared to Supervisor (SVC) mode on an ARMv7-A architecture, specifically the Exynos5422 SoC with big.LITTLE configuration (Cortex-A7 and Cortex-A15). The delay loop, implemented as a simple for loop with a nop instruction, runs approximately 40 times slower in HYP mode than in SVC mode. This discrepancy persists even when the compiler settings and the loop implementation remain identical across both modes. The only difference lies in the boot code, which determines whether the system starts in HYP mode or transitions to SVC mode.
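For reference, a delay loop of the kind described might look like the sketch below. The original code is not shown in the report, so the function name delay_loop, the volatile counter, and the GCC/Clang inline-assembly syntax are all assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical reconstruction of the delay loop (not the reporter's code).
 * The counter is declared volatile so the compiler keeps it in memory and
 * does not optimize the loop away; with caches off, each iteration then
 * pays for uncached instruction fetches and uncached counter accesses.
 */
static inline uint32_t delay_loop(uint32_t count)
{
    volatile uint32_t i;
    uint32_t executed = 0;

    for (i = 0; i < count; i++) {
        __asm__ volatile ("nop");   /* GCC/Clang inline assembly */
        executed++;
    }
    return executed;                /* number of iterations actually run */
}
```

Because the same source and compiler flags are used in both modes, any difference in run time has to come from the processor and memory-system state rather than from the loop itself.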
The performance degradation in HYP mode is not merely a theoretical concern but has practical implications for systems that rely on hypervisor-based virtualization. The hypervisor is responsible for managing multiple operating systems or software environments on a single hardware platform, and any performance degradation in HYP mode can directly impact the overall system efficiency. The issue is further complicated by the fact that the Exynos5422 SoC employs a big.LITTLE architecture, where the Cortex-A15 cores are designed for high performance, while the Cortex-A7 cores are optimized for power efficiency. The performance discrepancy in HYP mode could undermine the benefits of this architecture, particularly in scenarios where the hypervisor is expected to manage task migration between the big and LITTLE cores seamlessly.
Cache and MMU Configuration Differences in HYP and SVC Modes
The primary cause of the performance discrepancy between HYP and SVC modes lies in the configuration of the Memory Management Unit (MMU) and the cache subsystem. In the reported scenario, caching and MMU are disabled in HYP mode, while they are enabled in SVC mode. This difference in configuration has a profound impact on the system’s performance, particularly for operations that involve frequent memory access, such as the delay loop in question.
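The ARMv7-A architecture provides a sophisticated memory system that includes both instruction and data caches, as well as an MMU for virtual memory management. The caches are designed to reduce the latency of memory accesses by storing frequently accessed data and instructions closer to the processor. The MMU, on the other hand, translates virtual addresses to physical addresses, enabling features such as memory protection, virtual memory, and address space isolation. The MMU also supplies the memory attributes that mark a region as cacheable: with the stage 1 MMU disabled, ARMv7-A treats all data accesses as Strongly-ordered, so they bypass the data cache regardless of the C bit, and instruction fetches are cached only if the I bit is set. With both the MMU and the caches disabled, every instruction fetch and any load or store the loop performs goes out to external memory, resulting in significantly higher latency and reduced performance.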
In HYP mode, the hypervisor typically manages the MMU and cache configurations to ensure proper isolation and security between different virtual machines or operating systems. However, in the reported scenario, the MMU and caches are explicitly disabled in HYP mode, leading to a substantial performance degradation. This is supported by the observation that disabling the caches and MMU in SVC mode results in a similar performance drop, although SVC mode still remains approximately 2 times faster than HYP mode.
The discrepancy in performance between HYP and SVC modes, even with caches and MMU disabled, suggests that there are additional factors at play. One such factor could be the differences in the way HYP and SVC modes handle memory accesses and pipeline stalls. In HYP mode, the processor may be subject to additional security checks and virtualization overhead, which could further degrade performance. Additionally, the boot process and the state of the system prior to entering HYP mode could influence the performance. For instance, if the system transitions from a secure world to HYP mode, certain configurations or states may persist, affecting the performance in HYP mode.
Another potential cause of the performance discrepancy is which control register actually governs each mode. The System Control Register (SCTLR) controls the MMU, caches, and alignment checks for the PL1 modes, including SVC mode. HYP mode has its own control register, HSCTLR, with the M, C, and I bits in the same positions, so writing the SCTLR does not enable or disable the MMU and caches for code running in HYP mode. In the reported scenario, attempts to disable the MMU and caches in SVC mode by modifying the SCTLR did not yield the expected results. Specifically, the SCTLR value read back after modification did not match the written value, indicating that certain bits were not being set or cleared as intended. One possible explanation is that some SCTLR bits are reserved or fixed by the implementation and read back as constant values regardless of what is written. The security state may matter as well, because the SCTLR is banked between the Secure and Non-secure states, and on some implementations certain bits may only be writable from the Secure state.
Enabling Cache and MMU in HYP Mode and Validating SCTLR Configuration
To address the performance discrepancy between HYP and SVC modes, it is essential to ensure that the MMU and caches are properly configured in HYP mode. This involves enabling the MMU and caches, as well as validating the SCTLR configuration to ensure that the desired settings are applied correctly.
The first step in resolving the performance issue is to enable the MMU and caches in HYP mode. This can be achieved by setting the appropriate enable bits in the system control register for that mode. For code running in HYP mode these bits are held in the Hyp System Control Register (HSCTLR); the SCTLR holds the corresponding bits for SVC and the other PL1 modes. The bit positions are the same in both registers:
- Bit 0 (M): Enables the MMU.
- Bit 2 (C): Enables the data cache.
- Bit 12 (I): Enables the instruction cache.
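As a small illustration of these bit positions, the sketch below decodes a control-register value into the three enable flags. The macro names, the struct, and the function ctlr_decode are hypothetical names chosen for this example; only the bit positions come from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Enable bits, at the same positions in SCTLR and HSCTLR. */
#define CTLR_M (1u << 0)    /* MMU enable               */
#define CTLR_C (1u << 2)    /* data cache enable        */
#define CTLR_I (1u << 12)   /* instruction cache enable */

/* Hypothetical helper type for this example. */
struct ctlr_state {
    bool mmu;
    bool dcache;
    bool icache;
};

/* Decode a raw register value into the three enable flags. */
static inline struct ctlr_state ctlr_decode(uint32_t value)
{
    struct ctlr_state s = {
        .mmu    = (value & CTLR_M) != 0,
        .dcache = (value & CTLR_C) != 0,
        .icache = (value & CTLR_I) != 0,
    };
    return s;
}
```

Decoding the value 0xC5087A this way reports all three features as disabled, since bits 0, 2, and 12 are clear in it.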
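To enable the MMU and caches in HYP mode, the following steps should be taken:
- Read the current value of the control register: This can be done using the MRC instruction to move the contents of the register into a general-purpose register. The coprocessor access uses opc1 = 0 for the SCTLR and opc1 = 4 for the HSCTLR.
- Modify the value: Set the appropriate bits to enable the MMU and caches. Note that the value 0xC5087A has bits 0, 2, and 12 all clear, so it describes a configuration with the MMU and both caches disabled; ORing in 0x1005 (M, C, and I) gives 0xC5187F. A read-modify-write that sets only these three bits is safer than writing a fixed constant, because it preserves the remaining bits.
- Write the modified value back: This can be done using the MCR instruction to move the modified value from a general-purpose register back into the control register, followed by an ISB so that subsequent instructions see the new configuration.
Valid translation tables must already be in place before the M bit is set; otherwise the first access after the write will fault or fetch from an unintended address.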
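The read-modify-write sequence can be sketched as follows. This is a minimal sketch under several assumptions: the code runs in HYP mode and therefore targets HSCTLR (CP15 access with opc1 = 4), translation tables have already been set up and stale cache and TLB contents invalidated, and the build flag HYP_BARE_METAL and all function names are hypothetical. When that flag is not defined, a plain variable stands in for the register so the logic can be exercised on a host machine; its initial value reuses 0xC5087A purely as a placeholder.

```c
#include <stdint.h>

#define CTLR_M (1u << 0)    /* MMU enable               */
#define CTLR_C (1u << 2)    /* data cache enable        */
#define CTLR_I (1u << 12)   /* instruction cache enable */

#if defined(__arm__) && defined(HYP_BARE_METAL) /* hypothetical build flag */
/* Real register access; only valid when executing in HYP mode. */
static inline uint32_t hsctlr_read(void)
{
    uint32_t v;
    __asm__ volatile ("mrc p15, 4, %0, c1, c0, 0" : "=r"(v));
    return v;
}

static inline void hsctlr_write(uint32_t v)
{
    __asm__ volatile ("dsb\n\t"                       /* complete prior memory accesses */
                      "mcr p15, 4, %0, c1, c0, 0\n\t" /* write HSCTLR                   */
                      "isb"                           /* resynchronize the pipeline     */
                      : : "r"(v) : "memory");
}
#else
/* Host stand-in: a variable replaces the register (placeholder value). */
static uint32_t fake_hsctlr = 0x00C5087Au;
static inline uint32_t hsctlr_read(void)    { return fake_hsctlr; }
static inline void hsctlr_write(uint32_t v) { fake_hsctlr = v; }
#endif

/* Set M, C, and I while preserving every other bit; return the read-back. */
static inline uint32_t hsctlr_enable_mmu_and_caches(void)
{
    uint32_t v = hsctlr_read();
    v |= CTLR_M | CTLR_C | CTLR_I;
    hsctlr_write(v);
    return hsctlr_read();
}
```

Returning the read-back value rather than the value written lets the caller check immediately whether the hardware accepted all three bits.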
However, as observed in the reported scenario, simply writing to the SCTLR may not always result in the desired configuration. This could be due to several reasons, including:
- Privilege level restrictions: The SCTLR is a privileged register, and certain bits may only be modified in specific processor modes or security states. In HYP mode, the hypervisor may impose additional restrictions on the SCTLR configuration.
- Cache coherency and pipeline flushes: When enabling or disabling the MMU and caches, it is essential to ensure that the cache coherency is maintained and that the pipeline is flushed to avoid stale data or instructions. This may require additional instructions or barriers to ensure that the changes take effect correctly.
- Secure world interference: If the system transitions from a secure world to HYP mode, certain configurations or states from the secure world may persist, affecting the SCTLR configuration in HYP mode. This could explain why the SCTLR value read back after modification does not match the written value.
To validate the SCTLR configuration and ensure that the desired settings are applied correctly, the following steps should be taken:
- Verify the SCTLR value after modification: After writing to the SCTLR, read back the value using the MRC instruction and compare it with the expected value. If the values do not match, investigate the possible reasons, such as privilege level restrictions or secure world interference.
- Check the cache and MMU state: Use performance counters or debug registers to verify that the caches and MMU are indeed enabled and functioning as expected. This can help identify any discrepancies between the SCTLR configuration and the actual system behavior.
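When the read-back value differs from what was written, it helps to see exactly which bits disagree. The sketch below XORs the two values and reports each differing bit. The function name ctlr_report_mismatch is hypothetical, and the use of printf assumes some console output is available, which may not be true on a bare-metal target.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Print every bit where the written and read-back values differ and
 * return how many such bits there are. A result of 0 means the
 * register accepted the value exactly as written.
 */
static inline int ctlr_report_mismatch(uint32_t written, uint32_t readback)
{
    uint32_t diff = written ^ readback;
    int count = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (diff & (1u << bit)) {
            printf("bit %2u: wrote %u, read %u\n", bit,
                   (unsigned)((written >> bit) & 1u),
                   (unsigned)((readback >> bit) & 1u));
            count++;
        }
    }
    return count;
}
```

A mismatch confined to reserved bit positions is usually harmless, whereas a mismatch in bit 0, 2, or 12 means the MMU or a cache did not change state as requested.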
- Ensure cache coherency and pipeline flushes: When enabling or disabling the MMU and caches, use appropriate instructions or barriers to ensure that the cache coherency is maintained and that the pipeline is flushed. This may include using the DSB (Data Synchronization Barrier) and ISB (Instruction Synchronization Barrier) instructions to ensure that the changes take effect correctly.
In addition to enabling the MMU and caches, it is also important to consider the impact of the hypervisor’s configuration on the system’s performance. The hypervisor may impose additional overhead or restrictions that could affect the performance in HYP mode. For example, when stage 2 translation is enabled for guests, a TLB miss requires walking both the guest’s translation tables and the hypervisor’s stage 2 tables, which increases the latency of memory accesses. To mitigate this, consider optimizing the hypervisor’s configuration to minimize the overhead and ensure that the system’s resources are utilized efficiently.
Finally, it is important to consider the impact of the boot process on the system’s performance. The boot process may involve multiple stages, including the bootloader, secure world software, and the hypervisor. Each stage may have its own configuration and state, which could influence the performance in HYP mode. To ensure consistent performance, it is essential to carefully manage the boot process and ensure that the system is in a known and optimized state before entering HYP mode.
In conclusion, the performance discrepancy between HYP and SVC modes on the ARMv7-A architecture is primarily due to differences in the MMU and cache configuration. By enabling the MMU and caches in HYP mode and validating the SCTLR configuration, it is possible to mitigate the performance degradation and ensure that the system operates efficiently in both modes. Additionally, careful consideration of the hypervisor’s configuration and the boot process can further optimize the system’s performance and ensure consistent behavior across different processor modes.