ARM Cortex-A53 Data Abort Exception 0x96000035 During Stage-2 Translation

The setup is an ARM Cortex-A53 system running Linux on a TI platform with two clusters of two cores each. A hypervisor runs at EL2 (HYP mode in AArch32 terminology) with stage-2 translation enabled. With stage 2 active, the Linux kernel at EL1 takes a data abort with syndrome value 0x96000035. That value decodes to Exception Class 0x25 (Data Abort taken without a change in Exception level) and Data Fault Status Code 0x35, "Unsupported Exclusive or Atomic access". With stage-2 translation disabled, the exception does not occur and the system boots normally. Trying various combinations of shareability attributes did not resolve the issue.
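The decoding can be checked by pulling the fields out of the syndrome value by hand. Below is a minimal host-side C sketch for the AArch64 ESR_ELx layout of a data abort: EC in bits [31:26], IL in bit [25], ISV in bit [24], WnR in bit [6] and DFSC in bits [5:0]. The struct and function names are made up for this example, and only a few EC and DFSC values are given names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Selected fields of an AArch64 ESR_ELx value for a data abort. */
struct esr_fields {
    uint32_t ec;   /* Exception Class, bits [31:26] */
    uint32_t il;   /* Instruction Length, bit [25] */
    uint32_t isv;  /* Instruction Syndrome Valid, bit [24] */
    uint32_t wnr;  /* Write not Read, bit [6] */
    uint32_t dfsc; /* Data Fault Status Code, bits [5:0] */
};

struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;
    f.ec = (esr >> 26) & 0x3Fu;
    f.il = (esr >> 25) & 0x1u;
    f.isv = (esr >> 24) & 0x1u;
    f.wnr = (esr >> 6) & 0x1u;
    f.dfsc = esr & 0x3Fu;
    return f;
}

/* Name only the two data-abort exception classes relevant here. */
const char *ec_name(uint32_t ec)
{
    switch (ec) {
    case 0x24: return "Data Abort from a lower Exception level";
    case 0x25: return "Data Abort without a change in Exception level";
    default:   return "other";
    }
}

/* Name a few DFSC values; translation and permission faults cover levels 0-3. */
const char *dfsc_name(uint32_t dfsc)
{
    if (dfsc == 0x35)
        return "Unsupported Exclusive or Atomic access";
    if ((dfsc & 0x3Cu) == 0x04u)
        return "Translation fault";
    if ((dfsc & 0x3Cu) == 0x0Cu)
        return "Permission fault";
    return "other";
}

void print_esr(uint32_t esr)
{
    struct esr_fields f = decode_esr(esr);
    printf("ESR 0x%08x: EC 0x%02x (%s), IL %u, ISV %u, WnR %u, DFSC 0x%02x (%s)\n",
           (unsigned)esr, (unsigned)f.ec, ec_name(f.ec), (unsigned)f.il,
           (unsigned)f.isv, (unsigned)f.wnr, (unsigned)f.dfsc, dfsc_name(f.dfsc));
}
```

For 0x96000035 the decoder reports EC 0x25, IL 1, ISV 0 and DFSC 0x35. ISV is 0, so the syndrome carries no information about the transfer register. The faulting address should be read from FAR_EL1 instead.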

Linux builds spinlocks, atomic_t operations and other cross-core synchronization primitives on exclusive load/store pairs. In AArch64 these are LDXR/STXR and their acquire/release forms LDAXR/STLXR; in AArch32 they are LDREX/STREX. Cortex-A53 implements ARMv8.0-A, which has no LSE atomic instructions, so on this core the faulting access is an exclusive load or store. These instructions work with stage 2 disabled and fault once it is enabled. That suggests the stage-2 mapping produces memory attributes that the core and interconnect do not support for exclusive accesses.

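Stage-2 translation is used for virtualization. It translates the intermediate physical addresses (IPAs) produced by the guest's stage-1 tables into physical addresses. Each stage-2 descriptor also carries memory attributes: memory type and cacheability, shareability, and the S2AP access permissions. An access uses the combination of the stage-1 and stage-2 attributes, and for memory type and cacheability the more restrictive stage wins. A stage-2 entry can therefore turn memory that Linux maps as Normal Write-Back into Normal Non-cacheable or Device memory, and Linux's own page tables cannot override that.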
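The combining rule for memory type and cacheability can be modeled in a few lines. This is a simplified sketch of the ARMv8.0 rule without FEAT_S2FWB, which Cortex-A53 does not implement. It ignores allocation hints and transient attributes, and the enum and function names are invented for illustration. If either stage is Device, the result is Device, and the stricter Device type is kept. Otherwise inner and outer cacheability each take the stricter of Non-cacheable, Write-Through and Write-Back.

```c
#include <assert.h>

/* Memory types ordered from most to least restrictive. */
enum mem_kind { DEV_nGnRnE, DEV_nGnRE, DEV_nGRE, DEV_GRE, NORMAL };

/* Normal-memory cacheability, ordered from most to least restrictive. */
enum cacheability { NC, WT, WB };

struct mem_attr {
    enum mem_kind kind;
    enum cacheability inner; /* meaningful only for NORMAL */
    enum cacheability outer; /* meaningful only for NORMAL */
};

/* Smaller enum value == more restrictive, so "stricter" is the minimum. */
static int stricter(int a, int b) { return a < b ? a : b; }

struct mem_attr combine_s1_s2(struct mem_attr s1, struct mem_attr s2)
{
    struct mem_attr r;

    if (s1.kind != NORMAL || s2.kind != NORMAL) {
        /* Any Device stage makes the result Device; keep the stricter type. */
        r.kind = (enum mem_kind)stricter(s1.kind, s2.kind);
        r.inner = NC; /* cacheability is not meaningful for Device */
        r.outer = NC;
        return r;
    }
    r.kind = NORMAL;
    r.inner = (enum cacheability)stricter(s1.inner, s2.inner);
    r.outer = (enum cacheability)stricter(s1.outer, s2.outer);
    return r;
}
```

Combining Linux's Normal Write-Back mapping with a stage-2 entry that says Non-cacheable or Device yields Non-cacheable or Device. That is the situation suspected in this case.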

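The fact that the exception is taken at EL1 rather than EL2 is also informative. EC 0x25 in ESR_EL1 means the abort was taken from EL1 to EL1, so it was not reported to the hypervisor as a stage-2 fault. A stage-2 fault would appear in ESR_EL2 with EC 0x24 (Data Abort from a lower Exception level). This points to a fault raised by the access itself, most likely because of the attributes it ends up with, rather than a failed stage-2 translation or permission check.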

Misconfigured Stage-2 Memory Attributes and HCR_EL2 Settings

The most likely root cause is the interaction between stage-2 translation and the memory attributes of the RAM the kernel uses for its locks and atomics. Once stage 2 is enabled, the stage-2 descriptors help determine the memory type of every guest access. If the combined attributes are not supported for exclusive accesses, the exclusive load or store raises this data abort.

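One possible cause is that stage 2 maps the region as Device memory. In a stage-2 descriptor, the MemAttr field (bits [5:2]) encodes Device memory when MemAttr[3:2] is 0b00. A descriptor built without setting MemAttr at all therefore maps the page as Device-nGnRnE. The architecture does not guarantee that exclusive accesses work on Device memory. A similar problem arises if stage 2 marks RAM as Normal Non-cacheable (MemAttr 0b0101). Exclusive accesses to non-cacheable memory typically rely on a global exclusive monitor in the system interconnect rather than the core's caches, and whether such a monitor exists depends on the SoC. For RAM, stage 2 should use MemAttr 0b1111 (Normal, Inner and Outer Write-Back).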
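A quick way to audit a stage-2 table is to decode MemAttr from each descriptor. The sketch below follows the ARMv8.0 stage-2 encoding. MemAttr[3:2] is the outer attribute, or 0b00 for Device; MemAttr[1:0] is the inner attribute or, for Device, the Device type. The helper names are hypothetical, and in practice the descriptor values would come from the hypervisor's own page-table code.

```c
#include <assert.h>
#include <stdint.h>

/* MemAttr[3:0] lives in bits [5:2] of a stage-2 block or page descriptor. */
unsigned s2_memattr(uint64_t desc)
{
    return (unsigned)((desc >> 2) & 0xFu);
}

/* Decode MemAttr using the ARMv8.0 stage-2 encoding (no FEAT_S2FWB). */
const char *s2_memattr_name(unsigned memattr)
{
    static const char *const device[4] = {
        "Device-nGnRnE", "Device-nGnRE", "Device-nGRE", "Device-GRE"
    };
    unsigned outer = (memattr >> 2) & 0x3u; /* 0b00 selects Device memory */
    unsigned inner = memattr & 0x3u;

    if (outer == 0)
        return device[inner];
    if (inner == 0)
        return "Reserved (UNPREDICTABLE Normal encoding)";
    if (outer == 3 && inner == 3)
        return "Normal, Inner and Outer Write-Back";
    if (outer == 1 && inner == 1)
        return "Normal, Inner and Outer Non-cacheable";
    return "Normal, Write-Through or mixed cacheability";
}

/* Nonzero when the mapping is Normal Inner and Outer Write-Back (0b1111),
 * the type recommended for the kernel's RAM. */
int s2_is_normal_wb(uint64_t desc)
{
    return s2_memattr(desc) == 0xFu;
}
```

For example, a descriptor whose bits [5:2] are all zero decodes as Device-nGnRnE, so a zero-initialized attribute field is worth checking first.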

Another potential cause is the HCR_EL2 configuration, where two bits affect the cacheability of guest accesses. HCR_EL2.CD (bit 32, stage-2 data access cacheability disable) forces all stage-2 translations of data accesses and translation table walks to Normal memory to be Non-cacheable in the EL1&0 regime. With CD set, every kernel exclusive targets Non-cacheable memory, whatever the stage-2 descriptors say. HCR_EL2.DC (bit 12, Default Cacheability) makes the EL1&0 regime behave as if SCTLR_EL1.M were 0 and treats the memory as Normal Write-Back. It is intended for guests that run without their own stage-1 MMU, so it should be 0 for a Linux guest, which enables its own MMU. Stage-2 translation itself is enabled by HCR_EL2.VM (bit 0).
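HCR_EL2 is a single 64-bit value, so it is easy to sanity-check once the hypervisor has computed it. The sketch below checks the bits discussed above for a 64-bit Linux guest. The macro and enum names are ad hoc. It also assumes that the guest kernel is AArch64 (HCR_EL2.RW, bit 31, set) and that HCR_EL2.TGE (bit 27) must be clear while a guest kernel runs at EL1.

```c
#include <assert.h>
#include <stdint.h>

/* HCR_EL2 bit positions (ARMv8.0-A). Names are ad hoc for this sketch. */
#define HCR_VM  (UINT64_C(1) << 0)  /* enable stage-2 translation for EL1&0 */
#define HCR_DC  (UINT64_C(1) << 12) /* Default Cacheability */
#define HCR_TGE (UINT64_C(1) << 27) /* Trap General Exceptions, from EL0 */
#define HCR_RW  (UINT64_C(1) << 31) /* EL1 executes in AArch64 */
#define HCR_CD  (UINT64_C(1) << 32) /* stage-2 data access cacheability disable */

enum hcr_issue {
    HCR_OK = 0,
    HCR_STAGE2_OFF = 1 << 0,  /* VM clear: stage 2 not enabled */
    HCR_FORCES_NC = 1 << 1,   /* CD set: guest data accesses become Non-cacheable */
    HCR_DC_SET = 1 << 2,      /* DC set: guest stage-1 MMU treated as off */
    HCR_TGE_SET = 1 << 3,     /* TGE set: must be clear while a guest runs at EL1 */
    HCR_EL1_NOT_A64 = 1 << 4, /* RW clear: EL1 would be AArch32 */
};

/* Return a bitmask of enum hcr_issue flags for a 64-bit Linux guest. */
unsigned check_hcr_for_linux_guest(uint64_t hcr)
{
    unsigned issues = HCR_OK;

    if (!(hcr & HCR_VM))
        issues |= HCR_STAGE2_OFF;
    if (hcr & HCR_CD)
        issues |= HCR_FORCES_NC;
    if (hcr & HCR_DC)
        issues |= HCR_DC_SET;
    if (hcr & HCR_TGE)
        issues |= HCR_TGE_SET;
    if (!(hcr & HCR_RW))
        issues |= HCR_EL1_NOT_A64;
    return issues;
}
```

A hypervisor could run this check on the value it is about to write to HCR_EL2 and refuse to enter the guest if any flag is set.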

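Shareability also plays a role, though probably not the decisive one here. For four coherent cores, RAM should be Inner Shareable, and Linux's stage-1 mappings for RAM already are. For Normal cacheable memory, the stage-1 and stage-2 shareability combine to the more shareable of the two. Changing the stage-2 SH field can therefore only leave the kernel's RAM Inner Shareable or raise it to Outer Shareable; it cannot make it Non-shareable. That is consistent with the shareability experiments having no effect, and it points back to memory type and cacheability.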
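The shareability rule can be sketched the same way. This sketch assumes Normal cacheable memory, because Device and Normal Non-cacheable locations are always treated as Outer Shareable and the rule does not apply to them. It uses the SH encodings from the descriptors: 0b00 Non-shareable, 0b10 Outer Shareable, 0b11 Inner Shareable, with 0b01 reserved. The names are illustrative. The test cases assume Linux's stage-1 RAM mappings are Inner Shareable.

```c
#include <assert.h>

/* SH encodings used in descriptor bits [9:8]; 0b01 is reserved. */
enum shareability { SH_NON = 0x0, SH_OUTER = 0x2, SH_INNER = 0x3 };

/* Rank so that a more shareable domain compares greater. */
static int sh_rank(enum shareability s)
{
    switch (s) {
    case SH_OUTER: return 2;
    case SH_INNER: return 1;
    default:       return 0;
    }
}

/* Combine stage-1 and stage-2 shareability for Normal cacheable memory:
 * the more shareable of the two wins. Device and Normal Non-cacheable
 * locations are treated as Outer Shareable and are not modeled here. */
enum shareability combine_sh(enum shareability s1, enum shareability s2)
{
    return sh_rank(s1) >= sh_rank(s2) ? s1 : s2;
}
```

With an Inner Shareable stage-1 mapping, any stage-2 SH value yields Inner or Outer Shareable, never Non-shareable.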

Finally, HCR_EL2 cannot fix the abort by routing it somewhere else. Its routing controls serve other purposes: AMO, IMO and FMO route physical SError, IRQ and FIQ interrupts to EL2, and TGE traps general exceptions from EL0 and must be clear while a guest kernel runs at EL1. Even if the abort were taken at EL2, the exclusive access would still fail. The fix lies in the memory attributes.

Correcting Stage-2 Translation and HCR_EL2 Configuration

To resolve the issue, the following steps should be taken to ensure that the stage-2 translation and HCR_EL2 settings are correctly configured:

  1. Verify Stage-2 Memory Attributes: Inspect the stage-2 descriptors covering the kernel's RAM, starting with the page that faulted. FAR_EL1 holds the faulting virtual address. From EL2, AT S12E1R followed by a read of PAR_EL1 gives the physical address and the combined attributes. Make sure MemAttr (bits [5:2]) is 0b1111, Normal Inner and Outer Write-Back, and not 0b00xx (Device) or 0b0101 (Normal Non-cacheable). Set SH (bits [9:8]) to 0b11, Inner Shareable, to match Linux's stage-1 mappings.

  2. Check HCR_EL2 Configuration: Read back HCR_EL2 and confirm three bits. VM (bit 0) must be 1. CD (bit 32) must be 0 so that stage 2 does not force data accesses to be Non-cacheable. DC (bit 12) must be 0 because Linux enables its own stage-1 MMU.

  3. Do Not Rely on Exception Routing: Routing this abort to another Exception level will not make the exclusive access succeed; taking it at EL2 would only move the symptom. To correlate the fault with the stage-2 tables, use the ESR and FAR values that Linux already prints when it reports the abort.

  4. Keep Shareability Consistent: Use Inner Shareable for RAM in stage 2 so that it matches the kernel's stage-1 mappings and keeps the four cores coherent. Changing shareability alone did not help here, so treat this as a consistency fix rather than the likely cure.

  5. Complete the Table Update Sequence: After writing or changing stage-2 descriptors, make the writes visible to the table walker with DSB ISHST. Then invalidate stale TLB entries for the guest's VMID (for example with TLBI VMALLS12E1IS), and issue DSB ISH and ISB before entering the guest. Without the invalidation, the core can keep using cached translations that carry the old attributes. If the hypervisor writes the tables through a mapping whose cacheability differs from the table-walk attributes in VTCR_EL2 (IRGN0, ORGN0), clean the table memory to the Point of Coherency so the walker sees the new values.

  6. Review the Documentation: Three sources are relevant. The Arm Architecture Reference Manual for ARMv8-A covers the stage-2 descriptor format, how stage-1 and stage-2 attributes combine, and HCR_EL2. The Cortex-A53 Technical Reference Manual describes how the core handles exclusive accesses. The TI SoC documentation should state whether the interconnect supports exclusive accesses to non-cacheable memory.
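Steps 1 and 4 come down to the bits in each stage-2 descriptor. The sketch below builds a level-3 page descriptor for the VMSAv8-64 4 KB granule with MemAttr 0b1111, Inner Shareable, read/write S2AP and the access flag set. The macro and function names are hypothetical. The sketch assumes a 48-bit output address and leaves the execute-never and contiguous bits clear.

```c
#include <assert.h>
#include <stdint.h>

/* Stage-2 page descriptor fields, VMSAv8-64, 4 KB granule (hypothetical names). */
#define S2_VALID      (UINT64_C(1) << 0)
#define S2_PAGE       (UINT64_C(1) << 1)  /* at level 3, marks a page descriptor */
#define S2_MEMATTR(x) ((uint64_t)((x) & 0xFu) << 2)
#define S2_S2AP_RW    (UINT64_C(3) << 6)  /* read/write access */
#define S2_SH_INNER   (UINT64_C(3) << 8)  /* Inner Shareable */
#define S2_AF         (UINT64_C(1) << 10) /* access flag */

#define S2_MEMATTR_NORMAL_WB    0xFu /* Outer Write-Back, Inner Write-Back */
#define S2_MEMATTR_DEVICE_nGnRE 0x1u

/* Build a level-3 page descriptor mapping one 4 KB IPA page to output address pa. */
uint64_t s2_page_desc(uint64_t pa, unsigned memattr)
{
    uint64_t desc;

    assert((pa & 0xFFFu) == 0);       /* page aligned */
    assert(pa < (UINT64_C(1) << 48)); /* fits the 48-bit output address field */

    desc = pa | S2_VALID | S2_PAGE | S2_MEMATTR(memattr) | S2_S2AP_RW | S2_AF;

    /* MemAttr[3:2] != 0b00 means Normal memory: make it Inner Shareable like
     * Linux's stage-1 RAM mappings. Device memory is treated as Outer
     * Shareable regardless, so SH is left clear for it. */
    if ((memattr & 0xCu) != 0)
        desc |= S2_SH_INNER;
    return desc;
}
```

For a hypothetical RAM page at 0x80000000, s2_page_desc(0x80000000, S2_MEMATTR_NORMAL_WB) returns 0x800007FF. The valid, page, MemAttr, S2AP, SH and AF fields together set every bit in [10:0].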

Following these steps should resolve the unsupported exclusive access aborts under stage-2 translation. The most likely fix is to map RAM in stage 2 as Normal Inner and Outer Write-Back, Inner Shareable, with HCR_EL2.CD and HCR_EL2.DC both clear.

Summary of Key Settings

| Setting | What it controls | Recommended value for exclusive/atomic access |
|---|---|---|
| Stage-2 MemAttr (descriptor bits [5:2]) | Memory type and cacheability of the guest mapping | 0b1111: Normal, Inner and Outer Write-Back |
| HCR_EL2.CD (bit 32) | Forces stage-2 data accesses to Normal memory to be Non-cacheable | 0 |
| HCR_EL2.DC (bit 12) | Default Cacheability: EL1&0 stage 1 behaves as if disabled | 0 for a Linux guest with its own MMU |
| Exception routing | Which Exception level takes the abort | No change; fix the memory attributes instead |
| Stage-2 SH (descriptor bits [9:8]) | Shareability of the guest mapping | 0b11: Inner Shareable |

By carefully reviewing and adjusting these settings, the system should be able to support exclusive and atomic accesses without generating data abort exceptions.
