ARM Cortex-A725 CASAL Instruction Fault in EL1 with MMU Enabled

The issue is a data abort exception (ESR_EL1 = 0x96000035) raised while executing the CASAL instruction (Compare and Swap with Acquire and Release semantics) on an ARM Cortex-A725 processor. The exception is taken in EL1 (Exception Level 1) with the MMU (Memory Management Unit) enabled, and the fault status code in ESR_EL1 (Exception Syndrome Register) classifies it as an "Unsupported Exclusive or Atomic Access". The fault is perplexing because the CPU register ID_AA64ISAR0_EL1 reports that the LSE (Large System Extensions) feature, which includes atomic operations such as CASAL, is implemented. An ID register only describes what the core can execute, however; it says nothing about whether the addressed memory can service an atomic access. The gap between the core’s capability and the fault therefore points to memory attributes, cache configuration, or the system-level implementation.
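To make the syndrome value concrete, it can be decoded field by field. The following is a minimal C sketch rather than QNX code: the macro and function names are hypothetical, while the bit positions follow the architectural ESR_ELx layout for data aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural ESR_ELx fields relevant to a data abort. */
#define ESR_EC(esr)   (((esr) >> 26) & 0x3Fu) /* exception class, bits [31:26] */
#define ESR_IL(esr)   (((esr) >> 25) & 0x1u)  /* 1 = 32-bit instruction, bit 25 */
#define ESR_WNR(esr)  (((esr) >> 6) & 0x1u)   /* write-not-read, bit 6 */
#define ESR_DFSC(esr) ((esr) & 0x3Fu)         /* data fault status code, bits [5:0] */

#define EC_DABT_LOWER_EL             0x24u /* data abort from a lower exception level */
#define EC_DABT_SAME_EL              0x25u /* data abort without a change in exception level */
#define DFSC_UNSUPPORTED_EXCL_ATOMIC 0x35u /* unsupported exclusive or atomic access */

/* Compile-time check against the value reported in this article. */
static_assert(ESR_DFSC(0x96000035u) == DFSC_UNSUPPORTED_EXCL_ATOMIC,
              "0x96000035 carries DFSC 0x35");

/* Hypothetical helper: true when the syndrome describes this fault type. */
static inline bool esr_is_unsupported_atomic(uint64_t esr)
{
    uint64_t ec = ESR_EC(esr);
    return (ec == EC_DABT_SAME_EL || ec == EC_DABT_LOWER_EL) &&
           ESR_DFSC(esr) == DFSC_UNSUPPORTED_EXCL_ATOMIC;
}
```

For 0x96000035 this gives EC = 0x25 (a data abort taken without a change in exception level, consistent with a fault in EL1), IL = 1, and DFSC = 0x35.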

The CASAL instruction is part of the Large System Extensions introduced in ARMv8.1 (FEAT_LSE), which add single-instruction atomic operations to improve performance in multi-core systems. It atomically compares the value at a memory location with an expected value and, if they match, stores a new value, with both acquire and release ordering. The fault indicates that the memory system cannot perform an atomic or exclusive access at the address CASAL targets, which raises questions about the memory attributes, cache configuration, and system-level support for atomic operations.
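For readers who know the operation better from C than from assembly, the sketch below shows the same compare-and-swap with acquire-release ordering using C11 atomics. The function name `try_claim` is hypothetical. When built for a target with LSE (for example with `-march=armv8.1-a`), compilers typically lower this call to a CASAL instruction, although the exact code generation depends on the compiler and its options.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically replace *word with `desired` only if it
 * currently holds `expected`. Returns true when the swap happened.
 * Success ordering is acquire-release, matching the "AL" in CASAL.
 */
static bool try_claim(_Atomic uint32_t *word, uint32_t expected, uint32_t desired)
{
    assert(word != NULL);
    return atomic_compare_exchange_strong_explicit(
        word, &expected, desired,
        memory_order_acq_rel,   /* ordering when the swap succeeds */
        memory_order_acquire);  /* ordering when the comparison fails */
}
```

A caller would use it as `if (try_claim(&lock_word, 0, 1)) { ... }`. If the word lives in memory that cannot service atomics, it is this single instruction that faults.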

The fault occurs during the boot process of the QNX kernel, specifically when the kernel attempts to execute the CASAL instruction. The MMU is enabled, and the caches are also enabled (SCTLR_EL1.C is set). The memory address being accessed by the CASAL instruction is 0xFFFFFF80600F8E6C, which falls within a region mapped with write-back cacheable attributes. Despite the cacheable attributes, the fault persists, indicating that the issue is not solely related to cache configuration but may involve deeper system-level considerations.

Memory System Limitations and Cache Coherency Configuration

The root cause of the fault lies in the interaction between the ARM Cortex-A725 processor’s atomic operation support and the memory system’s ability to handle exclusive or atomic accesses. Implementations typically perform an atomic operation in one of two places: in the core’s own cache, after the line has been obtained in a unique state through the coherency protocol (often called a near atomic), or in the interconnect or memory system, which must then accept atomic transactions (a far atomic). The first relies on the cache coherency protocol, while the second depends on the memory system’s ability to handle exclusive or atomic accesses.

In this case, the fault is reported because the memory system does not support atomic or exclusive accesses for the specific address being accessed by the CASAL instruction, even though the region is mapped write-back cacheable, which should in principle allow the operation to complete in the cache. ID_AA64ISAR0_EL1 confirms only that the processor can execute CASAL; it does not guarantee that every memory location can be the target of one.

The memory system’s inability to support atomic or exclusive accesses for the specific address could be due to several factors. One possibility is that the memory region is mapped with incorrect or incompatible attributes, preventing the cache coherency protocol from functioning correctly. Another possibility is that the memory system itself does not support atomic or exclusive accesses for the specified address range, either due to hardware limitations or incorrect configuration.

SCTLR_EL1, the system control register for EL1, reads 0x34D5D99D. In that value the M (bit 0), C (bit 2), and I (bit 12) bits are all set, so the MMU, data caching, and instruction caching are enabled. The fault nevertheless suggests that the cache coherency protocol is not functioning as expected, possibly due to incorrect memory attributes or system-level configuration. The memory attributes for the address being accessed by the CASAL instruction are as follows:

Logical address range:  FFFFFF80600D0000 – FFFFFF8060105FFF
Physical address range: 00000000B00CB000 – 00000000B0100FFF
Security state:         Non-Secure
Size:                   4 KB
Permissions:            Read-Write, Non-Executable
Global:                 Yes
Shareable:              No
Cache attributes:       Inner: Write-Back, Read-Write Allocate; Outer: Write-Back, Read-Write Allocate

The table shows that the memory region is mapped with write-back cacheable attributes, which should allow the atomic to be handled in the cache. It also shows that the region is not marked shareable. The fault indicates that the memory system does not support atomic or exclusive accesses for this region, suggesting an issue with shareability and the cache coherency protocol, or with the memory system’s configuration.
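When checking a mapping like this one, it helps to confirm that the faulting address really lies inside the listed range and to work out which physical address it targets, since the physical target determines which memory or device must service the atomic. The sketch below does both. It assumes the range is mapped linearly and contiguously, which the matching sizes of the logical and physical ranges suggest but do not prove; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One contiguous virtual-to-physical mapping (assumed linear). */
struct va_mapping {
    uint64_t va_first; /* first virtual address in the range */
    uint64_t va_last;  /* last virtual address in the range, inclusive */
    uint64_t pa_first; /* physical address that va_first maps to */
};

static inline bool va_in_mapping(const struct va_mapping *m, uint64_t va)
{
    return va >= m->va_first && va <= m->va_last;
}

static inline uint64_t mapping_size(const struct va_mapping *m)
{
    return m->va_last - m->va_first + 1;
}

/* Translate by applying the offset within the range to the physical base. */
static inline uint64_t va_to_pa(const struct va_mapping *m, uint64_t va)
{
    assert(va_in_mapping(m, va));
    return m->pa_first + (va - m->va_first);
}
```

With the values from this article, the range spans 0x36000 bytes (54 pages of 4 KB), the faulting address 0xFFFFFF80600F8E6C lies inside it at offset 0x28E6C, and under the linear-mapping assumption it corresponds to physical address 0xB00F3E6C.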

Implementing Correct Memory Attributes and Cache Management

To resolve the issue, it is necessary to ensure that the memory system is correctly configured to support atomic or exclusive accesses for the specific address range being accessed by the CASAL instruction. This involves verifying the memory attributes, cache configuration, and system-level support for atomic operations.

First, the memory attributes for the address range being accessed by the CASAL instruction should be reviewed to ensure that they are compatible with atomic operations. The region should be mapped as Normal memory with write-back cacheable, read-write allocate attributes, which the mapping above already provides. The region should also be marked as shareable (Inner Shareable is the usual choice for memory shared between cores running one operating system) so that the cache coherency protocol can operate across multiple cores; the mapping above is reported as non-shareable.
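The attribute review can be sketched in C as a pair of checks on a translation table entry: read the shareability field and look up the memory type through MAIR_EL1. This is an illustration under assumptions, not the QNX page table code. It assumes the classic AArch64 stage 1 block or page descriptor layout (AttrIndx in bits [4:2], SH in bits [9:8], without the 52-bit LPA2 format that reuses the SH bits), and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stage 1 block/page descriptor fields (classic layout, no LPA2). */
#define PTE_ATTRINDX_SHIFT 2
#define PTE_ATTRINDX_MASK  (0x7ULL << PTE_ATTRINDX_SHIFT)
#define PTE_SH_SHIFT       8
#define PTE_SH_MASK        (0x3ULL << PTE_SH_SHIFT)

/* SH field encodings; 0b01 is reserved. */
#define SH_NON_SHAREABLE   0x0u
#define SH_OUTER_SHAREABLE 0x2u
#define SH_INNER_SHAREABLE 0x3u

static inline unsigned pte_get_sh(uint64_t pte)
{
    return (unsigned)((pte & PTE_SH_MASK) >> PTE_SH_SHIFT);
}

/* Return a copy of the descriptor with only the SH field changed. */
static inline uint64_t pte_with_sh(uint64_t pte, unsigned sh)
{
    assert(sh <= 0x3u && sh != 0x1u);
    return (pte & ~PTE_SH_MASK) | ((uint64_t)sh << PTE_SH_SHIFT);
}

/* Fetch the 8-bit MAIR attribute selected by the descriptor's AttrIndx. */
static inline uint8_t pte_mair_attr(uint64_t pte, uint64_t mair)
{
    unsigned idx = (unsigned)((pte & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT);
    return (uint8_t)(mair >> (8u * idx));
}

/* Normal memory, inner and outer write-back non-transient (0b11xx nibbles). */
static inline bool mair_attr_is_wb_nontransient(uint8_t attr)
{
    return (attr & 0xC0u) == 0xC0u && (attr & 0x0Cu) == 0x0Cu;
}
```

A MAIR attribute of 0xFF corresponds to inner and outer write-back with read and write allocate, which matches the cache attributes listed for this region. Changing SH on a live mapping also requires the usual translation table update sequence and TLB invalidation, which this sketch does not cover.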

Second, the cache configuration should be verified to ensure that the caches are enabled and functioning correctly. The SCTLR_EL1 register should be checked to confirm that the caches are enabled (SCTLR_EL1.C is set). Additionally, it should be confirmed that the core is actually participating in hardware coherency. On some Cortex-A cores this depends on an implementation-defined control (for example, the SMPEN bit in CPUECTLR_EL1 on Cortex-A53, Cortex-A57, and Cortex-A72), so the Technical Reference Manual for the core in use should be consulted.
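The SCTLR_EL1 check can be written as a small bit test. The sketch below is a minimal example with hypothetical names; the bit positions (M at bit 0, C at bit 2, I at bit 12) are architectural. The register read is only compiled for AArch64 and is only valid at EL1 or higher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_ELx_M (1ULL << 0)  /* MMU enable */
#define SCTLR_ELx_C (1ULL << 2)  /* data and unified cache enable */
#define SCTLR_ELx_I (1ULL << 12) /* instruction cache enable */

/* Compile-time check against the value reported in this article. */
static_assert((0x34D5D99DULL & SCTLR_ELx_C) != 0, "SCTLR_EL1.C is set");

/* True when the MMU and both cache enables are on. */
static inline bool sctlr_caches_and_mmu_on(uint64_t sctlr)
{
    const uint64_t need = SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_I;
    return (sctlr & need) == need;
}

#if defined(__aarch64__)
/* Only valid at EL1 or higher; the same read from EL0 traps. */
static inline uint64_t read_sctlr_el1(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, sctlr_el1" : "=r"(v));
    return v;
}
#endif
```

For the reported value 0x34D5D99D the check passes, which is consistent with the observation that the fault is not caused by disabled caches.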

Third, the system-level support for atomic operations should be verified. This involves reading ID_AA64ISAR0_EL1 and confirming that its Atomic field (bits [23:20]) is nonzero, which means the LSE atomic instructions are implemented. The register only reports the core’s capability; LSE has no separate enable bit at EL1. Additionally, the system-level configuration should be reviewed to ensure that the memory system supports atomic or exclusive accesses for the specified address range.
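The ID register check amounts to extracting one 4-bit field. The sketch below uses hypothetical names; the field position (bits [23:20]) and the encodings 0b0000 (no atomics) and 0b0010 (LSE atomics implemented) are architectural. The register value in the compile-time check is synthetic, since the article does not quote the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISAR0_ATOMIC_SHIFT 20
#define ISAR0_ATOMIC(isar0) (((isar0) >> ISAR0_ATOMIC_SHIFT) & 0xFULL)

#define ISAR0_ATOMIC_NONE 0x0ULL /* no LSE atomic instructions */
#define ISAR0_ATOMIC_LSE  0x2ULL /* CAS, SWP, LDADD and related instructions present */

/* Synthetic value with only the Atomic field populated. */
static_assert(ISAR0_ATOMIC(0x00200000ULL) == ISAR0_ATOMIC_LSE,
              "Atomic field sits in bits [23:20]");

/* ID fields grow monotonically, so any value >= 0b0010 includes LSE. */
static inline bool isar0_has_lse(uint64_t isar0)
{
    return ISAR0_ATOMIC(isar0) >= ISAR0_ATOMIC_LSE;
}
```

A true result here only says the core can execute CASAL. It does not change the conclusion that the memory targeted by the instruction must also be able to service the access.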

If the memory attributes and cache configuration are correct, but the fault persists, it may be necessary to investigate the system-level implementation of atomic operations. This could involve reviewing the memory system’s design and configuration to ensure that it supports atomic or exclusive accesses for the specified address range. Additionally, it may be necessary to consult the system’s documentation or contact the hardware vendor for further assistance.

In summary, the fault is caused by the memory system’s inability to support atomic or exclusive accesses for the specific address being accessed by the CASAL instruction. To resolve the issue, it is necessary to ensure that the memory attributes, cache configuration, and system-level support for atomic operations are correctly configured. By verifying and correcting these settings, the fault can be resolved, allowing the CASAL instruction to execute successfully.
