Cortex-A9 Atomic Variable Deadlock in Multi-Core Bare-Metal Systems

In multi-core ARM Cortex-A9 systems, particularly bare-metal ones, atomic operations on shared memory can hang in an infinite retry loop if the memory management unit (MMU) is not configured correctly. The typical symptom is that one core reads and writes the shared variable normally while the other core spins forever inside an atomic operation. The root cause usually lies in the memory attributes set in the MMU translation tables, the cache configuration, and the lack of a global exclusives monitor for the region. This guide explains the causes and then walks through troubleshooting steps and fixes.


MMU Configuration and Cache Coherency in Cortex-A9 Systems

The Cortex-A9 processor relies heavily on the MMU to manage memory attributes, which directly influence how memory accesses are handled, especially in multi-core systems. The MMU’s translation tables define memory regions with specific attributes such as cacheability, shareability, and access permissions. These attributes are critical for ensuring that atomic operations, which rely on the Load-Exclusive (LDREX) and Store-Exclusive (STREX) instructions, function correctly across cores.

In the scenario examined here, the shared memory region is mapped with the following default MMU attributes:

  • S (Shareable): b0 (Non-shareable)
  • TEX (Type Extension): b100
  • AP (Access Permissions): b11 (Read/Write)
  • Domain: b0
  • C (Cacheable): b1
  • B (Bufferable): b1

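This configuration is problematic for atomic operations because the region is marked Non-shareable. With TEX = b100 and C = B = b1, it is Normal memory that is write-back cacheable in the inner (L1) cache. Hardware coherency is not maintained for Non-shareable regions, so each core can hold its own stale copy of the variable, and changes made by one core are not guaranteed to be visible to the other. Exclusive accesses to Non-shareable memory are also tracked only by each core's local monitor, so they do not provide atomicity across cores.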

When the MMU configuration is modified using Xil_SetTlbAttributes(0xffff0000, 0x14de2), the attributes change to:

  • S: b1 (Shareable)
  • TEX: b100
  • AP: b11
  • Domain: b1111
  • C: b0 (Non-cacheable)
  • B: b0 (Non-bufferable)

This change makes the region shareable, so both cores see the same data, but it also makes the region non-cacheable. That exposes a second problem: there is no global exclusives monitor for the region. On the Cortex-A9 MPCore, exclusive accesses to shareable, cacheable memory are resolved inside the processor cluster by the per-core monitors together with the cache coherency hardware. Exclusive accesses to non-cacheable memory are instead sent out on the bus and depend on a global monitor in the external memory system, which the target memory in this scenario evidently does not provide. Without one, STREX never reports success, so the core retries the atomic operation indefinitely and appears deadlocked.
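To check an attribute word against the bullet lists above, it can help to decode it field by field. The following is a minimal sketch, assuming the value passed to Xil_SetTlbAttributes uses the ARMv7-A short-descriptor section layout; the type and function names are hypothetical and not part of any vendor library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Attribute fields of an ARMv7-A short-descriptor section entry. */
typedef struct {
    unsigned s;       /* bit 16:     Shareable */
    unsigned tex;     /* bits 14:12: Type Extension */
    unsigned ap;      /* bits 11:10: Access Permissions AP[1:0] */
    unsigned domain;  /* bits 8:5:   Domain */
    unsigned c;       /* bit 3:      Cacheable */
    unsigned b;       /* bit 2:      Bufferable */
} section_attrs;

/* Hypothetical helper: split an attribute word into its fields. */
section_attrs decode_section_attrs(uint32_t attr)
{
    section_attrs a;

    /* Bits 1:0 must be 0b10 for the word to describe a section. */
    assert((attr & 0x3u) == 0x2u);

    a.s      = (attr >> 16) & 0x1u;
    a.tex    = (attr >> 12) & 0x7u;
    a.ap     = (attr >> 10) & 0x3u;
    a.domain = (attr >> 5)  & 0xFu;
    a.c      = (attr >> 3)  & 0x1u;
    a.b      = (attr >> 2)  & 0x1u;
    return a;
}

/* Print the fields in the same order as the lists in the text. */
void print_section_attrs(uint32_t attr)
{
    section_attrs a = decode_section_attrs(attr);

    printf("S=%u TEX=0x%x AP=0x%x Domain=0x%x C=%u B=%u\n",
           a.s, a.tex, a.ap, a.domain, a.c, a.b);
}
```

Decoding 0x14de2 this way gives S = 1, TEX = b100, AP = b11, Domain = b1111, C = 0, B = 0, which matches the modified attributes listed above. By the same layout, the default attributes correspond to the word 0x4c0e.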


Missing Global Exclusives Monitor and Cache Configuration

The Cortex-A9 tracks each LDREX/STREX pair with exclusive monitors. For shareable, cacheable memory, the tracking is handled within the processor cluster, in conjunction with the L1 caches and the coherency logic. For a non-cacheable region there is no cached copy to track, so the access depends on a global exclusives monitor outside the cluster. If the memory system does not implement one for that region, the STREX instruction fails every time.

The assembly code for the atomic increment operation reveals the issue:

dmb ish                @ Data Memory Barrier before the update
.L2:
ldrex r3, [r0]         @ Exclusive load of the current value
add r3, r3, #1         @ Increment
strex r2, r3, [r0]     @ Exclusive store; r2 = 0 on success, 1 on failure
cmp r2, #0             @ Check whether STREX succeeded
bne .L2                @ Retry if STREX failed
dmb ish                @ Data Memory Barrier after the update

In this code, the strex instruction fails on every attempt because the memory region is non-cacheable and no global exclusives monitor backs it. Each failure writes a nonzero status to r2, so the bne branch is always taken and the core never leaves the loop.
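At the C level, a loop like the one above is what ARMv7-A compilers typically generate for a C11 atomic increment. The sketch below shows that form next to a bounded variant that can be useful during bring-up, because it reports a failure instead of spinning forever. It assumes a C11 toolchain with <stdatomic.h>; both function names are hypothetical. It also relies on the common (but not guaranteed) behavior that a weak compare-exchange on ARMv7-A maps to a single LDREX/STREX attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Unbounded form: the compiler supplies the retry loop. If STREX can
 * never succeed on the target memory, this call does not return.
 * Returns the value the counter held before the increment. */
unsigned increment_shared(atomic_uint *counter)
{
    return atomic_fetch_add(counter, 1u);
}

/* Bounded form for diagnostics: give up after max_tries attempts so a
 * misconfigured region shows up as a false return, not a hang. */
bool try_increment_bounded(atomic_uint *counter, unsigned max_tries)
{
    assert(max_tries > 0u);

    unsigned expected = atomic_load(counter);

    for (unsigned i = 0; i < max_tries; ++i) {
        /* On failure, 'expected' is refreshed with the value just read,
         * so the next attempt uses up-to-date data. */
        if (atomic_compare_exchange_weak(counter, &expected, expected + 1u)) {
            return true;
        }
    }
    return false;
}
```

If try_increment_bounded returns false on an otherwise idle variable even with a generous retry limit, that points to exclusive stores that cannot succeed on that memory, rather than to ordinary contention between the cores.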

Another factor to check is TEX remapping. When it is enabled, the TEX[0], C, and B bits index into the PRRR and NMRR registers instead of encoding the memory attributes directly. In this case it is disabled (the TRE bit in the SCTLR register is 0), so TEX, C, and B are interpreted with the fixed architectural encoding. For TEX = b1BB, the BB bits select the outer cache policy and the C and B bits select the inner cache policy, so the attributes chosen for a shared region must come from these predefined combinations.
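The fixed encoding for TEX = b1BB can be written out as a small lookup. This is an illustrative sketch, assuming TEX remap is disabled (TRE = 0) and covering only the TEX[2] = 1 case; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Two-bit cache policy encoding shared by the inner (C,B) and
 * outer (TEX[1:0]) fields when TEX[2] = 1 and TRE = 0. */
typedef enum {
    POLICY_NON_CACHEABLE = 0,  /* b00 */
    POLICY_WB_WA         = 1,  /* b01: write-back, write-allocate */
    POLICY_WT_NO_WA      = 2,  /* b10: write-through, no write-allocate */
    POLICY_WB_NO_WA      = 3   /* b11: write-back, no write-allocate */
} cache_policy;

/* Outer policy comes from TEX[1:0]; only valid when TEX[2] is set. */
cache_policy outer_policy(unsigned tex)
{
    assert((tex & 0x4u) != 0u);
    return (cache_policy)(tex & 0x3u);
}

/* Inner policy comes from the C bit (high) and the B bit (low). */
cache_policy inner_policy(unsigned c, unsigned b)
{
    assert(c <= 1u && b <= 1u);
    return (cache_policy)((c << 1) | b);
}

/* Human-readable name for logs or debug output. */
const char *cache_policy_name(cache_policy p)
{
    switch (p) {
    case POLICY_NON_CACHEABLE: return "non-cacheable";
    case POLICY_WB_WA:         return "write-back, write-allocate";
    case POLICY_WT_NO_WA:      return "write-through, no write-allocate";
    case POLICY_WB_NO_WA:      return "write-back, no write-allocate";
    }
    return "unknown";
}
```

Read this way, TEX = b100 with C = B = b1 is outer non-cacheable and inner write-back with no write-allocate, while TEX = b100 with C = B = b0 is non-cacheable at both levels.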


Correcting MMU Attributes and Enabling Cache Coherency

To resolve the deadlock issue, the MMU attributes must make the shared memory region both cacheable and shareable. Exclusive accesses are then resolved through the cache coherency hardware inside the processor cluster and no longer depend on a global exclusives monitor in the external memory system. The following steps outline the necessary changes:

  1. Enable Caching for the Shared Memory Region:
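    The shared memory region must be marked as cacheable so that exclusive accesses to it are handled through the caches rather than by an external global monitor. This is done by setting the C and B bits in the MMU attributes and choosing a TEX value with a cacheable outer policy. For example: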

    • S: b1 (Shareable)
    • TEX: b101 or b111
    • AP: b11
    • Domain: b1111
    • C: b1 (Cacheable)
    • B: b1 (Bufferable)

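    Both settings make the region shareable and write-back cacheable at the inner and outer levels, which allows the exclusive accesses to complete. They differ only in the outer policy: TEX = b101 selects write-allocate and TEX = b111 selects no write-allocate.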

  2. Verify TEX Remap Configuration:
    If TEX remapping is enabled (TRE bit in the SCTLR register is 1), the PRRR and NMRR registers must be configured to map the desired memory attributes. However, in this case, TEX remapping is disabled, so the predefined attribute combinations must be used.

  3. Ensure Inner Shareability:
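    For cross-core atomic operations, the memory region must be shareable within the inner domain that contains both cores. The S bit in the MMU attributes controls this behavior. With it set, the coherency hardware keeps the cached copies on both cores consistent, so a completed write by one core is observed by the other without explicit cache maintenance.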

  4. Enable L1 and L2 Caches:
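    The caches must be enabled for the cacheable attributes to take effect. The L1 data cache is enabled with the C bit in the SCTLR. Each core takes part in hardware coherency only when the SMP bit in its Auxiliary Control Register (ACTLR) is set and the Snoop Control Unit is enabled. The L2 cache, where present, is typically an external controller with its own enable register.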

  5. Use Data Synchronization Barriers:
    A Data Memory Barrier (DMB) orders the memory accesses before it against those after it, and a Data Synchronization Barrier (DSB) additionally waits until the earlier accesses have completed. The dmb ish instructions in the assembly code already order the atomic update within the inner shareable domain, but any other code that touches the shared data needs barriers applied just as consistently.
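The attribute values in step 1 can be packed into a single word of the kind passed to Xil_SetTlbAttributes. The following is a minimal sketch, assuming the ARMv7-A short-descriptor section layout; make_section_attrs is a hypothetical helper, not a vendor API.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_TYPE 0x2u  /* bits 1:0 = b10 marks a section entry */

/* Hypothetical helper: pack the attribute fields into one word. */
uint32_t make_section_attrs(unsigned s, unsigned tex, unsigned ap,
                            unsigned domain, unsigned c, unsigned b)
{
    /* Reject values that do not fit their bit fields. */
    assert(s <= 1u && tex <= 7u && ap <= 3u);
    assert(domain <= 15u && c <= 1u && b <= 1u);

    return ((uint32_t)s      << 16) |  /* Shareable */
           ((uint32_t)tex    << 12) |  /* Type Extension */
           ((uint32_t)ap     << 10) |  /* Access Permissions AP[1:0] */
           ((uint32_t)domain << 5)  |  /* Domain */
           ((uint32_t)c      << 3)  |  /* Cacheable */
           ((uint32_t)b      << 2)  |  /* Bufferable */
           SECTION_TYPE;
}
```

With S = 1, TEX = b101, AP = b11, Domain = b1111, C = 1, and B = 1, the result is 0x15dee, and with TEX = b111 it is 0x17dee. As a cross-check, the non-cacheable settings discussed earlier pack back to 0x14de2. On the target, the resulting word would be passed in place of 0x14de2 in the Xil_SetTlbAttributes call.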

By applying these changes, the shared memory region is configured correctly for atomic operations and the retry loop terminates. With the region shareable and cacheable, exclusive accesses are resolved through the coherency hardware, so the STREX instruction can succeed and the cores synchronize as intended.
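The register bits involved in step 4 can be summarized in code. This is a sketch only: the bit positions are those defined for the ARMv7-A SCTLR and the Cortex-A9 ACTLR, the helper names are hypothetical, and the CP15 reads and writes themselves are target-only and appear here just as comments.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M   (1u << 0)   /* MMU enable */
#define SCTLR_C   (1u << 2)   /* data and unified cache enable */
#define SCTLR_I   (1u << 12)  /* instruction cache enable */
#define ACTLR_FW  (1u << 0)   /* broadcast cache/TLB maintenance */
#define ACTLR_SMP (1u << 6)   /* core takes part in coherency */

/* New SCTLR value with the L1 caches enabled.
 * On target: read with  mrc p15, 0, rX, c1, c0, 0
 *            write with mcr p15, 0, rX, c1, c0, 0 */
uint32_t sctlr_with_caches(uint32_t sctlr)
{
    /* With the MMU off, data accesses are not cached, so the MMU is
     * expected to be enabled already. */
    assert((sctlr & SCTLR_M) != 0u);
    return sctlr | SCTLR_C | SCTLR_I;
}

/* New ACTLR value with SMP coherency enabled.
 * On target: read with  mrc p15, 0, rX, c1, c0, 1
 *            write with mcr p15, 0, rX, c1, c0, 1 */
uint32_t actlr_with_coherency(uint32_t actlr)
{
    return actlr | ACTLR_SMP | ACTLR_FW;
}

/* Nonzero when both register values permit coherent cached sharing. */
int coherency_ready(uint32_t sctlr, uint32_t actlr)
{
    return (sctlr & SCTLR_C) != 0u && (actlr & ACTLR_SMP) != 0u;
}
```

The same values need to be applied on every core that touches the shared region. Enabling the Snoop Control Unit and any L2 cache controller is a separate step through their own memory-mapped registers and is not shown here.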


Summary of Key Fixes and Best Practices

To prevent deadlocks and ensure proper synchronization in Cortex-A9 multi-core systems, follow these best practices:

  • Configure the MMU attributes to make shared memory regions cacheable and shareable.
  • Keep the shared region cacheable so that exclusive accesses do not depend on a global exclusives monitor in the external memory system.
  • Use inner shareability for cross-core atomic operations.
  • Enable L1 and L2 caches to support caching for shared memory regions.
  • Use Data Synchronization Barriers (DSB) and Data Memory Barriers (DMB) to enforce memory operation ordering.

By adhering to these guidelines, you can avoid common pitfalls associated with atomic operations in multi-core ARM systems and ensure reliable and efficient synchronization between cores.
