ARM Cortex-A53 Cache Coherency Breakdown in Multi-Core FVP

The described scenario involves an 8-processor Fixed Virtual Platform (FVP) simulation in which multiple ARM Cortex-A53 cores perform concurrent writes to the same 64-byte memory region, corresponding to a single Cortex-A53 cache line. Each processor executes a STRB (Store Byte) instruction to a unique byte offset within that line, followed by an LDR (Load Register) instruction to read the region back and verify coherency. Under a working cache coherency protocol, each write should observe and preserve the modifications made by earlier writes, so the final contents of the line should contain every byte that was written. The observed behavior, however, is that each processor sees only its own byte modification, which points to a breakdown in the cache coherency mechanism.
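As a rough illustration of what each core runs, the per-core test might look like the sketch below; the register assignments, the marker value, and the assumption that a flat core index is already available in W2 are illustrative choices, not code taken from the original report.

// Per-core test sketch. Assumptions: X1 holds the base address of the
// shared 64-byte line and W2 holds this core's index (0-7), e.g. derived
// from MPIDR_EL1 by platform startup code.
ADD  W0, W2, #0xA0          // per-core marker value (0xA0 + core index)
ADD  X3, X1, W2, UXTW       // unique byte address: base + core index
STRB W0, [X3]               // write this core's byte
DMB  ISH                    // order the store before the read-back
LDR  X4, [X1]               // read back the first eight bytes of the line;
                            // with working coherency this should also show
                            // the bytes written by the other cores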

This coherency failure manifests specifically in the interaction between the L1 data caches of the individual Cortex-A53 cores and the shared L2 cache subsystem. The Cortex-A53 implements the ARMv8-A architecture and maintains hardware coherency between the L1 data caches within a cluster through the Snoop Control Unit integrated in its L2 subsystem; coherency with other clusters and system masters is provided over the processor's ACE (AXI Coherency Extensions) interface through a coherent interconnect. Together these mechanisms are responsible for ensuring that a cache line held as Modified in one core's L1 cache is properly propagated to other cores' caches and to the shared L2 cache.

The fundamental expectation in this scenario is that when CPU0 writes to address X, the cache line containing X becomes Modified in CPU0's L1 cache. When CPU1 subsequently writes to address X+1, the coherency hardware should snoop the dirty line out of CPU0's cache (invalidating or downgrading CPU0's copy), supply it to CPU1, merge in the new byte, and leave the line Modified in CPU1's cache. This process should repeat across all eight processors, with each new write observing and preserving the previous modifications.

Secure Memory Configuration and DRAM Mapping Constraints

The root cause of the observed cache coherency failure lies in the interaction between the FVP’s secure memory configuration and the physical address mapping of DRAM. The FVP parameter bp.secure_memory=0 was identified as critical for proper DRAM operation in Base_RevC configurations. When secure_memory is enabled (bp.secure_memory=1), the memory system treats DRAM accesses differently based on the security state of the accessing processor, potentially creating separate memory domains that break coherency expectations.

The secure memory configuration interacts with the physical address space mapping in ways that can prevent proper cache coherency maintenance. In the described scenario, the FVP's DRAM size parameter (dram_size=2, i.e. 2 GB of DRAM) constrains the physical address space so that only addresses between 0x8000_0000 and 0xffff_ffff are mapped to DRAM. Accesses outside this range may be routed to different memory subsystems or treated as secure accesses, depending on the secure_memory configuration.

The physical address mapping constraint becomes particularly significant when using PA=VA (Physical Address equals Virtual Address) mapping schemes. In such configurations, the virtual address space directly corresponds to the physical address space, meaning that software must ensure all memory accesses fall within the properly mapped DRAM region. Accesses outside this range may bypass the cache coherency mechanisms entirely, leading to the observed behavior where writes appear to be isolated to individual processors.
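For a concrete picture of such a mapping, the sketch below installs a single level-1 block descriptor that identity-maps the first gigabyte of that DRAM window as Normal, Inner Shareable memory, the shareability attribute the hardware coherency logic relies on; the table label, MAIR index, and register choices are assumptions made for illustration only.

// Minimal PA=VA sketch: identity-map VA 0x8000_0000-0xBFFF_FFFF with one
// level-1 block descriptor (4 KB granule, so each level-1 entry covers 1 GB).
// Assumes MAIR_EL1 attribute index 0 is Normal Write-Back (0xFF) and that
// ttb_l1 labels a 4 KB-aligned level-1 table referenced by TTBR0_EL1.
LDR  X0, =ttb_l1
LDR  X1, =0x80000701        // output PA 0x8000_0000, block descriptor with
                            // AF=1, SH=0b11 (Inner Shareable), AttrIndx=0
STR  X1, [X0, #16]          // entry 2 covers VA 0x8000_0000 when PA=VA
DSB  ISH                    // make the table update visible to the walkers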

Implementing Proper DRAM Configuration and Cache Management

To resolve the cache coherency issues, several configuration and implementation steps must be taken. First, the FVP configuration must be updated to disable secure memory by setting bp.secure_memory=0. This ensures that all DRAM accesses are treated uniformly regardless of the processor’s security state, maintaining a single coherency domain for the entire DRAM space.
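As an example, a launch line along the following lines applies this setting together with the DRAM size discussed earlier (shown here with the bp. prefix used by the Base platform models); the model binary name and the application-loading option are assumptions that depend on the installed FVP package, and only the two -C parameters come from the discussion above.

# Illustrative launch command for a Base_RevC model (names are assumptions)
FVP_Base_RevC-2xAEMvA \
    -C bp.secure_memory=0 \
    -C bp.dram_size=2 \
    -a cluster0.cpu0=coherency_test.elf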

The physical address mapping must be carefully configured to ensure all memory accesses fall within the properly mapped DRAM region. When using PA=VA mapping, software must ensure that all memory addresses used for shared data structures fall within the 0x8000_0000 to 0xffff_ffff range. This can be achieved through proper linker script configuration and memory allocation strategies.
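One way to enforce this is with a dedicated output region in the linker script; the minimal GNU ld sketch below illustrates the idea, with the region name, section name, and symbols chosen for illustration rather than taken from any particular project.

/* Keep shared data inside the FVP's mapped DRAM window (illustrative names). */
MEMORY
{
    DRAM (rwx) : ORIGIN = 0x80000000, LENGTH = 0x80000000   /* 2 GB of mapped DRAM */
}

SECTIONS
{
    .shared_data (NOLOAD) : ALIGN(64)    /* align to the 64-byte cache line */
    {
        __shared_start = .;
        *(.shared_data)
        __shared_end = .;
    } > DRAM
}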

Cache management operations must be implemented to ensure proper coherency maintenance. While the ACE protocol handles most coherency operations automatically, explicit cache maintenance instructions may be required in certain scenarios. The following sequence should be implemented around critical shared memory accesses:

  1. Data Synchronization Barrier (DSB) before the first write to ensure all previous memory operations are complete
  2. Clean and invalidate by virtual address to the Point of Coherency (DC CIVAC) after the last write, so the modified data is pushed out where the other cores and the shared L2 cache can observe it
  3. Invalidate cache operations (DC IVAC) before subsequent reads to ensure the most recent data is fetched

The following table summarizes the key configuration parameters and their impact on cache coherency:

Parameter            Recommended Value      Impact on Coherency
bp.secure_memory     0                      Ensures uniform DRAM access treatment across security states
dram_size            2                      Defines the mapped DRAM range (0x8000_0000 to 0xffff_ffff)
PA=VA mapping        Enabled                Requires careful address range management to stay within mapped DRAM
Cache maintenance    DC CIVAC / DC IVAC     Ensures modified lines are propagated and the latest data is fetched

For software implementation, the following code sequence demonstrates proper cache management around shared memory accesses:

// CPU0 writes byte 0 of the shared line at X1
DSB SY               // complete all earlier memory accesses
STRB W0, [X1]        // store this core's byte
DC CIVAC, X1         // clean and invalidate the line to the point of coherency
DSB SY               // wait for the maintenance operation to complete

// CPU1 writes byte 1 of the same line
DC IVAC, X1          // discard any stale local copy before reading
DSB SY
LDRB W2, [X1]        // read back CPU0's byte to confirm it is visible
STRB W3, [X1, #1]    // store this core's byte
DC CIVAC, X1         // push the merged line back out
DSB SY

// Subsequent CPUs follow the same pattern
...

This sequence ensures that each processor observes previous modifications before performing its own write, and that every modification is propagated through the cache hierarchy. The Data Synchronization Barrier (DSB) guarantees that earlier memory accesses and cache maintenance operations have completed before execution proceeds, while DC CIVAC pushes modified lines out to the point of coherency and DC IVAC discards stale copies before they are read.

In addition to the low-level cache management, the system software must implement proper memory allocation strategies to ensure shared data structures are placed within the properly mapped DRAM region. This can be achieved through custom memory allocators or linker script modifications that restrict shared memory allocations to the 0x8000_0000 to 0xffff_ffff range.

For systems requiring secure memory functionality, additional considerations are necessary. The secure memory configuration (bp.secure_memory=1) creates separate memory domains for secure and non-secure accesses, requiring explicit management of shared data structures across security states. In such configurations, the TrustZone Address Space Controller (TZASC) must be properly configured to define shared memory regions that maintain coherency across security states.

The following advanced configuration steps may be necessary for complex systems:

  1. TZASC region configuration to define shared memory spaces
  2. Secure monitor call (SMC) sequences for cross-domain data sharing (a sketch follows this list)
  3. Additional cache maintenance operations for secure/non-secure transitions
  4. Memory partitioning to isolate secure and non-secure data while maintaining coherency for shared regions
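
As an illustration of the SMC step above, a minimal call following the SMC Calling Convention might look like the sketch below; the function ID, the choice of X19 for the buffer address, and the failure label are hypothetical placeholders rather than a published secure-monitor interface.

// Hypothetical request asking the secure monitor to register a shared buffer.
// Per the SMC Calling Convention the function ID goes in W0, arguments in
// X1-X6, and the status is returned in X0 (0 indicates success). The ID
// 0x8200_0010 is a made-up SiP value, not a real API.
LDR  W0, =0x82000010        // illustrative SiP service function ID
MOV  X1, X19                // physical base of the shared region (assumed in X19)
MOV  X2, #64                // size of the shared region in bytes
SMC  #0                     // trap to the EL3 secure monitor
CBNZ X0, smc_share_failed   // non-zero status means the request was rejected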

By implementing these configuration and management strategies, the cache coherency issues in the multi-processor FVP environment can be effectively resolved, ensuring proper operation of shared memory accesses across multiple ARM Cortex-A53 cores.
