ARM Cortex-A9 and PL310 Cache Behavior During Non-Cacheable Writes

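The ARM Cortex-A9 processor, when paired with the PL310 L2 cache controller, exhibits complex behavior during non-cacheable writes, and that behavior depends on how the controller's Shared Attribute Override Enable bit (the "shared override bit") is configured. The core question is whether a non-cacheable write will invalidate or update a line that is already present in the L2 cache. The controller only sees transactions that actually reach it: in a typical integration these are accesses from the Cortex-A9 cores and from external masters (such as a DMA controller) that are routed through the Accelerator Coherency Port. A master that writes straight to DRAM bypasses the PL310, and any L2 copy of that data stays stale until software invalidates it. Either way, the scenario is critical in systems where cache coherency must be maintained across multiple masters accessing shared memory regions.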

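The shared override bit, formally the Shared Attribute Override Enable bit, is bit 22 of the PL310 Auxiliary Control Register. It is a single controller-wide setting, not an attribute of a memory region or of an individual cache line. With the bit clear (the reset default), the PL310 transforms shared Normal Memory Non-cacheable transactions into "cacheable no allocate" for reads and "write-through no write allocate" for writes, so these accesses look up the L2 cache and can hit lines that are already present, even though the page tables mark the region as non-cacheable. With the bit set, the controller ignores the shared attribute and the transformation is not applied. The default behavior is meant to give all masters in multi-core or DMA-capable systems a consistent view of memory. However, the interaction between non-cacheable writes and this bit can lead to subtle coherency issues, particularly when the L2 cache contains a copy of the data being written.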
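As a rough illustration, the current setting can be inspected by reading the controller's registers. The sketch below assumes the register offsets and bit positions documented for the PL310 (Control Register at 0x100, Auxiliary Control Register at 0x104, shared override at bit 22); the helper names and the `base` pointer are hypothetical, and the real base address is SoC-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bit positions as documented for the PL310; verify them against
 * the TRM revision that matches your silicon. */
#define PL310_CTRL_OFFSET          0x100u     /* Control Register */
#define PL310_AUX_CTRL_OFFSET      0x104u     /* Auxiliary Control Register */
#define PL310_CTRL_L2_ENABLE       (1u << 0)  /* L2 cache enabled */
#define PL310_AUX_SHARED_OVERRIDE  (1u << 22) /* Shared Attribute Override Enable */

/* Hypothetical accessor: `base` is the SoC-specific PL310 register base,
 * assumed to be mapped already as Device memory. */
static uint32_t pl310_read(const volatile uint32_t *base, uint32_t offset)
{
    assert(base != 0 && (offset & 3u) == 0u); /* registers are word aligned */
    return base[offset / 4u];
}

/* True when the L2 cache is enabled in the Control Register. */
static bool pl310_l2_enabled(const volatile uint32_t *base)
{
    return (pl310_read(base, PL310_CTRL_OFFSET) & PL310_CTRL_L2_ENABLE) != 0u;
}

/* True when the controller is configured to ignore the shared attribute. */
static bool pl310_shared_override_set(const volatile uint32_t *base)
{
    return (pl310_read(base, PL310_AUX_CTRL_OFFSET) & PL310_AUX_SHARED_OVERRIDE) != 0u;
}
```

Logging both values early in boot is a cheap way to confirm what the bootloader or firmware actually programmed before debugging a coherency problem.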

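When a non-cacheable write reaches the PL310, the controller must decide what happens to a matching line in the L2 cache. Under the default setting, a shared non-cacheable write is handled as write-through with no write allocate: a hit updates the L2 copy and the data is also written to memory, while a miss goes to memory without allocating a line. The Auxiliary Control Register also has a Shared Attribute Invalidate Enable bit (bit 13 in the PL310 TRM) that, when the shared override bit is clear, can make such write hits invalidate the line instead; check the TRM for your controller revision for the exact conditions. With the shared override bit set, the shared attribute plays no part in the decision. The resulting behavior is not always intuitive, and a wrong assumption can lead to data corruption or to stale data being read by other masters.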

Memory Attribute Mismatch and Shared Override Bit Configuration

One of the primary causes of this issue is a mismatch between the memory attributes specified in the page tables and the configuration of the shared override bit in the PL310 cache controller. The ARM Cortex-A9 processor uses memory attributes defined in the page tables to determine how memory accesses should be handled. These attributes include cacheability, shareability, and memory type (e.g., device or normal memory). When a memory region is marked as non-cacheable in the page tables, the processor typically bypasses the cache for accesses to that region.
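To make the page-table side concrete, here is a minimal sketch of an ARMv7-A short-descriptor section entry for Normal, Non-cacheable memory. It assumes TEX remap is disabled (SCTLR.TRE = 0), domain 0, full read/write access, and execute-never; the function name is hypothetical, and a real system would build such entries inside its own MMU setup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions of an ARMv7-A short-descriptor 1 MB section entry. */
#define SECT_TYPE    (2u << 0)              /* bits[1:0] = 0b10: section */
#define SECT_B       (1u << 2)              /* Bufferable */
#define SECT_C       (1u << 3)              /* Cacheable */
#define SECT_XN      (1u << 4)              /* Execute-never */
#define SECT_AP_RW   (3u << 10)             /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECT_TEX(x)  ((uint32_t)(x) << 12)  /* TEX[2:0] */
#define SECT_S       (1u << 16)             /* Shareable */

/* Build a section descriptor for Normal memory, inner and outer
 * Non-cacheable: TEX = 0b001, C = 0, B = 0 (valid only with TEX remap off).
 * SECT_C and SECT_B are deliberately left clear. */
static uint32_t section_normal_noncacheable(uint32_t phys_base, bool shareable)
{
    assert((phys_base & 0x000FFFFFu) == 0u); /* sections are 1 MB aligned */

    uint32_t desc = phys_base | SECT_TYPE | SECT_XN | SECT_AP_RW | SECT_TEX(1);
    if (shareable) {
        desc |= SECT_S; /* the S bit is what the PL310 sees as "shared" */
    }
    return desc;
}
```

The `shareable` flag is the part that matters here: it determines whether the transaction carries the shared attribute that the PL310's shared override bit acts on.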

However, the PL310 does not simply follow the page-table attributes. With the shared override bit clear, the controller converts shared non-cacheable accesses into L2 lookups, so a region that software believes is uncached can still hit in the L2 cache. This matters most when the same physical memory has also been accessed through a cacheable mapping, or was cacheable before being remapped, because lines from the earlier accesses may still be present. A non-cacheable write may then update or invalidate such a line, or leave it untouched, depending on the configuration, and different observers can see different data. This is particularly problematic in systems where multiple masters access the same memory region.

Another potential cause is the timing of writes and cache maintenance operations. The PL310 contains internal buffers, and a write or maintenance operation that software has issued is not guaranteed to have taken effect in the L2 cache or in external memory until those buffers have drained, which software forces with the controller's Cache Sync operation. Until then, another master can still read stale data even though the write appears complete from the processor's point of view. The exact timing depends on the configuration of the cache controller and on the specific implementation of the Cortex-A9 system.

Additionally, the behavior of the PL310 is influenced by the AXI bus and the interconnect fabric. The AXI3 interfaces used by the Cortex-A9 and the PL310 do not carry cache maintenance operations; they carry per-transaction memory attributes on the ARCACHE and AWCACHE signals, with the shared attribute supplied to the PL310 as a sideband signal. If the interconnect or a master drives these attributes inconsistently with the page tables, or if an external master's traffic never passes through the PL310 at all, the controller cannot keep its contents consistent with memory on its own. This can exacerbate coherency issues, particularly in high-performance systems where multiple masters access shared memory regions simultaneously.

Implementing Cache Maintenance Operations and Memory Barrier Instructions

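To address the issue of non-cacheable writes and the shared override bit in a Cortex-A9 and PL310 system, a combination of configuration, cache maintenance operations, and memory barrier instructions must be used. The first step is to make the page-table attributes and the shared override bit agree with the behavior the system needs. If the transformation of shared non-cacheable accesses causes problems, for example corruption of DMA buffers, set the shared override bit so that the PL310 ignores the shared attribute and does not transform these accesses; Linux exposes this choice through the "arm,shared-override" device tree property for exactly this case. If the default transformation is wanted, leave the bit clear and avoid mapping the same physical memory with mismatched attributes. The Auxiliary Control Register is normally written during early boot, before the L2 cache is enabled.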
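Changing the bit could look roughly like the sketch below. It assumes the documented PL310 offsets (Control Register at 0x100, Auxiliary Control Register at 0x104, shared override at bit 22) and that the Auxiliary Control Register may only be modified while the L2 cache is disabled, typically from secure state; the function name and `base` pointer are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PL310_CTRL_OFFSET          0x100u     /* Control Register */
#define PL310_AUX_CTRL_OFFSET      0x104u     /* Auxiliary Control Register */
#define PL310_CTRL_L2_ENABLE       (1u << 0)
#define PL310_AUX_SHARED_OVERRIDE  (1u << 22)

/* Set or clear the shared override bit with a read-modify-write.
 * Returns false, leaving the register untouched, if the L2 cache is enabled,
 * because the Auxiliary Control Register must not be changed in that state.
 * `base` is the SoC-specific PL310 register base (an assumption here). */
static bool pl310_write_shared_override(volatile uint32_t *base, bool enable)
{
    assert(base != 0);

    if ((base[PL310_CTRL_OFFSET / 4u] & PL310_CTRL_L2_ENABLE) != 0u) {
        return false;
    }

    uint32_t aux = base[PL310_AUX_CTRL_OFFSET / 4u];
    if (enable) {
        aux |= PL310_AUX_SHARED_OVERRIDE;
    } else {
        aux &= ~PL310_AUX_SHARED_OVERRIDE;
    }
    base[PL310_AUX_CTRL_OFFSET / 4u] = aux;
    return true;
}
```

Refusing the write when the cache is already enabled is a deliberate choice in this sketch: it surfaces a boot-order mistake immediately instead of silently leaving the controller in its previous configuration.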

Next, cache maintenance operations should be performed so that no cache holds stale data for a buffer that is written non-cacheably. The Cortex-A9 provides CP15 maintenance instructions, including Data Cache Clean and Invalidate by Set/Way (DCCISW) and Data Cache Invalidate by Set/Way (DCISW), but these act only on the processor's L1 caches, and set/way operations are intended for maintaining a whole cache rather than individual buffers. For specific address ranges in L1, the by-MVA operations (DCIMVAC, DCCMVAC, DCCIMVAC) are the appropriate tools. The PL310 is an outer cache and is not affected by CP15 operations; it is maintained through its memory-mapped registers, which offer invalidate, clean, and clean-and-invalidate operations by physical address or by way. When a non-cacheable write may leave a stale copy in the L2 cache, the corresponding line should be invalidated by physical address through these registers.
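The PL310 itself is maintained through memory-mapped registers rather than CP15 instructions, so invalidating a buffer in L2 might be sketched as follows. This assumes the documented Invalidate Line by PA (0x770) and Cache Sync (0x730) register offsets and a 32-byte line; the function names and the `base` pointer are hypothetical, and locking against concurrent maintenance from other cores is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define PL310_CACHE_SYNC_OFFSET   0x730u  /* Cache Sync */
#define PL310_INV_LINE_PA_OFFSET  0x770u  /* Invalidate Line by PA */
#define PL310_LINE_SIZE           32u     /* bytes per L2 line */

/* Drain the controller's buffers and wait until the sync has finished. */
static void pl310_cache_sync(volatile uint32_t *base)
{
    base[PL310_CACHE_SYNC_OFFSET / 4u] = 0u;
    while ((base[PL310_CACHE_SYNC_OFFSET / 4u] & 1u) != 0u) {
        /* bit 0 reads as 1 while the operation is in progress */
    }
}

/* Invalidate every L2 line that overlaps [start_pa, start_pa + size) and
 * return the number of lines touched. Buffers should be line aligned:
 * invalidating a partially covered line also discards neighbouring data
 * that happens to share that line. */
static uint32_t pl310_invalidate_range(volatile uint32_t *base,
                                       uint32_t start_pa, uint32_t size)
{
    assert(base != 0 && size > 0u);

    uint64_t end = (uint64_t)start_pa + size;
    assert(end <= 0x100000000ull); /* range must not wrap the 32-bit space */

    uint32_t lines = 0u;
    for (uint64_t addr = start_pa & ~(PL310_LINE_SIZE - 1u);
         addr < end; addr += PL310_LINE_SIZE) {
        base[PL310_INV_LINE_PA_OFFSET / 4u] = (uint32_t)addr;
        lines++;
    }

    pl310_cache_sync(base);
    return lines;
}
```

Note that the addresses written to the register are physical addresses, so a caller working with virtual buffers needs its own translation step first.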

In addition to cache maintenance operations, memory barriers should be used so that the maintenance is complete before software relies on it. The Cortex-A9 provides the Data Synchronization Barrier (DSB) and the Data Memory Barrier (DMB). DMB orders memory accesses relative to one another; DSB additionally waits for outstanding memory accesses and CP15 cache maintenance operations to complete before any later instruction executes. For the PL310, the maintenance operation should be followed by a write to the Cache Sync register and then a DSB, so that the invalidation has completed before any subsequent access to the buffer is made or a DMA transfer is started.
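The ordering can be sketched as: invalidate the L2 line, issue a Cache Sync, execute a DSB, and only then read the buffer. The example below assumes a GCC or Clang toolchain, the documented PL310 register offsets, and a buffer mapped non-cacheable at L1 (so no CP15 invalidate is shown); the function names are hypothetical, and the non-ARM fallback exists only so the sketch can be compiled on a development host.

```c
#include <assert.h>
#include <stdint.h>

#define PL310_CACHE_SYNC_OFFSET   0x730u  /* Cache Sync */
#define PL310_INV_LINE_PA_OFFSET  0x770u  /* Invalidate Line by PA */
#define PL310_LINE_SIZE           32u

/* Data Synchronization Barrier on ARMv7-A; a full compiler and hardware
 * fence elsewhere so the sketch still builds on a host machine. */
static inline void dsb(void)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Invalidate one L2 line, wait for the controller, then read a word.
 * `line_pa` is the physical address of the line; `word` is the virtual
 * address through which the CPU reads the same memory (both supplied by
 * the caller; the translation between them is outside this sketch). */
static uint32_t l2_invalidate_then_read(volatile uint32_t *l2_base,
                                        uint32_t line_pa,
                                        const volatile uint32_t *word)
{
    assert(l2_base != 0 && word != 0);
    assert((line_pa & (PL310_LINE_SIZE - 1u)) == 0u);

    l2_base[PL310_INV_LINE_PA_OFFSET / 4u] = line_pa;   /* 1. invalidate */
    l2_base[PL310_CACHE_SYNC_OFFSET / 4u] = 0u;         /* 2. cache sync */
    while ((l2_base[PL310_CACHE_SYNC_OFFSET / 4u] & 1u) != 0u) {
        /* wait for the sync to finish */
    }
    dsb();                                              /* 3. barrier */

    return *word;                                       /* 4. safe to read */
}
```

If the buffer were also cacheable in L1, the usual convention is to invalidate the outer cache first and the inner cache second, so that a speculative L1 refill cannot pick up a stale L2 line in between.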

Finally, the configuration of the AXI bus and the interconnect fabric should be reviewed to confirm that transactions reach the PL310 with the intended attributes. ARCACHE and AWCACHE are not cache maintenance signals: in AXI3 they encode the bufferable, cacheable, read-allocate, and write-allocate attributes of each transaction, and the shared attribute arrives separately. These attributes should match the memory attributes in the page tables and the behavior selected by the shared override bit. It is equally important to know which masters are routed through the PL310 and which access memory directly, because software must perform explicit maintenance for any master whose traffic the controller never sees. The interconnect should also be designed to minimize latency, particularly in high-performance systems where multiple masters access shared memory regions simultaneously.
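When reviewing bus traces or interconnect configuration, it helps to decode the four-bit AxCACHE value into its attributes. The sketch below follows the AXI3 bit assignment (bit 0 bufferable, bit 1 cacheable, bit 2 read allocate, bit 3 write allocate); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Attributes carried by a 4-bit AXI3 ARCACHE or AWCACHE value. */
typedef struct {
    bool bufferable;      /* bit 0 */
    bool cacheable;       /* bit 1 */
    bool read_allocate;   /* bit 2 */
    bool write_allocate;  /* bit 3 */
} axcache_attrs;

/* Split an AxCACHE value into its individual attribute bits. */
static axcache_attrs axcache_decode(uint8_t axcache)
{
    assert(axcache <= 0xFu); /* only four bits are defined */

    axcache_attrs attrs = {
        .bufferable     = (axcache & 0x1u) != 0u,
        .cacheable      = (axcache & 0x2u) != 0u,
        .read_allocate  = (axcache & 0x4u) != 0u,
        .write_allocate = (axcache & 0x8u) != 0u,
    };
    return attrs;
}
```

For example, a value of 0b0011 decodes as cacheable and bufferable with no allocation, while 0b0001 is bufferable only, which is what a plain non-cacheable Normal memory write would be expected to carry.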

In summary, the issue of non-cacheable writes and the shared override bit in the ARM Cortex-A9 and PL310 cache controller can be addressed by ensuring consistent memory attributes, performing cache maintenance operations, and using memory barrier instructions. By carefully configuring the cache controller and the AXI bus, and by using the appropriate cache maintenance and memory barrier instructions, coherency issues can be minimized, ensuring that all masters have a consistent view of memory.
