Optimizing ACP-L2 Cache Interaction on Zynq Cortex-A9 for FPGA-ARM Data Transfers

ACP-L2 Cache Coherency and Write Allocation Challenges

The core issue is how to use the Accelerator Coherency Port (ACP) of a Zynq-7000 device with a Cortex-A9 processor to store data directly in the L2 cache without involving physical memory. The goal is to use the L2 cache as a temporary storage medium for data produced by an FPGA-based ADC. The ADC periodically writes fixed-size data blocks to a fixed address range that is meant to stay resident in the L2 cache. Software running on the Cortex-A9 core then retrieves and processes this data. Doing this requires careful handling of cache coherency, write allocation policies, and memory attributes. Those are what preserve data integrity and prevent writes to physical memory that does not exist.

The primary challenge is configuring the ACP transactions so that they are coherent and allocate into the L2 cache. The Zynq Technical Reference Manual (TRM) specifies that an ACP write request is coherent when AWUSER[0] = 1 and AWCACHE[1] = 1 are presented alongside AWVALID. In that case the Snoop Control Unit (SCU) enforces coherency between the ACP write and the L1 data caches of the Cortex-A9 cores, then passes the write on toward the L2 cache controller. The TRM also notes that the transaction can optionally allocate into the L2 cache if the write parameters (the AWCACHE attributes) are set accordingly. Whether the data ends up only in the L2 cache, in physical memory, or in both therefore depends on the allocation and write-back attributes carried by each write.
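The coherency condition above can be written as a small predicate. The sketch below is a host-side C illustration, not FPGA logic. The macro and function names are hypothetical, and the AWCACHE bit meanings follow the AXI3 definitions: bit 0 bufferable, bit 1 cacheable, bit 2 read-allocate, bit 3 write-allocate.

```c
#include <stdbool.h>
#include <stdint.h>

/* AXI3 AWCACHE bit meanings (hypothetical macro names). */
#define AWCACHE_BUFFERABLE  (1u << 0)
#define AWCACHE_CACHEABLE   (1u << 1)
#define AWCACHE_READ_ALLOC  (1u << 2)
#define AWCACHE_WRITE_ALLOC (1u << 3)

/* Zynq TRM rule, evaluated on the values presented with AWVALID:
   the write is coherent only if AWUSER[0] and AWCACHE[1] are both 1. */
static bool acp_write_is_coherent(uint32_t awuser, uint32_t awcache)
{
    return (awuser & 1u) != 0 && (awcache & AWCACHE_CACHEABLE) != 0;
}

/* True if the attributes request allocation on a write miss.
   Whether a line is actually allocated is decided by the cache controller. */
static bool acp_write_requests_allocate(uint32_t awcache)
{
    return (awcache & AWCACHE_CACHEABLE) != 0 &&
           (awcache & AWCACHE_WRITE_ALLOC) != 0;
}
```

A write driven with AWUSER[0] = 1 and AWCACHE = 0b1111 passes both checks. AWCACHE = 0b0111 is still coherent but does not request write allocation.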

The absence of DDR memory makes this harder. The system must avoid generating any transaction that reaches physical memory, including write-backs of dirty lines and linefills on cache misses. An access to an address with nothing behind it returns an AXI SLVERR or DECERR response. When that response answers a processor access, it is reported as an external abort and triggers the Data Abort handler. Cache attributes and the memory map therefore have to be configured precisely.

SCU Coherency Enforcement and Cache Allocation Policies

The Snoop Control Unit (SCU) keeps the L1 data caches of the Cortex-A9 cores coherent with each other and with coherent ACP accesses. When an ACP write request is marked as coherent (AWUSER[0] = 1 and AWCACHE[1] = 1), the SCU checks whether any core holds the target line in its L1 data cache. If one does, the line is first cleaned (written out if dirty) and invalidated in that core. Only then is the write issued on one of the SCU's master ports toward the L2 controller. The L1 data caches track line states with the MESI (Modified, Exclusive, Shared, Invalid) protocol. The L2 cache sits behind the SCU and is shared by all masters, so it does not need to be snooped.
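The sequence can be made concrete with a toy C model of the TRM's description for a single cache line. The model is hypothetical and all of its names are invented. It tracks only the MESI state of one line in each core's L1 data cache and counts cleans and forwarded writes. It says nothing about timing or about how the SCU is actually implemented.

```c
/* MESI state of one line in a core's L1 data cache. */
typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } mesi_t;

#define NUM_CPUS 2  /* dual-core Zynq-7000 assumed */

typedef struct {
    mesi_t   l1_state[NUM_CPUS]; /* state of the target line in each core's L1 */
    unsigned l1_cleans;          /* dirty L1 copies written out before the ACP write */
    unsigned forwarded_writes;   /* ACP writes passed on toward the L2 controller */
} scu_model_t;

/* Toy version of a coherent ACP write: any L1 copy is cleaned (if dirty)
   and invalidated, then the write is forwarded past the SCU. */
static void scu_coherent_acp_write(scu_model_t *m)
{
    for (int cpu = 0; cpu < NUM_CPUS; ++cpu) {
        if (m->l1_state[cpu] == MESI_MODIFIED)
            m->l1_cleans++;               /* clean: dirty data leaves the L1 */
        m->l1_state[cpu] = MESI_INVALID;  /* invalidate: no stale copy remains */
    }
    m->forwarded_writes++;                /* the ACP write proceeds toward L2 */
}
```

After the call, no core holds a copy of the line. The next processor read of that address therefore misses in L1 and is served from the L2 cache.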

The SCU's coherency handling does not by itself decide whether the write lands in the L2 cache. That decision is made by the L2 cache controller, based on the allocation attributes that travel with the write. The allocation policy determines whether a line is reserved in the cache when a write misses. Under a write-allocate policy, a write miss allocates (and fills) a line in the L2 cache. Under a no-write-allocate policy, the write is passed on to the next level, here physical memory, without allocating a line. Since the goal is to keep the data in the L2 cache, the ACP writes must use a write-allocate policy. Otherwise every write miss would go straight to the missing memory.
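The difference between the two policies appears on a write miss. The toy model below is a generic textbook write-back cache, not a description of the L2C-310's internals. Its names are hypothetical, it is direct-mapped, and it supports byte writes only. It counts linefills and memory writes so the two paths can be compared.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES  32u   /* same line size as the Zynq L2 */
#define NUM_LINES    4u   /* toy cache: 4 direct-mapped lines */
#define MEM_BYTES 1024u   /* toy backing memory */

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
} toy_line_t;

typedef struct {
    toy_line_t lines[NUM_LINES];
    uint8_t    mem[MEM_BYTES];
    bool       write_allocate;
    unsigned   linefills;   /* whole-line reads from memory */
    unsigned   mem_writes;  /* writes that reached memory */
} toy_cache_t;

/* Write one byte through a write-back cache; addr must be < MEM_BYTES. */
static void toy_write8(toy_cache_t *c, uint32_t addr, uint8_t value)
{
    uint32_t line_no = addr / LINE_BYTES;
    uint32_t idx     = line_no % NUM_LINES;
    uint32_t tag     = line_no / NUM_LINES;
    toy_line_t *l    = &c->lines[idx];

    if (l->valid && l->tag == tag) {        /* hit: update the cached copy only */
        l->data[addr % LINE_BYTES] = value;
        l->dirty = true;
        return;
    }
    if (!c->write_allocate) {               /* miss, no-write-allocate */
        c->mem[addr] = value;               /* goes to memory, nothing allocated */
        c->mem_writes++;
        return;
    }
    if (l->valid && l->dirty) {             /* miss, write-allocate: evict dirty victim */
        uint32_t victim = (l->tag * NUM_LINES + idx) * LINE_BYTES;
        memcpy(&c->mem[victim], l->data, LINE_BYTES);
        c->mem_writes++;
    }
    memcpy(l->data, &c->mem[line_no * LINE_BYTES], LINE_BYTES); /* linefill */
    c->linefills++;
    l->valid = true;
    l->tag   = tag;
    l->data[addr % LINE_BYTES] = value;     /* merge the new byte into the line */
    l->dirty = true;
}
```

In this model, the write-allocate path avoids the immediate memory write. It pays for that with a linefill read from backing memory, and a dirty victim is still written back when it is evicted. Both of those accesses need something behind the address.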

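Two sets of attributes are involved. ACP transactions carry physical addresses and are not translated by the Cortex-A9 MMU. Their cacheability and allocation behavior therefore comes from the AWCACHE (and AWUSER) signals driven by the FPGA master. The Memory Management Unit (MMU) translation table governs the processor's own accesses to the same buffer, defining cacheability, write-back or write-through behavior, allocation policy, and shareability. On the Cortex-A9, the L2C-310 L2 controller acts as an outer cache. For processor reads of the buffer to be served from L2, the range should be mapped as Normal memory with outer (and, typically, inner) write-back, write-allocate attributes, and marked shareable. Write-back attributes only defer writes to physical memory. A dirty line is still written back when it is evicted or explicitly cleaned, so these attributes do not on their own prevent traffic to the backing address.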
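To illustrate the processor-side mapping, the function below packs an ARMv7-A short-descriptor first-level section entry (1 MB), assuming TEX remap is disabled. The struct, constant, and function names are hypothetical; the bit positions follow the ARMv7-A architecture. With TEX[2] = 1, the outer policy comes from TEX[1:0] and the inner policy from C:B, where 0b01 means write-back, write-allocate. So TEX = 0b101, C = 0, B = 1 describes inner and outer write-back, write-allocate Normal memory. The entry affects only how the CPU sees the buffer and has no effect on ACP transactions.

```c
#include <stdint.h>

/* Fields of an ARMv7-A short-descriptor section entry (hypothetical struct). */
typedef struct {
    unsigned b, c, tex, ap, apx, s, ng, xn, domain;
} section_attr_t;

/* Normal, shareable, full access, domain 15,
   inner and outer write-back write-allocate (TEX=0b101, C=0, B=1). */
static const section_attr_t NORMAL_WBWA_SHARED = {
    .b = 1, .c = 0, .tex = 5, .ap = 3, .apx = 0,
    .s = 1, .ng = 0, .xn = 0, .domain = 0xF,
};

/* Build the first-level entry mapping the 1 MB section containing phys_addr. */
static uint32_t section_entry(uint32_t phys_addr, section_attr_t a)
{
    return (phys_addr & 0xFFF00000u)        /* section base address [31:20] */
         | (a.ng     & 1u)   << 17
         | (a.s      & 1u)   << 16
         | (a.apx    & 1u)   << 15
         | (a.tex    & 7u)   << 12
         | (a.ap     & 3u)   << 10
         | (a.domain & 0xFu) << 5
         | (a.xn     & 1u)   << 4
         | (a.c      & 1u)   << 3
         | (a.b      & 1u)   << 2
         | 0x2u;                           /* bits[1:0] = 0b10: section */
}
```

For the section at physical address 0, these attributes give 0x00015DE6. The entry for the section that contains the ADC buffer would go at index (address >> 20) of the first-level table.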

Implementing ACP-L2 Cache Data Transfers with FPGA Integration

To implement the desired data transfer mechanism, the following steps must be taken:

Configure ACP Write Transactions for Coherency and Allocation: The ACP master in the FPGA must drive AWUSER[0] = 1 and AWCACHE[1] = 1 on every write to make it coherent. AWCACHE[3:0] must also request write allocation, for example 0b1111 (write-back, read- and write-allocate) or 0b1011 (write-back, write-allocate). This lets the L2 controller allocate a line when a write misses instead of forwarding the write to physical memory.
Define Memory Attributes in the MMU Translation Table: The memory range used for the ADC data must be mapped in the MMU translation table as Normal, shareable memory with inner and outer write-back, write-allocate attributes. This makes the processor's reads of the buffer cacheable, so they can be served from the L2 cache. These attributes apply only to processor accesses, and they defer write-backs to physical memory rather than prevent them.
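Initialize L2 Cache and SCU Settings: Before starting data transfers, enable the SCU and set the SMP bit in each core's Auxiliary Control Register so that its L1 data cache takes part in coherency. Then invalidate the L2 cache so that no stale lines interfere with the new writes. On the Cortex-A9, L2 is the external L2C-310 controller, which is invalidated through its memory-mapped maintenance registers (for example, Invalidate by Way) rather than through CP15 operations. The Cache Level ID Register (CLIDR) is read-only. On the Cortex-A9 it describes only the L1 caches, reporting level 1 as both the Level of Unification and the Level of Coherency, so there is nothing to configure there. The maintenance-broadcast setting (the FW bit in the Auxiliary Control Register) only controls whether CP15 maintenance operations are broadcast to the other core.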
Synchronize Software and FPGA Operations: The software running on the Cortex-A9 core must synchronize with the FPGA so that it reads a block only after the ACP master has completed all of its writes. The FPGA can signal this with an event signal or an interrupt. It should raise that signal only after it has received the write responses (BRESP) for every write in the block.
Avoid Compulsory Misses: The first access to any line is a compulsory miss. To make the ACP writes hit lines that are already resident in the L2 cache, the software can pre-touch the target addresses, for example by writing dummy data to them. The pre-touch is itself a miss, however. With write-allocate attributes it normally triggers a linefill read from the backing address, so that address must still respond without an error.
Handle Cache Line Granularity: On Zynq the L2 cache line is 32 bytes and the ACP data bus is 64 bits wide, while each ADC sample is 16 bits. An ACP master that writes individual samples must drive the AXI byte-lane write strobes (WSTRB) so that only the bytes of the new sample are updated. This leaves adjacent data in the same line intact.
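The arithmetic behind the granularity step is small enough to write down. The sketch assumes the 64-bit ACP data bus and 32-byte L2 line mentioned above, with little-endian byte-lane numbering (lane i carries the byte whose address satisfies addr % 8 == i). It also assumes 16-bit samples aligned to 2-byte boundaries. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACP_BUS_BYTES     8u                              /* 64-bit ACP data bus */
#define L2_LINE_BYTES    32u                              /* Zynq L2 line size */
#define SAMPLE_BYTES      2u                              /* one 16-bit ADC sample */
#define BEATS_PER_LINE   (L2_LINE_BYTES / ACP_BUS_BYTES)  /* 4 beats per line */
#define SAMPLES_PER_LINE (L2_LINE_BYTES / SAMPLE_BYTES)   /* 16 samples per line */

/* WSTRB value for writing one 16-bit sample at byte address addr.
   Returns 0 for a misaligned address, which this sketch does not support. */
static uint8_t wstrb_for_sample(uint32_t addr)
{
    if (addr % SAMPLE_BYTES != 0)
        return 0;
    uint32_t lane = addr % ACP_BUS_BYTES;    /* first byte lane of the sample */
    return (uint8_t)(0x3u << lane);          /* enable that lane and the next */
}

/* True if an INCR burst of `beats` full-strobe beats starting at addr
   writes whole L2 lines only, with no partial-line writes. */
static bool burst_is_whole_lines(uint32_t addr, uint32_t beats)
{
    return addr % L2_LINE_BYTES == 0 && beats % BEATS_PER_LINE == 0;
}
```

One design option this suggests is for the FPGA to buffer 16 samples and issue a single line-aligned 4-beat burst with WSTRB = 0xFF on every beat. That way every write covers whole lines, and per-sample strobes are not needed.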

By following these steps, the system can use the L2 cache as a temporary staging area for FPGA-generated data. As long as the lines stay resident, the transfer between the FPGA and the Cortex-A9 core never goes out to physical memory. Reliable, high-performance operation depends on correct cache attributes, coherency settings, and synchronization.
