ARM SMMU and Linux Page Table Sharing: Non-Deterministic Behavior During FPGA Transfers

Integrating the ARM System Memory Management Unit (SMMU) with Linux-generated page tables on a Cortex-A53-based Xilinx Zynq UltraScale+ board is difficult to get right. The goal is shared virtual addressing (SVA) between the Cortex-A53 CPUs and an FPGA, using the ARM MMU-500 (an SMMUv2 implementation) and the Cache Coherent Interconnect (CCI). The primary symptom is non-deterministic behavior during FPGA data transfers when the SMMU walks Linux-generated page tables, while manually constructed page tables work correctly. Because transfers succeed after a significant delay, the problem appears to be one of cache coherency and timing: updated page table entries (PTEs) are not immediately visible to the SMMU.

The core of the issue is the synchronization of page table updates between the CPU and the SMMU. When the CPU updates a page table, the new entries may sit in the CPU data cache for some time before they reach memory. If the SMMU's table walker is not coherent with that cache, it reads stale or incorrect PTEs from memory, and the transfer fails or completes only partially. Differences in memory attribute configuration between the CPU and the SMMU, and misconfigured SMMU translation control registers, make the problem worse.

Memory Attribute Mismatch and Cache Invalidation Timing

One critical factor is a mismatch in memory attributes between the CPU and the SMMU. The Memory Attribute Indirection Registers (MAIR) do not describe the translation tables themselves. They hold eight 8-bit attribute encodings, and each PTE selects one of them through its 3-bit AttrIndx field. If the MAIR programmed into the SMMU context bank (SMMU_CBn_MAIR0/1) differs from the CPU's MAIR_EL1, the same AttrIndx value resolves to a different memory type on each side. In this case, the SMMU's MAIR settings initially differed from those used by the Linux kernel, so the SMMU interpreted the attributes of Linux-generated PTEs differently than the CPU did.
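The comparison described above can be sketched in a few lines of C. This is a minimal illustration, not driver code: the helper names are hypothetical, and the MAIR values in the usage note are placeholders rather than the values any particular kernel version programs.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 8-bit attribute encoding selected by AttrIndx value idx. */
static uint8_t mair_attr(uint64_t mair, unsigned idx)
{
    assert(idx < 8);
    return (uint8_t)(mair >> (idx * 8));
}

/* Extract AttrIndx (bits [4:2]) from a stage 1 block or page descriptor. */
static unsigned pte_attr_index(uint64_t pte)
{
    return (unsigned)((pte >> 2) & 0x7);
}

/* Combine the two 32-bit SMMU_CBn_MAIR0/1 values into one 64-bit view
 * that can be compared directly with the CPU's MAIR_EL1. */
static uint64_t smmu_mair64(uint32_t mair0, uint32_t mair1)
{
    return ((uint64_t)mair1 << 32) | mair0;
}

/* Bitmask of AttrIndx slots whose encodings differ between CPU and SMMU.
 * A result of 0 means every PTE attribute index resolves identically. */
static unsigned mair_mismatch_mask(uint64_t cpu_mair, uint64_t smmu_mair)
{
    unsigned mask = 0;

    for (unsigned i = 0; i < 8; i++)
        if (mair_attr(cpu_mair, i) != mair_attr(smmu_mair, i))
            mask |= 1u << i;
    return mask;
}
```

For example, if the CPU uses 0xff (Normal, write-back) in slot 0 and 0x44 (Normal, non-cacheable) in slot 1 while the SMMU has the two swapped, mair_mismatch_mask() returns 0x3, and a PTE with AttrIndx 0 is treated as cacheable by the CPU but as non-cacheable by the SMMU.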

The timing of cache maintenance also plays a crucial role. The CPU's PTE writes land in its data cache, and unless the SMMU's table walker snoops that cache, the SMMU sees a change only once the line has been cleaned (written back) to memory. The observed delay suggests that the lines reach memory only through natural eviction, meaning the clean is missing, happens at the wrong time, or does not cover every modified entry. This is particularly problematic when the SMMU and the CPU share the same page tables: the kernel updates its own tables on the assumption that the walker (the CPU's MMU) is coherent, so it performs no cache maintenance on the SMMU's behalf.

Another potential cause is the configuration of the SMMU's translation control registers, such as SMMU_CBn_TCR. This register sets the cacheability and shareability of translation table walks. If these settings do not match the way the CPU maps the tables, for example non-cacheable walks of tables that the CPU writes through a write-back cache, the walker can read stale data. The SMMU's handling of demand paging is a further factor. If the SMMU is not set up to report and recover from page faults, or if the pages are not pinned in memory, the FPGA may touch unallocated or paged-out pages, leading to errors or incomplete transfers.
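As a rough illustration of the walk attributes involved, the sketch below checks and sets the table walk fields of a TCR value. It assumes the long-descriptor field layout (IRGN0 at bits [9:8], ORGN0 at [11:10], SH0 at [13:12]) applies to SMMU_CBn_TCR on this platform; the macro and function names are hypothetical, and the layout should be confirmed against the SMMU documentation before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions (long-descriptor TCR layout, TTBR0 half). */
#define TCR_IRGN0_SHIFT 8   /* inner cacheability of table walks */
#define TCR_ORGN0_SHIFT 10  /* outer cacheability of table walks */
#define TCR_SH0_SHIFT   12  /* shareability of table walks */

#define TCR_RGN_NC   0u     /* non-cacheable */
#define TCR_RGN_WBWA 1u     /* write-back, write-allocate */
#define TCR_SH_NS    0u     /* non-shareable */
#define TCR_SH_IS    3u     /* inner shareable */

static unsigned tcr_field(uint32_t tcr, unsigned shift)
{
    return (tcr >> shift) & 0x3u;
}

/* True if walks are write-back cacheable at both levels and shareable,
 * which is what a walker needs in order to observe cached CPU writes. */
static bool tcr_walks_coherent(uint32_t tcr)
{
    return tcr_field(tcr, TCR_IRGN0_SHIFT) == TCR_RGN_WBWA &&
           tcr_field(tcr, TCR_ORGN0_SHIFT) == TCR_RGN_WBWA &&
           tcr_field(tcr, TCR_SH0_SHIFT) != TCR_SH_NS;
}

/* Return tcr with the three walk fields rewritten; other bits are kept. */
static uint32_t tcr_set_coherent_walks(uint32_t tcr)
{
    tcr &= ~((0x3u << TCR_IRGN0_SHIFT) |
             (0x3u << TCR_ORGN0_SHIFT) |
             (0x3u << TCR_SH0_SHIFT));
    tcr |= (TCR_RGN_WBWA << TCR_IRGN0_SHIFT) |
           (TCR_RGN_WBWA << TCR_ORGN0_SHIFT) |
           (TCR_SH_IS << TCR_SH0_SHIFT);
    return tcr;
}
```

Cacheable, shareable walks only help if the SMMU's walk traffic is actually routed through a coherent port of the interconnect; otherwise explicit cache cleaning is still required.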

Implementing Cache Flushing and Correct SMMU Configuration

To address the non-deterministic behavior, both cache maintenance and SMMU configuration need attention. The first step is to make the SMMU's MAIR settings match those used by the Linux kernel, so that the CPU and the SMMU resolve each PTE's attribute index to the same memory type. Verify the SMMU values against the MAIR_EL1 value that the arm64 kernel programs (it is set during CPU setup in arch/arm64/mm/proc.S; the page table code itself lives in arch/arm64/mm/mmu.c).

Next, the updated PTEs must be pushed out of the CPU cache so that the SMMU can see them. After modifying a page table, clean the affected cache lines to the point of coherency (for example with DC CVAC, or with the kernel's cache maintenance helpers), and then execute a data synchronization barrier (DSB) so that the maintenance operations have completed before the FPGA transfer is started. The order matters: a DSB issued only before the clean does not guarantee that the clean itself has finished. Where exactly these operations belong depends on the implementation, but they should follow every page table update that the SMMU may subsequently walk.
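The clean-then-barrier sequence can be sketched as follows. This is a user-space-style illustration under stated assumptions, not the kernel's implementation: the function name is hypothetical, the caller supplies the cache line size (64 bytes on Cortex-A53), and on non-AArch64 hosts the cache instructions are compiled out so that only the address arithmetic runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clean every cache line overlapping [start, start + len) to the point of
 * coherency, then wait for completion. Returns the number of lines visited.
 * line must be a power of two. */
static size_t clean_range_to_poc(const void *start, size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);

    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(line - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t lines = 0;

    for (; addr < end; addr += line) {
#if defined(__aarch64__)
        /* Write the line back to memory without invalidating it. */
        __asm__ volatile("dc cvac, %0" : : "r"(addr) : "memory");
#endif
        lines++;
    }

#if defined(__aarch64__)
    /* Ensure all preceding cache maintenance has completed. */
    __asm__ volatile("dsb sy" : : : "memory");
#else
    __sync_synchronize(); /* stand-in barrier for non-AArch64 builds */
#endif
    return lines;
}
```

A caller would invoke this on the address range of the PTEs it has just written, and only afterwards start the FPGA transfer. Inside the kernel, the equivalent step would go through the kernel's own cache maintenance or DMA sync interfaces rather than raw instructions.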

The SMMU's translation control registers must also be configured correctly. SMMU_CBn_TCR should mark translation table walks as cacheable and shareable, matching the attributes the CPU uses for the page tables, and the SMMU should be set up to handle page faults. One approach is to service the context fault interrupt in a threaded handler and resolve the fault through the standard Linux page fault path, handle_mm_fault(). The fault flags passed to it must reflect the faulting access, as the do_fault() function in the AMD IOMMU driver (amd_iommu_v2.c) demonstrates.
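The flag translation step can be modeled in isolation. The sketch below is a stand-alone approximation: the SKETCH_ constants are hypothetical stand-ins defined locally, not the kernel's FAULT_FLAG_* values, and the position of the write-not-read bit in the fault syndrome register is an assumption to be checked against the SMMU specification. In a real driver, the resulting flags would be passed to handle_mm_fault() together with the faulting address and VMA.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed syndrome bit: set when the faulting access was a write. */
#define SKETCH_FSYNR0_WNR        (1u << 4)

/* Local stand-ins for the kernel's fault flags (values are arbitrary). */
#define SKETCH_FAULT_FLAG_WRITE  (1u << 0)
#define SKETCH_FAULT_FLAG_USER   (1u << 1)
#define SKETCH_FAULT_FLAG_REMOTE (1u << 2)

/* Derive page fault flags from an SMMU context fault syndrome value.
 * Assumes the device always acts on behalf of a user process and that
 * the fault is handled outside that process's own context. */
static unsigned smmu_fault_to_mm_flags(uint32_t fsynr0)
{
    unsigned flags = SKETCH_FAULT_FLAG_USER | SKETCH_FAULT_FLAG_REMOTE;

    if (fsynr0 & SKETCH_FSYNR0_WNR)
        flags |= SKETCH_FAULT_FLAG_WRITE;
    return flags;
}
```

Getting the write flag wrong matters in practice: a write fault reported as a read can leave a copy-on-write page mapped read-only, so the device faults again on the same address.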

Finally, the SMMU's context bank configuration must be managed carefully across process switches. Each process that uses the FPGA should have its own SMMU context bank, and the mapping from stream ID to context bank should be updated whenever the FPGA starts working on behalf of a different process. This ensures that the SMMU walks the page tables of the correct process and does not reuse stale translations.
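A per-process binding of this kind might be represented as below. This is only a sketch: the structure and function names are hypothetical, and the TTBR0 layout assumes the AArch64 translation context format, in which the ASID occupies bits [63:48] above the table base address. Writing the values to the SMMU registers is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define TTBR_ASID_SHIFT 48

/* Hypothetical record of what one context bank should be programmed with. */
struct cb_binding {
    uint16_t stream_id; /* stream ID the FPGA master presents */
    uint8_t  cbndx;     /* context bank index assigned to the process */
    uint64_t ttbr0;     /* table base plus ASID for that process */
};

/* Build a TTBR0 value from a 4 KiB-aligned table address and an ASID. */
static uint64_t make_ttbr0(uint64_t pgd_phys, uint16_t asid)
{
    assert((pgd_phys & 0xfffULL) == 0);          /* table must be aligned */
    assert((pgd_phys >> TTBR_ASID_SHIFT) == 0);  /* must not overlap ASID */
    return pgd_phys | ((uint64_t)asid << TTBR_ASID_SHIFT);
}

/* Point an existing binding at another process's page tables. */
static void cb_rebind(struct cb_binding *b, uint64_t pgd_phys, uint16_t asid)
{
    b->ttbr0 = make_ttbr0(pgd_phys, asid);
}
```

After a binding changes, the new values still have to be written to the context bank and any translations cached for the old address space invalidated before the next transfer starts.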

With these issues addressed, the system can achieve reliable shared virtual addressing between the CPU and the FPGA, with the SMMU and the CPU walking the same page tables. This removes the need to build a second set of page tables for the SMMU. The key points are to keep page table updates visible to the SMMU through correct cache maintenance, to align the memory attribute configuration on both sides, and to configure the SMMU correctly for page faults and translation table walks.
