ARMv8-A Cortex-A5x PCIe BAR Write Issues with Normal_NC Memory Attributes
When working with ARMv8-A Cortex-A5x series processors, particularly in systems that access a PCIe BAR (Base Address Register), developers often encounter subtle issues related to memory coherency and write buffer management. One such issue arises when writing data to a PCIe BAR mapped with the normal_nc (Normal Non-Cacheable) memory attribute. The problem manifests as incorrect data being written to the PCIe BAR unless the write operation is performed twice or the memory is remapped with a different attribute such as device_nGnRE (Device non-Gathering, non-Reordering, Early Write Acknowledgement). This behavior is particularly problematic in systems where there is no coherency between the ARM CPU and the PCIe device.
The root cause of this issue lies in the interaction between the ARMv8-A memory model, the CPU's write buffer, and the memory attributes used to map the PCIe BAR. The ARMv8-A architecture provides mechanisms to ensure that memory operations are ordered and visible to all observers, but software must apply them explicitly when accessing non-coherent memory regions such as PCIe BARs. This post examines the issue, explores its underlying causes, and describes troubleshooting steps and solutions for reliable writes to PCIe BARs mapped with normal_nc attributes.
Memory Ordering and Write Buffer Behavior in ARMv8-A with Normal_NC Attributes
The ARMv8-A architecture defines several memory types, each with attributes that govern how memory accesses are handled. The normal_nc memory type is typically used for memory-mapped regions where caching is not wanted, but the memory is still treated as Normal memory rather than Device memory. As a result, writes to normal_nc memory may be held in the CPU's write buffer, gathered (merged) with neighboring writes, and reordered with respect to one another to improve performance. That buffering and reordering becomes a problem for non-coherent memory regions such as PCIe BARs, where the order and visibility of writes are critical.
In the case of PCIe BARs, the lack of coherency between the ARM CPU and the PCIe device means that software must explicitly wait for buffered writes to complete (that is, drain the write buffer) to guarantee that data reaches the PCIe BAR in the intended order. Without that synchronization, a store can linger in the write buffer or be overtaken by a later store, so the device may act on a write before the data it depends on has arrived. This is particularly problematic when the PCIe device expects data to be written in a specific sequence, such as a write to a data buffer followed by a write to a control register that initiates a DMA transfer.
The ARMv8-A architecture provides several memory barrier instructions to enforce the ordering and completion of memory accesses. The Data Synchronization Barrier (DSB) is the relevant one here: no instruction after a DSB executes until all memory accesses before it that the barrier covers have completed. Because a DSB stalls the core until those accesses finish, it should be placed only where ordering actually matters rather than after every store, to avoid unnecessary performance overhead.
Implementing Data Synchronization Barriers and Choosing Memory Attributes for PCIe BAR Writes
To prevent incorrect data from reaching PCIe BARs mapped with normal_nc attributes, developers must ensure that buffered writes to the BAR have completed before any subsequent write that depends on them, typically the control-register write that starts the operation. Because the mapping is non-cacheable, no cache maintenance is involved for the BAR itself; inserting a DSB between the data writes and the dependent write achieves the required ordering.
The following code example demonstrates the correct use of DSB to ensure that data is properly written to a PCIe BAR:
STR W5, [X2] ; Write data to the PCIe BAR data buffer
DSB ST ; Data Synchronization Barrier for store operations
STR W0, [X4] ; Write to the PCIe BAR control register to initiate DMA
In this example, STR W5, [X2] writes data to the PCIe BAR data buffer, and DSB ST ensures that this store has completed before STR W0, [X4] writes to the control register. The ST option restricts the barrier to stores: it waits for all earlier stores (but not loads) to complete, and no instruction after it executes until they have.
Note that DSB ST operates on the full-system shareability domain (it is the store-only form of DSB SY), so completion is defined with respect to all observers in the system, including non-coherent masters such as a PCIe device. This ensures that the data write to the PCIe BAR has completed before the control register is written, so the device cannot observe the control write first.
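In addition to using DSB, developers should also consider the memory attributes used to map the PCIe BAR. While normal_nc is suitable for many use cases, it is not the best choice for every scenario. For example, if the PCIe device requires every write to arrive individually and in strict program order, device_nGnRE attributes may be more appropriate. The device_nGnRE attribute prevents writes from being gathered and prevents accesses to the same device region from being reordered (the architecture leaves the size of that region implementation defined), so the data and control writes in the example above arrive in program order without an explicit barrier. Its E (Early Write Acknowledgement) component, however, still allows an intermediate buffer to acknowledge a write before it reaches the endpoint, so it does not make writes immediately visible to the device, and barriers are still needed when BAR accesses must be ordered against Normal memory, such as a DMA descriptor in DRAM.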
However, device_nGnRE attributes have performance implications: because the CPU may not gather or reorder these writes, each store becomes a separate transaction, which increases latency and reduces throughput for bulk transfers. The choice of memory attributes should therefore be based on the specific requirements of the PCIe device and the overall system performance goals.
In summary, the issue of incorrect data writes to PCIe BARs mapped with normal_nc attributes in ARMv8-A Cortex-A5x systems can be effectively addressed by implementing proper memory synchronization techniques, such as DSB instructions, and by carefully selecting the appropriate memory attributes. By understanding the underlying memory model and the behavior of the write buffer, developers can ensure reliable and efficient access to PCIe BARs in their embedded systems.