ARM Cortex-M55 AXI 64-bit Peripheral Write Access Issue: Splitting into Two 32-bit Transactions
The ARM Cortex-M55 processor, while capable of generating 64-bit AXI transactions for normal memory, splits 64-bit write accesses to peripheral (device) memory into two separate 32-bit AXI transactions. This behavior is observed when using the STRD (Store Register Dual) instruction to write 64-bit data to a peripheral memory region. The issue manifests as two 32-bit AXI write operations instead of a single 64-bit operation, which can lead to inefficiencies in data transfer and potential timing concerns in real-time systems.
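For reference, a 64-bit store of the kind described above can be written in C as follows. This is a minimal sketch: the helper name write64 is hypothetical, and although compilers targeting Armv8-M commonly emit STRD for an aligned 64-bit volatile store, the instruction actually chosen depends on the compiler and its options, so the disassembly is worth checking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value with a single C-level access.
 * Compilers for Armv8-M typically lower this to STRD, but that is not
 * guaranteed; inspect the generated code to confirm. */
static inline void write64(volatile uint64_t *reg, uint64_t value)
{
    /* STRD needs a word-aligned address; fail early on a bad pointer. */
    assert(((uintptr_t)reg & 0x3u) == 0u);
    *reg = value;
}
```

Pointing reg at a normal memory location or at a device memory location is what selects between the two AXI behaviors discussed in this article.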
This behavior is not a bug but rather a design choice in the Cortex-M55’s AXI interface, particularly when dealing with device memory. The splitting of 64-bit writes into two 32-bit transactions is influenced by the memory type (normal vs. device) and the AXI protocol’s constraints for peripheral access. Understanding the root cause and potential workarounds is critical for developers working on performance-sensitive or real-time embedded systems.
Memory Type and AXI Protocol Constraints for Peripheral Writes
The Cortex-M55’s AXI interface differentiates between normal memory and device memory, each with distinct access characteristics. Normal memory supports full 64-bit AXI transactions, while device memory imposes restrictions on burst lengths and transaction sizes. These constraints are primarily driven by the need to minimize interrupt latency and ensure predictable access timing for peripherals.
When a 64-bit write operation is performed on normal memory, the Cortex-M55 generates a single AXI transaction with AWSIZE=0x3 (8 bytes per beat) and AWLEN=0x0 (1 beat). The write strobes (WSTRB) are set to 0xFF, so all 64 bits are written in a single data beat. However, when the same operation targets device memory, the AXI interface issues the write as two 32-bit transfers, using AWSIZE=0x2 (4 bytes per beat) and AWLEN=0x1 (2 beats). The first transfer has write strobes set to 0x0F for the lower 32 bits, and the second completes the operation with write strobes set to 0xF0 for the upper 32 bits.
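The strobe values quoted above follow directly from the transfer size and the address of each beat on a 64-bit data bus. The sketch below is a small host-side model of that relationship, not processor code; the function name axi_wstrb is an assumption made for illustration, and it only covers beats that are aligned to their own size.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-lane strobes for one beat on a 64-bit (8-lane) AXI data bus.
 * addr:   byte address of the beat
 * awsize: AXI size encoding, bytes per beat = 1 << awsize (0..3 here)
 * Assumes the beat is aligned to its own size, as in the cases above. */
static uint8_t axi_wstrb(uint32_t addr, unsigned awsize)
{
    assert(awsize <= 3u);
    unsigned bytes = 1u << awsize;
    assert((addr & (bytes - 1u)) == 0u);

    unsigned lane = addr & 0x7u;               /* first active byte lane */
    return (uint8_t)(((1u << bytes) - 1u) << lane);
}
```

With a doubleword-aligned base address, a size of 0x3 gives 0xFF, while a size of 0x2 gives 0x0F at the base and 0xF0 at base + 4, matching the two 32-bit transfers described above.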
This behavior is documented in the Cortex-M55 Technical Reference Manual (TRM), specifically in Section 10.4, "Manager-AXI interface." The TRM explains that device memory writes are limited to a maximum burst length of two to ensure that the processor can handle interrupts promptly. If a full 64-bit write were allowed, the processor would need to wait for the entire transaction to complete before servicing an interrupt, potentially increasing latency in real-time systems.
Implementing Workarounds for Single 64-bit Peripheral Writes
While the Cortex-M55’s AXI interface does not natively support single 64-bit writes to device memory, developers can implement workarounds to achieve similar functionality. These solutions involve careful memory mapping, software optimizations, and leveraging the processor’s capabilities for normal memory access.
Memory Remapping to Normal Memory Regions
One effective workaround is to remap the peripheral memory region to a normal memory region. This approach allows the Cortex-M55 to generate single 64-bit AXI transactions for writes. However, this solution requires careful consideration of the system’s memory architecture and potential trade-offs in terms of access timing and interrupt handling.
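To implement this workaround, developers can configure the Memory Protection Unit (MPU) to treat the peripheral memory region as normal memory. (The Cortex-M55 is an M-profile core: it has an MPU but no Memory Management Unit.) This configuration enables the Cortex-M55 to perform 64-bit writes without splitting them into two 32-bit transactions. However, normal memory does not carry the ordering guarantees of device memory: the processor may merge, repeat, or reorder accesses and may read speculatively. Developers should therefore mark the region non-cacheable, confirm that the peripheral tolerates such accesses (for example, that its registers have no read side effects), and add memory barriers where write ordering matters.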
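As a rough illustration of the MPU setup, the sketch below computes Armv8-M MPU register values for a region marked as normal, non-cacheable memory. The helper names, the choice of attribute index, and the 4 KB region size in the usage note are assumptions for illustration; the field layouts follow the Armv8-M MPU_RBAR, MPU_RLAR, and MPU_MAIR0 definitions. The helpers only build the values so they can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

#define MAIR_NORMAL_NC   0x44u      /* Normal, inner and outer Non-cacheable */
#define RBAR_SH_NONE     (0u << 3)  /* SH[4:3]: non-shareable */
#define RBAR_AP_RW_PRIV  (0u << 1)  /* AP[2:1]: read/write, privileged only */
#define RBAR_XN          (1u << 0)  /* XN[0]: execute never */
#define RLAR_EN          (1u << 0)  /* EN[0]: region enable */

/* Place an 8-bit attribute in slot idx (0..3) of an MPU_MAIR0 value. */
static uint32_t mair0_with_attr(uint32_t mair0, unsigned idx, uint8_t attr)
{
    assert(idx < 4u);
    unsigned shift = idx * 8u;
    return (mair0 & ~(0xFFu << shift)) | ((uint32_t)attr << shift);
}

/* MPU_RBAR value: the region base must be 32-byte aligned. */
static uint32_t mpu_rbar(uint32_t base)
{
    assert((base & 0x1Fu) == 0u);
    return base | RBAR_SH_NONE | RBAR_AP_RW_PRIV | RBAR_XN;
}

/* MPU_RLAR value: last_byte is the final address inside the region. */
static uint32_t mpu_rlar(uint32_t last_byte, unsigned attr_index)
{
    assert((last_byte & 0x1Fu) == 0x1Fu);
    assert(attr_index < 8u);
    return (last_byte & ~0x1Fu) | ((uint32_t)attr_index << 1) | RLAR_EN;
}

/* On target, these values would typically be written through the CMSIS
 * register block (MPU->MAIR0, MPU->RNR, MPU->RBAR, MPU->RLAR) before
 * enabling the MPU, followed by DSB and ISB barriers. */
```

For a hypothetical 4 KB peripheral window at 0x40000000 using attribute slot 1, the helpers give MAIR0 = 0x00004400, RBAR = 0x40000001, and RLAR = 0x40000FE3.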
Software-Based Data Packing and Unpacking
If memory remapping is not feasible, developers can use software-based techniques to pack and unpack 64-bit data into 32-bit chunks. This approach involves splitting the 64-bit data into two 32-bit values in software and performing two separate 32-bit writes to the peripheral memory. While this method does not eliminate the split transactions, it provides greater control over the timing and sequence of the writes.
For example, consider the following code snippet:
#include <stdint.h>

void write_peripheral_example(void)
{
    uint64_t data = 0x123456789ABCDEF0ULL;
    // volatile stops the compiler from merging, reordering, or dropping the writes
    volatile uint32_t *peripheral_address = (volatile uint32_t *)0x40000000;
    // Split 64-bit data into two 32-bit values
    uint32_t lower_32 = (uint32_t)(data & 0xFFFFFFFFu);
    uint32_t upper_32 = (uint32_t)(data >> 32);
    // Perform two 32-bit writes, lower word first
    peripheral_address[0] = lower_32;
    peripheral_address[1] = upper_32;
}
This approach ensures that the 64-bit data is written to the peripheral memory in two 32-bit transactions, with explicit control over the order and timing of the writes.
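The same idea can be wrapped in a reusable helper that makes the write order an explicit parameter, which may matter for peripherals that latch a 64-bit value when one particular half is written. This is a sketch under assumptions: the name write64_split and the high_first flag are hypothetical and not part of any vendor API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write a 64-bit value as two 32-bit volatile stores.
 * reg points at the lower word; the upper word is at reg[1].
 * high_first selects which half is written first. */
static void write64_split(volatile uint32_t *reg, uint64_t value, bool high_first)
{
    assert(((uintptr_t)reg & 0x3u) == 0u);   /* word-aligned register pair */

    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    if (high_first) {
        reg[1] = hi;
        reg[0] = lo;
    } else {
        reg[0] = lo;
        reg[1] = hi;
    }
}
```

Because both stores go through a volatile pointer, the compiler keeps them in the order written. If ordering against other memory or other peripherals matters, a barrier between or after the stores may still be needed.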
Leveraging DMA for Bulk Transfers
For applications requiring high-throughput data transfers to peripherals, developers can use a Direct Memory Access (DMA) controller to handle bulk data transfers. The Cortex-M55 core does not include a DMA controller, so this relies on one provided elsewhere in the SoC. A DMA controller issues its own bus transactions, so its transfers are not subject to the processor's handling of device memory writes.
If the DMA controller and the interconnect path to the peripheral support 64-bit beats, configuring the transfer width to 64 bits can produce single 64-bit writes without splitting them into two 32-bit transactions. However, this approach requires careful configuration of the DMA controller and may involve additional overhead for setting up and managing DMA transfers.
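DMA programming models differ from one controller to another, so the following is only a generic sketch of preparing a transfer with a 64-bit beat size. The descriptor layout, field names, and the function dma_desc_64bit are entirely hypothetical; a real controller defines its own registers or descriptor format in its reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transfer descriptor; real controllers define their own. */
typedef struct {
    uint32_t src_addr;        /* source buffer in memory */
    uint32_t dst_addr;        /* destination peripheral address */
    uint32_t beat_count;      /* number of beats to transfer */
    uint8_t  beat_size_log2;  /* log2 of bytes per beat: 3 means 8 bytes */
} dma_desc_t;

/* Build a descriptor for 64-bit beats. Addresses and length must be
 * multiples of 8 bytes so that every beat is a full, aligned doubleword. */
static dma_desc_t dma_desc_64bit(uint32_t src, uint32_t dst, uint32_t bytes)
{
    assert((src & 0x7u) == 0u);
    assert((dst & 0x7u) == 0u);
    assert((bytes & 0x7u) == 0u);

    dma_desc_t d = { src, dst, bytes / 8u, 3u };
    return d;
}
```

Checking alignment up front matters here because a misaligned address or length would force the controller, if it accepts the transfer at all, to fall back to narrower beats.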
Performance Considerations and Trade-offs
Each of the above workarounds involves trade-offs in terms of performance, complexity, and system design. Memory remapping provides the most straightforward solution but may not be feasible in systems with strict memory architecture constraints. Software-based data packing and unpacking offer greater flexibility but introduce additional overhead in terms of CPU cycles and code complexity. Leveraging DMA controllers can improve throughput but requires careful configuration and management.
Developers must evaluate these trade-offs based on the specific requirements of their application, including real-time constraints, memory architecture, and performance goals.
Conclusion
The ARM Cortex-M55’s AXI interface splits 64-bit peripheral write accesses into two 32-bit transactions due to constraints on burst lengths and interrupt latency for device memory. While this behavior is by design, it can pose challenges for developers working on performance-sensitive or real-time systems. By understanding the underlying causes and implementing appropriate workarounds, developers can optimize data transfers to peripherals and achieve the desired performance characteristics.
Memory remapping, software-based data packing, and DMA-based transfers offer viable solutions for addressing this issue, each with its own trade-offs. Careful consideration of the system’s requirements and constraints is essential for selecting the most appropriate approach. With these strategies, developers can effectively manage 64-bit peripheral writes on the Cortex-M55 and ensure optimal performance in their embedded systems.