CHI Device Memory Type Byte Access Behavior and Boundary Crossing
In ARM’s Coherent Hub Interface (CHI) protocol, "Device" memory type transactions need careful handling when the start address is not aligned to the transaction size. A "Normal" memory type transaction can wrap within its size-aligned address window, whereas a "Device" memory type transaction follows stricter access rules because the target is non-cacheable and accesses to it can have side effects. This behavior is critical for ensuring predictable and deterministic access to memory-mapped peripherals and other device regions.
When a transaction starts at an unaligned address, such as 0x70, and requests a transfer size of 64 bytes, the CHI protocol mandates that the access begins at the exact start address (0x70) and proceeds linearly to the end of the 64-byte-aligned window that contains it (0x40 to 0x7F), so the last byte accessed is 0x7F. This results in a 16-byte access within the first aligned block. The remaining 48 bytes of the request must be handled differently due to the constraints of "Device" memory type access rules. The CHI protocol does not allow address wrapping for "Device" memory, meaning the access cannot wrap back to the base of the window (0x40) and cover 0x40 to 0x6F, as it would in a "Normal" memory type transaction.
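The arithmetic in this example can be checked with a short sketch. The function names below are hypothetical, and the model rests on two assumptions that are not stated in the text: the transaction size is a power of two, and "Normal" wrapping is shown at an assumed 16-byte data beat.

```python
from __future__ import annotations


def aligned_window(addr: int, size: int) -> tuple[int, int]:
    """Return (base, last) of the size-aligned window that contains addr."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    base = addr & ~(size - 1)
    return base, base + size - 1


def device_access(addr: int, size: int) -> tuple[int, int, int]:
    """Linear access: from addr to the end of the window, with no wrap.

    Returns (first_byte, last_byte, byte_count).
    """
    _, last = aligned_window(addr, size)
    return addr, last, last - addr + 1


def normal_beat_order(addr: int, size: int, beat: int = 16) -> list[int]:
    """Wrapping access: beat start addresses, beginning at the beat that
    holds addr and wrapping back to the window base (beat is an assumption)."""
    base, _ = aligned_window(addr, size)
    first = (addr - base) // beat * beat
    return [base + (first + i * beat) % size for i in range(size // beat)]


if __name__ == "__main__":
    first, last, count = device_access(0x70, 64)
    print(f"Device: {first:#x}..{last:#x} ({count} bytes)")
    print("Normal beats:", [hex(a) for a in normal_beat_order(0x70, 64)])
```

Under these assumptions, the "Device" case covers only 0x70 to 0x7F, while the "Normal" case visits 0x70 and then wraps to 0x40, 0x50, and 0x60.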
This behavior is rooted in the ARM architecture’s requirement for strict ordering and atomicity in "Device" memory accesses. Since "Device" memory is typically used for peripherals and registers, any deviation from the specified access pattern could lead to unpredictable behavior, such as incorrect register updates or missed interrupts. Therefore, the CHI protocol enforces a linear access pattern for "Device" memory, even if it results in fragmented transactions when crossing alignment boundaries.
Memory Alignment Constraints and Transaction Fragmentation
The core issue arises from the interaction between the transaction size, the start address alignment, and the memory type. In the case of "Device" memory, the CHI protocol treats each size-aligned block as a separate entity, and a single transaction cannot span multiple blocks without explicit handling. This is in contrast to "Normal" memory, where the access wraps within the aligned block and so always covers the full transaction size.
For a 64-byte transaction starting at 0x70, the first 16 bytes (from 0x70 to 0x7F) are accessed within the initial aligned block. However, the remaining 48 bytes must be accessed starting from the next aligned boundary (0x80). This requires the transaction to be split into two separate accesses: one for the initial 16 bytes and another for the subsequent 48 bytes. This fragmentation is necessary to comply with the CHI protocol’s rules for "Device" memory type transactions.
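The split described above reduces to a small loop over block boundaries. This is a minimal sketch, not a description of any particular interconnect: `split_linear` is a hypothetical helper, and the 64-byte default for `boundary` is taken from the example rather than from the specification.

```python
from __future__ import annotations


def split_linear(addr: int, nbytes: int, boundary: int = 64) -> list[tuple[int, int]]:
    """Split the byte range [addr, addr + nbytes) into (start, length) pieces
    so that no piece crosses a multiple of `boundary`."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    pieces = []
    while nbytes > 0:
        room = boundary - (addr % boundary)  # bytes left before the next boundary
        take = min(room, nbytes)
        pieces.append((addr, take))
        addr += take
        nbytes -= take
    return pieces


if __name__ == "__main__":
    for start, length in split_linear(0x70, 64):
        print(f"{start:#x}..{start + length - 1:#x} ({length} bytes)")
```

For the 0x70 example this yields a 16-byte piece at 0x70 and a 48-byte piece at 0x80; an aligned request such as 64 bytes at 0x80 comes back as a single piece.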
The fragmentation of transactions introduces additional complexity in the design of the interconnect and the memory controller. The interconnect must be capable of splitting the original transaction into smaller, compliant transactions while maintaining the correct ordering and atomicity requirements. The memory controller, in turn, should handle these fragmented transactions with as little added latency as possible.
Furthermore, the alignment constraints of "Device" memory type transactions can impact the overall system performance, especially in scenarios where large data transfers are required. The need to split transactions into smaller chunks can lead to increased bus utilization and higher latency, particularly if the interconnect is not optimized for handling such cases.
Implementing CHI-Compliant Transaction Splitting and Address Handling
To address the challenges posed by unaligned "Device" memory type transactions, the system must implement a robust mechanism for transaction splitting and address handling. This involves several key steps:
- Transaction Splitting Logic: The interconnect must include logic to detect when a transaction exceeds the alignment boundary and split it into multiple sub-transactions. For example, a 64-byte transaction starting at 0x70 would be split into a 16-byte transaction (0x70 to 0x7F) and a 48-byte transaction (0x80 to 0xAF). This logic must ensure that the split transactions maintain the original transaction’s attributes, such as the memory type, protection level, and QoS parameters.
- Address Calculation: The interconnect must correctly calculate the addresses for the split transactions. For the initial 16-byte transaction, the start address remains 0x70, and the end address is 0x7F. For the subsequent 48-byte transaction, the start address is 0x80, and the end address is calculated based on the remaining bytes. This requires precise address arithmetic to ensure that the split transactions cover the entire original transaction without overlapping or leaving gaps.
- Ordering and Atomicity: The interconnect must ensure that the split transactions are executed in the correct order and maintain the atomicity requirements of the original transaction. This is particularly important for "Device" memory type transactions, where strict ordering is often required to ensure correct operation of peripherals and registers. The interconnect may need to implement additional mechanisms, such as transaction IDs or barriers, to enforce the correct ordering.
- Performance Optimization: To mitigate the performance impact of transaction splitting, the interconnect and memory controller should be optimized to handle fragmented transactions efficiently. This may include techniques such as pipelining, burst mode support, and priority-based arbitration to minimize latency and maximize throughput.
- Verification and Testing: The implementation of transaction splitting and address handling must be thoroughly verified to ensure compliance with the CHI protocol and correct operation under all conditions. This includes testing with various transaction sizes, start addresses, and memory types to cover all possible corner cases. Simulation and formal verification techniques should be employed to validate the design and identify any potential issues.
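The steps above can be sketched as a small software reference model that a testbench might compare against. Everything named here is hypothetical: `Txn`, its fields (`mem_type`, `qos`, `seq`), and the 64-byte `boundary` are illustrative stand-ins, not CHI field names or an actual interconnect implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Txn:
    addr: int
    nbytes: int
    mem_type: str = "Device"  # illustrative attribute, carried through the split
    qos: int = 0              # illustrative attribute, carried through the split
    seq: int = 0              # issue order of the sub-transaction


def split_txn(txn: Txn, boundary: int = 64) -> list[Txn]:
    """Split txn at multiples of `boundary`, copying attributes and numbering
    the pieces in ascending address order."""
    parts = []
    addr, left = txn.addr, txn.nbytes
    while left > 0:
        take = min(boundary - addr % boundary, left)
        parts.append(replace(txn, addr=addr, nbytes=take, seq=len(parts)))
        addr += take
        left -= take
    return parts


def split_is_valid(txn: Txn, parts: list[Txn], boundary: int = 64) -> bool:
    """Check ordering, contiguity (no gaps or overlaps), boundary compliance,
    attribute preservation, and full coverage of the original range."""
    cursor = txn.addr
    for i, p in enumerate(parts):
        same_block = p.addr // boundary == (p.addr + p.nbytes - 1) // boundary
        same_attrs = (p.mem_type, p.qos) == (txn.mem_type, txn.qos)
        if p.seq != i or p.addr != cursor or not same_block or not same_attrs:
            return False
        cursor += p.nbytes
    return cursor == txn.addr + txn.nbytes


def sweep() -> int:
    """Exhaustively check every start address in a small range against
    several sizes; returns the number of cases that passed."""
    passed = 0
    for addr in range(0x40, 0xC0):
        for nbytes in (1, 2, 4, 8, 16, 32, 64):
            txn = Txn(addr, nbytes, qos=3)
            if not split_is_valid(txn, split_txn(txn)):
                raise RuntimeError(f"bad split for {txn}")
            passed += 1
    return passed
```

A sweep like this only exercises the address arithmetic and attribute copying; ordering at the bus level and device side effects would still need simulation or formal checks on the actual design.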
By implementing these steps, the system can effectively handle unaligned "Device" memory type transactions while maintaining compliance with the CHI protocol and ensuring correct operation of the memory subsystem. This approach not only addresses the immediate issue of boundary crossing but also provides a foundation for handling other complex memory access scenarios in ARM-based SoCs.