ARMv7 Memory Access Ordering: Strongly Ordered vs. Normal Memory
In ARMv7 architectures, memory types are categorized into Normal, Device, and Strongly Ordered (SO) memory. Each type has distinct characteristics regarding access ordering, caching, and buffering. Normal memory is typically used for general-purpose data and code storage, where the ARM processor can optimize performance through reordering, caching, and speculative execution. In contrast, Strongly Ordered memory is used for memory-mapped peripherals or critical regions where access ordering must be strictly preserved to ensure correct system behavior.
The ARMv7 architecture specifies that all memory accesses to Strongly Ordered memory occur in program order. This means that the processor must execute load and store operations to Strongly Ordered memory in the exact sequence they appear in the program. However, this guarantee does not extend to interactions between Strongly Ordered memory and Normal memory. When switching between these memory types, the processor’s memory model allows for potential reordering of accesses unless explicit synchronization mechanisms, such as memory barriers, are used.
The key distinction lies in the memory model’s weak ordering for Normal memory. ARM processors can reorder Normal memory accesses for performance optimization, but this reordering can lead to unexpected behavior when interacting with Strongly Ordered memory. For example, a store operation to Normal memory followed by a load operation from Strongly Ordered memory might not execute in the expected sequence due to the processor’s pipeline optimizations or store buffer behavior. This reordering can cause functional issues in systems where the order of memory accesses is critical, such as in communication with peripherals or synchronization between cores in a multi-core system.
Instruction Reordering and Memory Access Reordering in ARMv7
The confusion surrounding memory barriers often stems from the distinction between instruction reordering and memory access reordering. Instruction reordering is a performance optimization technique used by the processor pipeline, where the processor may execute instructions out of their program order to improve throughput. However, this reordering is typically invisible to software, as the processor ensures that the final architectural state matches the expected program order.
Memory access reordering, on the other hand, refers to the observable reordering of load and store operations as seen by other processors or devices in the system. ARM’s weakly ordered memory model allows such reordering, particularly for Normal memory accesses. For example, consider two byte stores to Normal memory: STRB R0, [R1] (storing 0xA to address 0x8000) followed by STRB R2, [R3] (storing 0xB to address 0x8001). Without a memory barrier, another processor or device might observe the second store (0xB at 0x8001) before the first store (0xA at 0x8000). This behavior is legal under ARM’s memory model but can lead to unexpected results if the order of operations is critical.
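If the order of the two byte stores matters, placing a DMB between them forces other observers to see them in program order. A minimal sketch, assuming R1 and R3 already hold the addresses 0x8000 and 0x8001:
MOV R0, #0xA @ Value for the first store
STRB R0, [R1] @ Store 0xA to 0x8000
DMB @ Make the first store observable before the second
MOV R2, #0xB @ Value for the second store
STRB R2, [R3] @ Store 0xB to 0x8001
With the barrier in place, no observer in the same shareability domain can see 0xB at 0x8001 without also seeing 0xA at 0x8000.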
When accessing Strongly Ordered memory, the ARM architecture enforces stricter rules. All accesses to Strongly Ordered memory must occur in program order, and the processor must drain the store buffer before completing a store operation to Strongly Ordered memory. However, these rules only apply to accesses within the same memory type. When switching between Normal and Strongly Ordered memory, the processor does not automatically enforce ordering between the two. This is where memory barriers become essential.
Implementing Data Synchronization Barriers for Correct Memory Access Ordering
To ensure correct memory access ordering when switching between Normal and Strongly Ordered memory, ARM provides three memory barrier instructions: Data Memory Barrier (DMB), Data Synchronization Barrier (DSB), and Instruction Synchronization Barrier (ISB). Each enforces a progressively stronger guarantee: DMB orders memory accesses relative to one another, DSB additionally waits for all outstanding accesses to complete, and ISB flushes the pipeline so that subsequent instructions are fetched in the new context.
The Data Memory Barrier (DMB) ensures that all memory accesses before the barrier are observed, by all observers in the relevant shareability domain, before any memory accesses after the barrier. This is particularly useful when enforcing ordering between Normal and Strongly Ordered memory accesses. For example, consider a scenario where a store operation to Normal memory must be observed before a load operation from Strongly Ordered memory:
STR R0, [R1] @ Store to Normal memory
DMB @ Ensure the store completes before proceeding
LDR R2, [R3] @ Load from Strongly Ordered memory
In this example, the DMB ensures that the store to Normal memory (STR R0, [R1]) is visible to all observers before the load from Strongly Ordered memory (LDR R2, [R3]) is executed. Without the DMB, the processor might reorder these operations, leading to incorrect behavior.
The Data Synchronization Barrier (DSB) is a stronger barrier that ensures all memory accesses, as well as any associated cache maintenance operations, complete before continuing. This is useful when transitioning between memory types or when performing operations that require strict ordering, such as configuring peripherals or enabling interrupts. For example:
STR R0, [R1] @ Store to Normal memory
DSB @ Ensure all memory accesses complete
LDR R2, [R3] @ Load from Strongly Ordered memory
The DSB ensures that not only the store to Normal memory but also any pending cache operations complete before the load from Strongly Ordered memory is executed.
The Instruction Synchronization Barrier (ISB) ensures that all preceding instructions are completed before any subsequent instructions are fetched and executed. This is particularly important when modifying control registers or switching memory mappings, as it ensures that the processor’s pipeline is flushed and all previous instructions are fully executed.
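A common use of ISB is the conventional ARMv7 sequence for enabling the MMU through the System Control Register (SCTLR). The sketch below assumes the translation tables have already been set up:
MRC p15, 0, R0, c1, c0, 0 @ Read SCTLR
ORR R0, R0, #0x1 @ Set the M bit to enable the MMU
MCR p15, 0, R0, c1, c0, 0 @ Write SCTLR
DSB @ Ensure the write and all prior accesses complete
ISB @ Flush the pipeline so later instructions are fetched under the new mapping
Without the ISB, instructions already in the pipeline could execute under the old translation regime, with unpredictable results.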
Practical Considerations for Memory Barrier Usage
When implementing memory barriers, it is essential to consider the specific requirements of the system and the memory types involved. Overusing barriers can lead to performance degradation, as they force the processor to wait for all preceding operations to complete. However, omitting necessary barriers can result in subtle and difficult-to-debug issues, particularly in multi-core systems or systems with complex memory hierarchies.
For example, in a multi-core system, memory barriers are critical for ensuring that updates to shared data structures become visible to all cores in the correct order. Consider the classic message-passing scenario: Core A writes a data value to Normal memory and then sets a flag (for instance, a mailbox register in Strongly Ordered memory), while Core B polls the flag and then reads the data:
@ Core A
STR R0, [R1] @ Write data to shared Normal memory
DMB @ Ensure the data is observable before the flag is set
STR R2, [R3] @ Set the flag (e.g., a Strongly Ordered mailbox register)
@ Core B
LDR R2, [R3] @ Poll the flag
DMB @ Ensure the flag is read before the data
LDR R0, [R1] @ Read the data from shared Normal memory
In this example, the DMB on Core A guarantees that the data store is observable to Core B before the flag store, and the DMB on Core B guarantees that the data load is performed only after the flag load. Without both barriers, Core B could observe the flag set and yet still read stale data.
Summary of Key Points
- ARMv7 architectures categorize memory into Normal, Device, and Strongly Ordered types, each with distinct access ordering rules.
- Strongly Ordered memory enforces strict program order for accesses, but this does not extend to interactions with Normal memory.
- Memory barriers (DMB, DSB, ISB) are essential for enforcing correct ordering when switching between memory types.
- Overusing barriers can degrade performance, while omitting necessary barriers can lead to functional issues.
- Practical usage of barriers requires careful consideration of system requirements and memory access patterns.
By understanding the nuances of ARMv7 memory ordering and the role of memory barriers, developers can ensure correct and efficient system behavior, particularly in complex embedded systems with mixed memory types.