ARM TrustZone World Switching and Cache Coherency Issues
ARM TrustZone technology provides a secure execution environment by partitioning the system into Secure and Normal (Non-secure) worlds. This partitioning extends to memory, peripherals, and even CPU states. However, one of the most critical challenges in TrustZone implementations is maintaining cache coherency during world switches. When the system transitions between Secure and Non-secure worlds, the cache state must be carefully managed to prevent data leakage, corruption, or unintended side effects.
In most ARM-based System-on-Chips (SoCs), the L1 data cache employs a write-back policy. This means that data modifications are initially written to the cache and only later written back to main memory. While this approach improves performance by reducing memory access latency, it introduces complexity in TrustZone implementations. During a world switch, the cache may contain sensitive data from the Secure world, which must not be accessible to the Non-secure world. Conversely, the Non-secure world’s cache lines must not interfere with the Secure world’s execution.
The core issue arises from the need to ensure that all cached data is properly synchronized between the two worlds without causing significant performance degradation. If the entire L1 cache is invalidated or flushed during every world switch, the system’s performance will suffer due to the increased latency of reloading cache lines. This is particularly problematic in real-time systems where deterministic performance is critical.
Write-Back Cache Policy and Memory Barrier Omissions
The primary cause of cache coherency issues in TrustZone implementations is the write-back cache policy combined with potential omissions of memory barriers. In a write-back cache, data is written to the cache first and only written to main memory when the cache line is evicted or explicitly cleaned. Until that happens, main memory holds a stale copy while the cache holds the up-to-date one, which becomes problematic during world switches.
When the system transitions from the Secure world to the Non-secure world, any dirty cache lines (modified but not yet written to main memory) in the Secure world must be flushed to ensure that the Non-secure world does not access stale or sensitive data. Similarly, when transitioning back to the Secure world, the Non-secure world’s cache lines must be invalidated to prevent the Secure world from accessing non-secure data.
Memory barriers play a crucial role in ensuring that cache operations are performed in the correct order. Omitting them can lead to race conditions where cache operations are not properly synchronized with world switches, which can result in data corruption or security vulnerabilities. For example, if no barrier is placed between the cache maintenance operations and the world switch, the switch may take effect before the clean has completed, so the cache may still contain dirty lines that should have been written back, leading to potential data leakage.
Another contributing factor is the lack of hardware-enforced cache partitioning. TrustZone provides memory partitioning at the bus level, and Arm cores with the Security Extensions tag each cache line with the security state (NS bit) of the access that allocated it, but the cache itself is typically shared between the Secure and Non-secure worlds rather than split into dedicated regions. This shared cache architecture necessitates software-based cache management, which can be error-prone and inefficient if not implemented correctly.
Implementing Cache Management and Data Synchronization Barriers
To address cache coherency issues in ARM TrustZone implementations, a combination of cache management techniques and data synchronization barriers must be employed. The goal is to ensure that cache operations are properly synchronized with world switches while minimizing performance overhead.
Cache Flushing and Invalidation
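During a world switch, the L1 data cache should be flushed to ensure that all dirty lines are written back to main memory. This can be achieved using the DCCSW (Data Cache Clean by Set/Way) and DCISW (Data Cache Invalidate by Set/Way) cache maintenance operations. On ARMv7-A these are issued as CP15 writes (MCR) rather than as standalone mnemonics; the AArch64 equivalents are DC CSW and DC ISW. DCCSW cleans (flushes) a dirty cache line to main memory, while DCISW invalidates a cache line so that it is not reused in the new world. Each operation acts on a single line selected by set and way, so covering the whole cache requires looping over every set and way.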
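Because set/way operations take an encoded operand rather than an address, it can help to see how that operand is built. The following is a minimal host-side sketch in C, assuming the ARMv7-A set/way layout (way in the top bits, set above the line-offset bits, cache level minus one in bits [3:1]). The names `cache_geometry`, `setway_operand`, `for_each_setway`, and `count_op` are hypothetical; on real hardware the geometry would be read from the cache size ID register and the callback would issue the actual maintenance operation.

```c
#include <stdint.h>

/* Ceiling of log2(n) for small n >= 1. */
static unsigned ceil_log2(uint32_t n)
{
    unsigned bits = 0;
    while (bits < 31u && ((uint32_t)1 << bits) < n) {
        bits++;
    }
    return bits;
}

/* Hypothetical description of one cache level (assumed, not read from hardware). */
struct cache_geometry {
    uint32_t line_bytes; /* line length in bytes, power of two */
    uint32_t num_sets;
    uint32_t num_ways;
    uint32_t level;      /* 1 for L1, 2 for L2, ... */
};

/* Encode the set/way operand: way in the top bits, set above the
 * line-offset bits, (level - 1) in bits [3:1]. */
static uint32_t setway_operand(const struct cache_geometry *g,
                               uint32_t set, uint32_t way)
{
    unsigned line_shift = ceil_log2(g->line_bytes);
    unsigned way_bits   = ceil_log2(g->num_ways);
    uint32_t value = ((g->level - 1u) << 1) | (set << line_shift);

    if (way_bits != 0u) {           /* direct-mapped caches have no way field */
        value |= way << (32u - way_bits);
    }
    return value;
}

/* Callback standing in for the real maintenance operation. */
typedef void (*setway_op_fn)(uint32_t operand, void *ctx);

/* Visit every set/way once; returns the number of operations issued. */
static uint32_t for_each_setway(const struct cache_geometry *g,
                                setway_op_fn op, void *ctx)
{
    uint32_t issued = 0;
    for (uint32_t way = 0; way < g->num_ways; way++) {
        for (uint32_t set = 0; set < g->num_sets; set++) {
            op(setway_operand(g, set, way), ctx);
            issued++;
        }
    }
    return issued;
}

/* Host-side stand-in that only counts how often it is called. */
static void count_op(uint32_t operand, void *ctx)
{
    (void)operand;
    (*(uint32_t *)ctx)++;
}
```

For an assumed 32 KiB, 4-way cache with 64-byte lines (128 sets), the loop issues 512 operations, which illustrates why a full set/way pass on every world switch is costly.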
For the instruction cache, the ICIMVAU (Instruction Cache Invalidate by Modified Virtual Address to PoU) operation can be used to invalidate the cache line containing a given address. This is particularly important when transitioning from the Non-secure world to the Secure world, as the Secure world must not execute potentially malicious code from the Non-secure world.
Data Synchronization Barriers
Data synchronization barriers are essential to ensure that cache operations are completed before proceeding with a world switch. The DSB (Data Synchronization Barrier) instruction ensures that all memory accesses and cache maintenance operations issued before the barrier have completed before any subsequent instruction executes. This is critical to prevent race conditions where cache operations are not properly synchronized with world switches.
For example, before transitioning from the Secure world to the Non-secure world, a DSB instruction should be executed after flushing the cache to ensure that all dirty lines are written to main memory. Similarly, a DSB instruction should be executed after invalidating the cache when transitioning back to the Secure world.
Selective Cache Management
To minimize performance overhead, selective cache management techniques can be employed. Instead of flushing or invalidating the entire cache, only the cache lines that contain sensitive data should be managed. This can be achieved using the DCIMVAC (Data Cache Invalidate by Modified Virtual Address to PoC) and DCCMVAC (Data Cache Clean by Modified Virtual Address to PoC) operations, each of which acts on the single cache line containing the supplied virtual address.
By selectively managing cache lines, the system can avoid the performance penalty associated with full cache flushes and invalidations. This approach is particularly useful in systems where only a small portion of the cache contains sensitive data.
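Since the by-address operations act on one line at a time, covering a buffer means walking it in line-sized steps from the line containing its first byte to the line containing its last byte. The sketch below shows that walk in host-runnable C. The names `for_each_line`, `line_op_fn`, and `record_line` are hypothetical, the line size is passed in as an assumption rather than read from hardware, and the callback stands in for the real DCCMVAC or DCIMVAC operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback standing in for a by-address maintenance operation. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/*
 * Call `op` once for every cache line that overlaps [start, start + len).
 * `line_bytes` must be a power of two. Returns the number of lines visited.
 */
static size_t for_each_line(uintptr_t start, size_t len, size_t line_bytes,
                            line_op_fn op, void *ctx)
{
    if (len == 0) {
        return 0;
    }

    uintptr_t mask = (uintptr_t)line_bytes - 1u;
    uintptr_t addr = start & ~mask;               /* line holding the first byte */
    uintptr_t last = (start + len - 1u) & ~mask;  /* line holding the last byte  */
    size_t visited = 0;

    for (;;) {
        op(addr, ctx);
        visited++;
        if (addr == last) {
            break;                                /* stop without stepping past `last` */
        }
        addr += line_bytes;
    }
    return visited;
}

/* Host-side stand-in that remembers the most recent line address. */
static void record_line(uintptr_t line_addr, void *ctx)
{
    *(uintptr_t *)ctx = line_addr;
}
```

With an assumed 64-byte line, a 256-byte buffer starting at 0x1010 spans five lines (0x1000 through 0x1100), because the unaligned start and end each pull in a partial line. A DSB would still be needed after the last operation before the world switch.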
Trusted Firmware Implementation
ARM Trusted Firmware (ATF), now maintained as Trusted Firmware-A (TF-A), provides a reference implementation of cache management and world switching for TrustZone-enabled systems. The firmware includes routines for flushing and invalidating caches, as well as inserting data synchronization barriers at the appropriate points in the world switch process.
The ATF implementation ensures that cache operations are properly synchronized with world switches, preventing data leakage and corruption. Additionally, the firmware provides hooks for platform-specific cache management routines, allowing for customization based on the specific requirements of the SoC.
Performance Considerations
While cache management is essential for maintaining security in TrustZone implementations, it is important to consider the performance impact of these operations. Frequent cache flushes and invalidations can significantly degrade system performance, particularly in real-time systems where deterministic behavior is critical.
To mitigate performance overhead, the following strategies can be employed:
- Batch Cache Operations: Instead of performing cache operations on every world switch, batch multiple operations together to reduce the overall number of cache flushes and invalidations.
- Cache Locking: Lock critical cache lines in the Secure world to prevent them from being evicted or invalidated during world switches. This can reduce the need for frequent cache management operations.
- Hardware-Assisted Cache Partitioning: Some ARM SoCs provide hardware support for cache partitioning, allowing the Secure and Non-secure worlds to have dedicated cache regions. This can reduce the need for software-based cache management and improve performance.
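The batching strategy in the list above can be sketched as a small tracker that accumulates pending clean requests and resolves them in one pass at the next world switch. This is only an illustration under simplifying assumptions: it keeps a single bounding range, so it may cover untouched lines that lie between two requests, and all names (`pending_clean`, `pending_add`, `pending_flush`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bounding range of all buffers that still need a clean. */
struct pending_clean {
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
    int       valid; /* 0 while nothing is pending */
};

/* Record that [start, start + len) must be cleaned before the next switch. */
static void pending_add(struct pending_clean *p, uintptr_t start, size_t len)
{
    if (len == 0) {
        return;
    }
    uintptr_t end = start + len;
    if (!p->valid) {
        p->start = start;
        p->end = end;
        p->valid = 1;
        return;
    }
    if (start < p->start) {
        p->start = start;
    }
    if (end > p->end) {
        p->end = end;
    }
}

/*
 * Return how many cache lines a single batched pass would touch and
 * reset the tracker. `line_bytes` must be a power of two.
 */
static size_t pending_flush(struct pending_clean *p, size_t line_bytes)
{
    if (!p->valid) {
        return 0;
    }
    uintptr_t mask  = (uintptr_t)line_bytes - 1u;
    uintptr_t first = p->start & ~mask;
    uintptr_t last  = (p->end - 1u) & ~mask;
    p->valid = 0;
    return (size_t)((last - first) / line_bytes) + 1u;
}
```

For example, with an assumed 64-byte line, three requests at 0x2000 (16 bytes), 0x2010 (16 bytes), and 0x2030 (32 bytes) would touch four lines if handled one by one, but the batched pass covers the same data with two line operations. Cleaning extra lines is safe, so the coarse bounding range trades some precision for simplicity.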
Example Implementation
The following example demonstrates how to implement cache management and data synchronization barriers in a TrustZone-enabled system:
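; Transition from Secure world to Non-secure world
DSB                        ; Ensure all previous memory accesses are completed
BL   clean_dcache_all      ; Hypothetical helper: issues DCCSW (MCR p15, 0, Rt, c7, c10, 2) for every set/way
DSB                        ; Ensure the cache clean operations have completed
ISB                        ; Ensure the instruction stream is synchronized
SMC  #0                    ; Perform the world switch
; Transition from Non-secure world to Secure world
DSB                        ; Ensure all previous memory accesses are completed
BL   clean_inv_dcache_all  ; Hypothetical helper: issues DCCISW (MCR p15, 0, Rt, c7, c14, 2) for every set/way
DSB                        ; Ensure the clean and invalidate operations have completed
ISB                        ; Ensure the instruction stream is synchronized
SMC  #1                    ; Perform the world switch
In this example, written in ARMv7-A syntax, the first DSB instruction in each sequence ensures that all memory accesses are completed before proceeding with the cache operation. Because DCCSW and DCISW each act on a single set/way, the example calls helper routines (clean_dcache_all and clean_inv_dcache_all, both hypothetical names) that loop the operation over the entire L1 data cache. The second sequence uses clean and invalidate (DCCISW) rather than DCISW alone, since invalidating dirty lines without cleaning them first would discard writes that have not yet reached main memory. The second DSB ensures that the maintenance operations have completed, and the ISB instruction ensures that the instruction stream is synchronized before performing the world switch using the SMC (Secure Monitor Call) instruction.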
Conclusion
Maintaining cache coherency in ARM TrustZone implementations is a complex but essential task. By understanding the underlying causes of cache coherency issues and employing a combination of cache management techniques and data synchronization barriers, developers can ensure that their TrustZone-enabled systems are both secure and performant. ARM Trusted Firmware provides a robust reference implementation that can be customized to meet the specific requirements of different SoCs, making it an invaluable resource for developers working with TrustZone technology.
Through careful design and optimization, it is possible to achieve a balance between security and performance, ensuring that TrustZone-enabled systems can meet the demands of modern embedded applications.