EL2 Memory Corruption During Secure to Non-Secure Memory Access
The core issue revolves around memory corruption observed when attempting to read a memory block at address 0x80280000 from Exception Level 2 (EL2) after the memory was initially configured and loaded by a bootloader running at Exception Level 3 (EL3). The memory corruption manifests as random corrupted bytes, with the size and location of the corruption varying across reboots. This issue is particularly problematic in bare-metal applications where precise control over memory access and translation is critical. The root cause appears to be related to inconsistencies in memory translation and cache coherency between EL3 and EL2, compounded by misconfigurations in the Memory Management Unit (MMU) and Translation Lookaside Buffer (TLB) settings.
The memory translation tables at EL3 and EL2 are configured to map the same physical memory region, but the access permissions and cacheability attributes differ between the two exception levels. This discrepancy can lead to incoherent memory views, especially when the MMU and caches are enabled at both levels. Additionally, the use of secure and non-secure memory spaces further complicates the scenario, as the NS (Non-Secure) bit in the translation descriptors must be carefully managed to ensure proper access across exception levels.
Misconfigured MMU Descriptors and Cache Coherency Issues
The primary cause of the memory corruption lies in the misconfiguration of the MMU descriptors and cache coherency mechanisms between EL3 and EL2. At EL3, the memory region is marked as Normal Write-Back Cacheable, with the MMU and caches enabled. However, at EL2, the same memory region is accessed without ensuring that the cacheability attributes and translation descriptors are consistent with those at EL3. This inconsistency can lead to scenarios where the processor accesses stale or incoherent data from the cache, resulting in memory corruption.
The translation tables at EL3 and EL2 are configured as follows:
- EL3 Translation Tables:
  - L0 Table (Base Address: 0x8000C000):
      0x8000C000: 0x000000008000D003 0x0000000000000000
      0x8000C010: 0x0000000000000000 0x0000000000000000
  - L1 Table (Base Address: 0x8000D000):
      0x8000D000: 0x000000008000E003 0x000000008000F003
      0x8000D010: 0x0000000080000611 0x00000000C0000611
      0x8000D020: 0x0000000000000000 0x0000000000000000
- EL2 Translation Tables:
  - L0 Table (Base Address: 0x8000C000):
      0x8000C000: 0x000000008000D003 0x00000054FFFF8188
  - L1 Table (Base Address: 0x8000D000):
      0x8000D000: 0x000000008000E003 0x000000008000F003
      0x8000D010: 0x0000000080000611 0x00000000C0000611
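The key issue is that the NS bit (bit 5 in a stage 1 block or page descriptor; its table-descriptor counterpart, NSTable, is bit 63) and the cacheability attributes are not consistently configured between EL3 and EL2. This leads to a situation where the memory region is treated as secure at EL3 but non-secure at EL2, causing incoherency in the memory view. Secure and Non-secure physical addresses are distinct address spaces and cache lines are tagged accordingly, so data written through a Secure mapping at EL3 is not guaranteed to be visible to a Non-secure access at EL2. Additionally, the cache invalidation and TLB maintenance operations are not performed correctly, further exacerbating the issue.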
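To make the descriptor values in the dumps above easier to read, the following Python sketch splits a stage 1 block descriptor into its lower attribute fields. It assumes the ARMv8-A long-descriptor format with a 4 KiB granule (so an L1 block maps 1 GiB). The function name `decode_block_descriptor` is illustrative and is not part of any existing tool.

```python
def decode_block_descriptor(desc: int) -> dict:
    """Split a 64-bit stage 1 block descriptor into its lower attribute fields.

    Assumption: 4 KiB granule, level 1 block (output address in bits [47:30]).
    """
    return {
        "valid": bool(desc & 0x1),                    # bit 0
        "is_block": (desc & 0x3) == 0x1,              # bits [1:0] == 0b01
        "attr_indx": (desc >> 2) & 0x7,               # index into MAIR_ELx
        "ns": (desc >> 5) & 0x1,                      # Non-secure output address
        "ap": (desc >> 6) & 0x3,                      # AP[2:1] access permissions
        "sh": (desc >> 8) & 0x3,                      # 0b10 outer, 0b11 inner shareable
        "af": (desc >> 10) & 0x1,                     # Access flag
        "output_addr": desc & 0x0000_FFFF_C000_0000,  # 1 GiB aligned output address
    }


if __name__ == "__main__":
    # The two values discussed in the text: the dumped entry and the NS variant.
    for value in (0x0000000080000611, 0x0000000080000631):
        fields = decode_block_descriptor(value)
        print(hex(value), {k: (hex(v) if k == "output_addr" else v)
                           for k, v in fields.items()})
```

Applied to the dumped entry 0x0000000080000611, this reports AttrIndx = 4, NS = 0, SH = 0b10 and AF = 1 with an output address of 0x80000000. The value 0x0000000080000631 differs only in having NS = 1.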
Implementing Consistent MMU Configuration and Cache Management
To resolve the memory corruption issue, the following steps must be taken to ensure consistent MMU configuration and proper cache management between EL3 and EL2:
- Ensure Consistent Cacheability Attributes:
  The cacheability attributes for the memory region must be identical at both EL3 and EL2. This includes configuring the MAIR (Memory Attribute Indirection Register) and the translation descriptors to use the same memory type (e.g., Normal Write-Back Cacheable). The MAIR values at EL3 and EL2 should be set as follows:
      MAIR_EL3 = 0x000000FF440C0400
      MAIR_EL2 = 0x000000FF440C0400
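- Configure Translation Descriptors with Correct NS Bit:
  The NS bit in the translation descriptors must be set consistently across EL3 and EL2. If the memory region is intended to be accessed as non-secure at EL2, the NS bit must be set in the descriptors at both levels. For example, the second L1 entry at EL3 (at 0x8000D018) should be changed from 0x00000000C0000611 to 0x0000000080000631, which sets the NS bit and points the output address at 0x80000000:
      0x8000D010: 0x0000000080000611 0x0000000080000631
  This ensures that the memory region is treated as non-secure at both EL3 and EL2. Note that for accesses made from Non-secure state the NS bit is ignored by the hardware, so the EL3 descriptor is the one that determines the outcome here.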
- Perform Proper TLB Invalidation and Cache Maintenance:
  Before enabling the MMU at EL2, the TLB must be invalidated to ensure that any stale entries are removed. The following sequence should be used:
      DSB ISHST
      TLBI ALLE2
      DSB ISH
      ISB
  Additionally, if the memory region is cached, the cache must be invalidated to ensure that the processor accesses the correct data from memory. This can be done using the DC IVAC (Data Cache Invalidate by Virtual Address to Point of Coherency) instruction. For the invalidate to expose the right contents, EL3 must first clean the region to the Point of Coherency (for example with DC CVAC) after loading it; otherwise dirty lines holding the loaded data may never reach memory.
- Enable MMU with Consistent SCTLR Settings:
  The SCTLR (System Control Register) settings at EL2 must be consistent with those at EL3. Specifically, the WXN (Write permission implies Execute-never) bit should be configured to match the access permissions in the translation descriptors. For example:
      SCTLR_EL2 = 0x30C5183D
  This value sets M, C, and I (MMU, data cache, and instruction cache enabled) and leaves WXN clear, so the memory region is treated as writeable and executable if the descriptors permit it.
- Use a Secure to Non-Secure Memory Copy Workaround:
  If the above steps do not resolve the issue, a workaround can be implemented by copying the memory block from secure to non-secure space at EL3 before accessing it at EL2. This involves modifying the EL3 translation tables to map the same physical memory region to both secure and non-secure virtual addresses. For example:
      0x8000D010: 0x0000000080000611 0x0000000080000631
  This allows the memory block to be copied from the secure virtual address (0x80000000) to the non-secure virtual address (0xC0000000) at EL3. Once the copy is complete, the MMU at EL2 can be enabled with the translation tables configured to access the non-secure virtual address (0x80000000).
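The register values quoted in the steps above can be sanity-checked offline before they are programmed. The following Python sketch extracts one attribute byte from a MAIR value and a handful of control bits from an SCTLR value. The bit positions (M = 0, A = 1, C = 2, SA = 3, I = 12, WXN = 19) follow the ARMv8-A SCTLR_EL2 layout; the helper names `mair_attr` and `decode_sctlr` are illustrative only.

```python
# Selected SCTLR_ELx control bits (name -> bit position).
SCTLR_BITS = {"M": 0, "A": 1, "C": 2, "SA": 3, "I": 12, "WXN": 19}


def mair_attr(mair: int, index: int) -> int:
    """Return the 8-bit attribute encoding that AttrIndx == index selects."""
    if not 0 <= index <= 7:
        raise ValueError("AttrIndx must be in the range 0..7")
    return (mair >> (8 * index)) & 0xFF


def decode_sctlr(sctlr: int) -> dict:
    """Return the state (0 or 1) of the control bits listed in SCTLR_BITS."""
    return {name: (sctlr >> bit) & 1 for name, bit in SCTLR_BITS.items()}


if __name__ == "__main__":
    mair = 0x000000FF440C0400   # value quoted for MAIR_EL3 and MAIR_EL2
    sctlr = 0x30C5183D          # value quoted for SCTLR_EL2
    print("Attr4 =", hex(mair_attr(mair, 4)))
    print("SCTLR bits:", decode_sctlr(sctlr))
```

With the quoted MAIR value, index 4 (the AttrIndx used by the 0x...611 block descriptors) selects 0xFF, which encodes Normal memory, Inner and Outer Write-Back. With the quoted SCTLR_EL2 value, M, C, SA, and I are set while A and WXN are clear.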
By following these steps, the memory corruption issue can be resolved, ensuring that the memory block is accessed correctly at EL2 without any incoherency or corruption. The key is to maintain consistency in the MMU configuration and cache management between EL3 and EL2, while also ensuring proper TLB and cache maintenance operations are performed.
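As an aid to the cache maintenance step, the following Python sketch computes which cache-line addresses a by-VA maintenance loop (DC CVAC at EL3, DC IVAC at EL2) would need to cover for a given buffer. The line size is derived from the DminLine field of CTR_EL0 (bits [19:16], log2 of the line size in 4-byte words). The CTR_EL0 value and the buffer size used here are hypothetical, since the source does not state them, and the helper names are illustrative.

```python
def dminline_bytes(ctr_el0: int) -> int:
    """Smallest data cache line size in bytes, from CTR_EL0.DminLine."""
    return 4 << ((ctr_el0 >> 16) & 0xF)


def lines_to_maintain(start: int, size: int, line: int) -> range:
    """Line-aligned addresses of every cache line overlapping [start, start + size)."""
    if size <= 0:
        return range(0)
    first = start & ~(line - 1)  # round the start down to a line boundary
    return range(first, start + size, line)


if __name__ == "__main__":
    ctr_el0 = 0x00040000  # hypothetical: DminLine = 4, i.e. 64-byte lines
    line = dminline_bytes(ctr_el0)
    # Hypothetical 4 KiB block at the address named in the problem description.
    addrs = lines_to_maintain(0x80280000, 0x1000, line)
    print(f"line size {line} bytes, {len(addrs)} lines, "
          f"first {addrs[0]:#x}, last {addrs[-1]:#x}")
```

On the target, each address produced this way would be passed to the relevant DC instruction, with a DSB after the loop so that the maintenance completes before the memory is read.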