ARM Cortex-A72 MMU Configuration and Level 1 Translation Fault

When enabling the Memory Management Unit (MMU) on an ARM Cortex-A72 processor, a Level 1 translation fault can occur if the translation tables are misconfigured or a required memory region is left unmapped. In this scenario, the fault is reported in ESR_EL3 with a syndrome value of 0x86000005. The exception class (EC, bits [31:26]) is 0x21, "Instruction Abort taken without a change in Exception level," and the instruction fault status code (IFSC, bits [5:0]) is 0b000101, "Translation fault, level 1." The fault typically appears right after the MMU is enabled, when the processor fetches an instruction from an address that has no valid entry in the translation tables.
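
To make the syndrome value easier to read, the following is a minimal sketch of a decoder for the fields discussed above. The function and table names are illustrative and not part of any ARM tool, and only the most common exception classes and fault status codes are covered.

```python
# Minimal ESR_ELx decoder sketch; names are illustrative, not an ARM API.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# Upper four bits of the fault status code; the lower two bits give the level.
FAULT_KINDS = {
    0b0000: "Address size fault",
    0b0001: "Translation fault",
    0b0010: "Access flag fault",
    0b0011: "Permission fault",
}


def decode_esr(esr):
    """Split an ESR_ELx value into EC, IL and the fault status code."""
    ec = (esr >> 26) & 0x3F   # exception class, bits [31:26]
    il = (esr >> 25) & 0x1    # instruction length bit
    fsc = esr & 0x3F          # IFSC/DFSC, bits [5:0]
    kind = FAULT_KINDS.get(fsc >> 2)
    fault = f"{kind}, level {fsc & 0x3}" if kind else "other (see the ARM ARM)"
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other (see the ARM ARM)"),
        "il": il,
        "fsc": fsc,
        "fault": fault,
    }


info = decode_esr(0x86000005)
print(hex(info["ec"]), info["ec_name"])
print(info["fault"])
```

For 0x86000005 this reports EC 0x21 and "Translation fault, level 1", matching the description above.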

The Cortex-A72 uses a multi-level translation table structure to map virtual addresses to physical addresses. With a 4KB granule the tables form a hierarchy of up to four levels, Level 0 (L0) through Level 3 (L3), and the Translation Control Register (TCR_EL3) determines the size of the virtual address space and therefore the level at which the table walk starts. With a 32-bit address space (T0SZ = 32), the walk starts at Level 1, so the table that TTBR0_EL3 points to is a Level 1 table whose entries each cover 1GB. If the tables are not set up to match, the MMU fails to translate an address and raises a translation fault.
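
The relationship between T0SZ, the starting level, and the table indices can be sketched as follows. This assumes a 4KB granule (12 offset bits, 9 index bits per level); the helper names are hypothetical and exist only for this illustration.

```python
PAGE_SHIFT = 12      # 4KB granule: 12 bits of page offset
BITS_PER_LEVEL = 9   # 512 eight-byte entries per 4KB table


def start_level(t0sz):
    """Return the level at which the table walk starts for a 4KB granule."""
    va_bits = 64 - t0sz
    # Ceiling division: number of table levels needed to resolve the VA.
    levels = -(-(va_bits - PAGE_SHIFT) // BITS_PER_LEVEL)
    return 4 - levels


def table_indices(va, t0sz):
    """Return {level: index} for each level walked when translating va."""
    va_bits = 64 - t0sz
    indices = {}
    for level in range(start_level(t0sz), 4):
        shift = PAGE_SHIFT + BITS_PER_LEVEL * (3 - level)
        # The first level may use fewer than 9 bits.
        width = min(BITS_PER_LEVEL, va_bits - shift)
        indices[level] = (va >> shift) & ((1 << width) - 1)
    return indices


print(start_level(32))
print(table_indices(0x90000000, 32))
```

With T0SZ = 32 the walk starts at Level 1, and the address 0x90000000 selects entry 2 of the Level 1 table and entry 128 of the Level 2 table beneath it.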

Incorrect Translation Table Configuration and XN Bit Setting

One of the primary causes of a Level 1 translation fault is an incorrect configuration of the translation tables. In the provided scenario, the translation tables were generated using the arm64-pgtable-tool, which produced a set of tables mapping 1GB of DRAM as Normal Read-Write memory. However, the fault occurred because the code segment was not properly defined, and the Execute Never (XN) bit was incorrectly set in the page table entries.

The XN bit is an attribute in each block or page descriptor that controls whether the memory region is executable. If XN is set, any attempt to fetch an instruction from the region raises an instruction abort, which ESR_EL3 reports as a Permission fault rather than a Translation fault. In this case, the XN bit was set in the entries for the code segment, so the processor faulted as soon as it tried to execute instructions from that region.
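
As a rough aid for checking descriptor templates, the sketch below pulls out the attribute bits of a stage 1 block or page descriptor. It assumes the EL3 translation regime, where XN is bit 54 and AP[2] (bit 7) selects read-only access; the function name is made up for this example.

```python
def decode_descriptor(desc):
    """Extract the main attribute fields of a stage 1 block/page descriptor."""
    return {
        "valid": bool(desc & 0x1),                  # bit 0
        "table_or_page": bool(desc & 0x2),          # bit 1: 0 = block, 1 = table/page
        "attr_index": (desc >> 2) & 0x7,            # index into MAIR_EL3
        "read_only": bool((desc >> 7) & 0x1),       # AP[2]
        "shareability": (desc >> 8) & 0x3,          # 0b11 = Inner Shareable
        "access_flag": bool((desc >> 10) & 0x1),    # AF
        "xn": bool((desc >> 54) & 0x1),             # Execute Never
    }


for template in (0x701, 0x781, 0x705):
    print(hex(template), decode_descriptor(template))
```

Applied to the templates that appear in the listing later in this article, 0x701 and 0x781 both decode with XN clear, and 0x781 additionally has the read-only bit set. A template with bit 54 set (for example `0x701 | (1 << 54)`) decodes with `xn` true and would not be executable.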

Additionally, the translation tables were not correctly aligned with the virtual address space configuration specified in the TCR_EL3 register. The TCR_EL3 register controls the size of the virtual address space and the granularity of the translation tables. If the translation tables are not aligned with the TCR_EL3 settings, the MMU may interpret the tables incorrectly, leading to translation faults.
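
To check a TCR_EL3 value against the tables, it helps to split it into fields. The following sketch decodes the fields relevant here, using the value 0x80803520 that appears in the configuration steps of this article; the function name and dictionary keys are illustrative.

```python
GRANULES = {0b00: "4KB", 0b01: "64KB", 0b10: "16KB"}
PA_BITS = {0: 32, 1: 36, 2: 40, 3: 42, 4: 44, 5: 48}


def decode_tcr_el3(tcr):
    """Decode the TCR_EL3 fields that must agree with the translation tables."""
    t0sz = tcr & 0x3F
    return {
        "t0sz": t0sz,
        "va_bits": 64 - t0sz,                        # size of the VA space
        "irgn0": (tcr >> 8) & 0x3,                   # inner cacheability of walks
        "orgn0": (tcr >> 10) & 0x3,                  # outer cacheability of walks
        "sh0": (tcr >> 12) & 0x3,                    # shareability of walks
        "granule": GRANULES.get((tcr >> 14) & 0x3),  # None if reserved
        "pa_bits": PA_BITS.get((tcr >> 16) & 0x7),   # None if reserved
    }


print(decode_tcr_el3(0x80803520))
```

For 0x80803520 this yields T0SZ = 32 (a 32-bit virtual address space), a 4KB granule, and a 32-bit physical address size. Tables laid out for any other granule or address space size would be misread by the MMU.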

Debugging and Correcting Translation Table Configuration

To resolve the Level 1 translation fault, the following steps should be taken:

  1. Define a Code Segment in the Translation Tables: Ensure that the code segment is explicitly defined in the translation tables with the correct attributes, and that the XN bit is clear in its entries so that the region is executable. For example, the code segment can use the following descriptor templates:

    • LDR x20, =0x781 for code block descriptors.
    • LDR x21, =0x783 for code page descriptors.

    Both values leave the XN bit (bit 54 of the descriptor) clear, so the processor can execute code from the mapped region. They also set AP[2] (bit 7), which makes the region read-only; this is the only difference from the read-write data templates 0x701 and 0x703.

  2. Verify Translation Table Alignment with TCR_EL3: Check that TCR_EL3 matches the granule size and address space size that the translation tables were built for. For example, tables built for a 4KB granule and a 32-bit address space need T0SZ = 32 and TG0 = 0b00:

    • LDR x1, =0x80803520 for TCR_EL3 configuration.

    This value selects T0SZ = 32, a 4KB granule, Inner Shareable write-back cacheable table walks, and a 32-bit physical address size, so the MMU starts the walk at Level 1 and interprets the tables as intended.

  3. Check Memory Region Mapping: Verify that every memory region the processor will touch, including the code and data segments, is mapped in the translation tables with the correct attributes, and that there are no gaps that could cause translation faults. In the provided scenario, the address range 0x8000_0000 to 0x8fff_ffff was not mapped, so the processor took a translation fault as soon as it fetched code from this region.

  4. Use Synchronization Barriers: After writing the translation tables, issue a DSB (Data Synchronization Barrier) so that the table writes have completed before the MMU can walk them. After writing system registers such as TTBR0_EL3, MAIR_EL3, TCR_EL3, and SCTLR_EL3, issue an ISB (Instruction Synchronization Barrier) so that the new register values take effect before the following instructions execute.

  5. Debugging with ESR_EL3 and FAR_EL3: When a translation fault occurs, use the ESR_EL3 and FAR_EL3 registers to diagnose the issue. ESR_EL3 identifies the type of fault and the table level at which the walk failed, while FAR_EL3 holds the faulting virtual address. Comparing that address against the regions programmed into the tables shows which entry is missing or has the wrong attributes.
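
Steps 3 and 5 can be combined into a simple offline check: list the ranges the tables are meant to cover and look up the address read from FAR_EL3. The sketch below is only an illustration. The region list is an assumption reconstructed from the scenario described above (DRAM mapped from 0x9000_0000 upward, nothing below it), and the function names are hypothetical.

```python
# Assumed memory map for the faulting scenario: (start, end, description).
MAPPED = [
    (0x9000_0000, 0xC000_0000, "RW data, 2MB blocks"),
    (0xC000_0000, 0x1_0000_0000, "RW data, 1GB block"),
]


def find_region(far, regions=MAPPED):
    """Return the description of the region containing far, or None."""
    for start, end, name in regions:
        if start <= far < end:
            return name
    return None


def find_gaps(lo, hi, regions=MAPPED):
    """Return the unmapped (start, end) ranges inside [lo, hi)."""
    gaps = []
    cursor = lo
    for start, end, _ in sorted(regions):
        if end <= cursor or start >= hi:
            continue
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


print(find_region(0x8000_0000))
print([(hex(a), hex(b)) for a, b in find_gaps(0x8000_0000, 0x1_0000_0000)])
```

With this assumed map, a FAR_EL3 value of 0x8000_0000 falls in no region, and the gap search reports 0x8000_0000 to 0x9000_0000 as unmapped, which is the range the code segment needs.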

Example of Corrected Translation Table Configuration

Below is an example of how the translation tables should be configured to avoid Level 1 translation faults:

  .section .data.mmu
  .balign 4                   // 4-byte alignment for the 32-bit variables below
  mmu_lock: .4byte 0          // lock to ensure only 1 CPU runs init
  #define LOCKED 1
  mmu_init: .4byte 0          // whether init has been run
  #define INITIALISED 1
  .section .text.mmu_on
  .balign 4                   // A64 instructions are 4 bytes wide
  .global mmu_on
  .type mmu_on, @function
mmu_on:
zero_out_tables:
  LDR   x2, =0x80000000       // address of first table
  LDR   x3, =0x3000         // combined length of all tables
  LSR   x3, x3, #5          // number of required STP instructions
  FMOV  d0, xzr           // clear q0
1:
  STP   q0, q0, [x2], #32      // zero out 4 table entries at a time
  SUBS  x3, x3, #1
  B.NE  1b
load_descriptor_templates:
  LDR   x2, =0x00000000000705    // Device block
  LDR   x3, =0x00000000000707    // Device page
  LDR   x4, =0x00000000000701    // RW data block
  LDR   x5, =0x00000000000703    // RW data page
  LDR   x20, =0x781         // code block
  LDR   x21, =0x783         // code page
program_table_0:
  LDR   x8, =0x80000000       // base address of this table
  LDR   x9, =0x40000000       // chunk size
program_table_0_entry_2:
  LDR   x10, =2           // idx
  LDR   x11, =0x80001000       // next-level table address
  ORR   x11, x11, #0x3        // next-level table descriptor
  STR   x11, [x8, x10, lsl #3]    // write entry into table
program_table_0_entry_3:
  LDR   x10, =3           // idx
  LDR   x11, =1           // number of contiguous entries
  LDR   x12, =0xc0000000       // output address of entry[idx]
1:
  ORR   x12, x12, x4         // merge output address with template
  STR   x12, [x8, x10, lsl #3]    // write entry into table
  ADD   x10, x10, #1         // prepare for next entry idx+1
  ADD   x12, x12, x9         // add chunk to address
  SUBS  x11, x11, #1         // loop as required
  B.NE  1b
program_table_1:
  LDR   x8, =0x80001000       // base address of this table
  LDR   x9, =0x200000        // chunk size
program_table_1_entry_128_to_511:
  LDR   x10, =128          // idx
  LDR   x11, =384          // number of contiguous entries
  LDR   x12, =0x90000000       // output address of entry[idx]
1:
  ORR   x12, x12, x4         // merge output address with template
  STR   x12, [x8, x10, lsl #3]    // write entry into table
  ADD   x10, x10, #1         // prepare for next entry idx+1
  ADD   x12, x12, x9         // add chunk to address
  SUBS  x11, x11, #1         // loop as required
  B.NE  1b
program_table_1_code:
  LDR   x8, =0x80001000       // base address of this table
  LDR   x9, =0x200000        // chunk size
program_table_1_entry_0_to_127:
  LDR   x10, =0           // idx
  LDR   x11, =128          // number of contiguous entries
  LDR   x12, =0x80000000       // output address of entry[idx]: code at 0x8000_0000 to 0x8fff_ffff
1:
  ORR   x12, x12, x20        // merge output address with code block template
  STR   x12, [x8, x10, lsl #3]    // write entry into table
  ADD   x10, x10, #1         // prepare for next entry idx+1
  ADD   x12, x12, x9         // add chunk to address
  SUBS  x11, x11, #1         // loop as required
  B.NE  1b
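init_done:
  ADRP  x1, mmu_init          // x1 = address of the mmu_init flag
  ADD   x1, x1, :lo12:mmu_init
  MOV   w2, #INITIALISED
  STR   w2, [x1]
end:
  DSB   SY                    // complete the table writes before the MMU can walk them
  LDR   x1, =0x80000000       // program ttbr0 on this CPU
  MSR   ttbr0_el3, x1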
  LDR   x1, =0xff          // program mair on this CPU
  MSR   mair_el3, x1
  LDR   x1, =0x80803520       // program tcr on this CPU
  MSR   tcr_el3, x1
  ISB
  MRS   x2, tcr_el3         // verify CPU supports desired config
  CMP   x2, x1
  B.NE  .
  LDR   x1, =0x1005         // program sctlr on this CPU
  MSR   sctlr_el3, x1
  ISB                 // synchronize context on this CPU
  RET                 // done!
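
The MAIR_EL3 and SCTLR_EL3 constants in the listing can be checked the same way as the descriptors. The sketch below names a subset of the SCTLR_EL3 control bits and splits MAIR_EL3 into its eight attribute bytes; the helper names are illustrative, and only two well-known attribute encodings are labelled.

```python
# Subset of SCTLR_EL3 control bits (bit position -> name).
SCTLR_BITS = {0: "M", 1: "A", 2: "C", 3: "SA", 12: "I", 19: "WXN", 25: "EE"}

# Two common MAIR attribute encodings; other values are left undecoded.
MAIR_NAMES = {0xFF: "Normal, Write-Back cacheable", 0x00: "Device-nGnRnE"}


def sctlr_flags(value):
    """Return the names of the known control bits set in value."""
    return [name for bit, name in sorted(SCTLR_BITS.items()) if (value >> bit) & 1]


def mair_attrs(value):
    """Return the eight attribute bytes Attr0..Attr7 of a MAIR value."""
    return [(value >> (8 * i)) & 0xFF for i in range(8)]


print(sctlr_flags(0x1005))
print([MAIR_NAMES.get(a, hex(a)) for a in mair_attrs(0xFF)[:2]])
```

The value 0x1005 sets M (MMU enable), C (data cache enable), and I (instruction cache enable) and leaves WXN clear, so writable regions are not forced to be non-executable. With MAIR_EL3 = 0xff, attribute index 0 is Normal write-back memory and index 1 is Device-nGnRnE, which matches the AttrIndx values in the descriptor templates.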

Summary

To avoid Level 1 translation faults when enabling the MMU on an ARM Cortex-A72 processor, ensure that the translation tables are correctly configured, the code segment is properly defined, and the XN bit is cleared for executable memory regions. Additionally, verify that the translation tables are aligned with the TCR_EL3 settings and that all necessary memory regions are mapped. By following these steps, you can successfully enable the MMU and avoid translation faults.
