Undefined Instruction Exception and Data Abort During CP15 System Control Register Access

The core issue is a hang of the Cortex-A8 core in an AM3352 processor during a specific sequence involving the CP15 System Control Register (SCTLR). The hang is consistently observed after the following sequence of events: an undefined instruction exception raised by a VFP (Vector Floating Point) instruction (the mechanism the Linux kernel uses to enable VFP lazily for a task), followed by userland process execution, and finally a data abort exception. The processor halts precisely at the mcr p15, 0, r0, c1, c0, 0 instruction, which writes r0 to SCTLR. This instruction is emitted by the alignment_trap macro defined in arch/arm/kernel/entry-header.S in the Linux kernel.

The trace logs from the Embedded Trace Buffer (ETB) via JTAG reveal that the hang-up occurs consistently at the same point across multiple boards, suggesting a systematic issue rather than a random hardware fault. The fact that the hang-up occurs after a VFP undefined instruction exception and a data abort exception indicates a potential issue with the interaction between the VFP, MMU, and L1/L2 caches during the execution of the mcr p15 instruction.

The problem is tied to kernels configured with HIGHMEM enabled, particularly on systems with 1GB of DRAM. In that configuration, physical memory above roughly 740MB (the end of the kernel's permanently mapped LOWMEM region; the exact boundary depends on the size of the vmalloc area) is HIGHMEM, and the kernel manages it differently from LOWMEM. This difference appears to trigger the hang. Disabling HIGHMEM, or reducing the DRAM size to 512MB so that no HIGHMEM region exists, prevents the hang, which further implicates the HIGHMEM memory management code.

HIGHMEM Memory Management and Cortex-A8 CP15 Co-Processor Interaction

The root cause of the hang appears to be an interaction between the Linux kernel's HIGHMEM memory management and the Cortex-A8's CP15 coprocessor, specifically around the mcr p15 write to SCTLR. SCTLR controls critical system features such as the MMU enable (M bit), the data and instruction caches (C and I bits), and alignment checking (A bit). The alignment_trap macro does not handle alignment faults itself (those are handled in arch/arm/mm/alignment.c); instead, on exception entry from user mode, it reloads SCTLR from the kernel's saved value (cr_alignment) so that alignment checking is configured the way the kernel expects.

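When HIGHMEM is enabled, HIGHMEM pages are not permanently mapped into the kernel's address space, so the kernel must map them temporarily (for example with kmap() or kmap_atomic()) each time it touches them. These temporary mappings involve page-table updates and TLB maintenance, which on ARMv7 are performed through other CP15 operations (TLB invalidation through c8); they do not write SCTLR. The SCTLR write in alignment_trap runs on every exception entry from user mode whether or not HIGHMEM is configured. The HIGHMEM dependence therefore points less at the number of SCTLR writes than at the MMU, TLB, and cache state the core is in when that write executes. That state, combined with the specific exception sequence (a VFP undefined instruction followed by a data abort), appears to expose a latent issue in the Cortex-A8's handling of the SCTLR write.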

The Cortex-A8 revision in use (r3p2) also matters when considering known errata. The kernel's ARM_ERRATA_430973, ARM_ERRATA_458693, and ARM_ERRATA_460075 workarounds target earlier revisions (r1p* for 430973, r2p0 for the other two) and do not apply to r3p2. They are still a reminder that the Cortex-A8 has had revision-specific problems in closely related areas: 430973 concerns stale branch prediction on replaced interworking branches, 458693 can deadlock the processor through a false cache-line hazard, and 460075 can let stale data overwrite recent stores in the L2 cache. The trace logs suggest that the present problem may be related to the MMU and L1/L2 cache state during execution of the mcr p15 instruction, particularly while the system is managing HIGHMEM.

Mitigating CP15-Related Core Hang-Ups Through Kernel Configuration and Cache Management

The following troubleshooting steps and mitigations target the interaction between HIGHMEM memory management and the Cortex-A8's CP15 coprocessor:

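  1. Disable HIGHMEM in the Kernel Configuration: Since the issue is consistently observed only when HIGHMEM is enabled, the most direct mitigation is to disable HIGHMEM, either through make menuconfig (Kernel Features, High Memory Support) or by making sure the generated .config contains "# CONFIG_HIGHMEM is not set". This removes the HIGHMEM region and the memory management paths associated with it. The cost is that DRAM beyond the LOWMEM limit (roughly 740MB here) is no longer used, so this is not feasible for systems that need the full 1GB. A possible alternative, not tested against this issue, is a 2G/2G user/kernel split (CONFIG_VMSPLIT_2G), which enlarges LOWMEM enough to cover 1GB without HIGHMEM, at the cost of a smaller per-process user address space.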

  2. Modify the alignment_trap Macro to Avoid Unnecessary CP15 Writes: The alignment_trap macro writes SCTLR on every entry, even when the value being written equals the current value. It can be changed to read SCTLR first and write only when the value differs. This replaces most SCTLR writes with reads, which may avoid the problematic write in the common case. A patch along the following lines can be tried; the hunk offsets and the macro's exact signature may differ between kernel versions, so adapt it to the tree in use:

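    diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
    --- a/arch/arm/kernel/entry-header.S
    +++ b/arch/arm/kernel/entry-header.S
    @@ -40,7 +40,9 @@
    -.macro alignment_trap, rtemp
    +.macro alignment_trap, rtemp, rcur
     #ifdef CONFIG_ALIGNMENT_TRAP
     ldr \rtemp, .LCcralign
     ldr \rtemp, [\rtemp]
    -mcr p15, 0, \rtemp, c1, c0
    +mrc p15, 0, \rcur, c1, c0, 0      @ current SCTLR into a second register
    +teq \rtemp, \rcur                 @ wanted value vs. current value
    +mcrne p15, 0, \rtemp, c1, c0, 0   @ write only if they differ
     #endif
     .endm
    

    This version loads the wanted value (cr_alignment) into \rtemp, reads the current SCTLR into a second register, \rcur, and executes the conditional mcrne only if the two differ. The second register is required: reading SCTLR back into \rtemp would overwrite the wanted value, and comparing \rtemp with itself always sets the Z flag, so an mcrne would never execute and SCTLR would never be updated. Because the macro now takes an extra scratch register, every caller of alignment_trap must be updated to pass a register that is free at that point; check the callers in arch/arm/kernel/ (for example entry-armv.S), and any open-coded copy of the same SCTLR write, before building.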

  3. Enable Cortex-A8 Errata Workarounds (Low Expected Benefit): The kernel options for errata 430973, 458693, and 460075 can be enabled as shown below. However, the workaround code in arch/arm/mm/proc-v7.S checks the CPU revision at boot and applies most of these changes only on the affected revisions (r1p* for 430973, r2p0 for the other two), so on r3p2 enabling them is expected to change little or nothing. Treat this as a low-cost experiment rather than a fix:

    CONFIG_ARM_ERRATA_430973=y
    CONFIG_ARM_ERRATA_458693=y
    CONFIG_ARM_ERRATA_460075=y
    
  4. Upgrade to a Newer Kernel Version: The issue has been observed on Linux kernel 3.13.4. Upgrading to a newer kernel, such as TI's ti-linux-4.9.y branch, may resolve it if later kernels changed the relevant CP15 handling or HIGHMEM code. When evaluating a newer tree, check whether its alignment_trap macro in arch/arm/kernel/entry-header.S already avoids rewriting SCTLR when the value is unchanged, in which case step 2 is unnecessary.

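  5. Test on a Different Board with the Same Core: If possible, reproduce the test on a different board design built around the same Cortex-A8 core, such as the BeagleBone Black (TI AM3358, another AM335x device). This helps determine whether the issue is specific to the current board's hardware design or a more general problem with the AM335x/Cortex-A8. Note that the stock BeagleBone Black has 512MB of DRAM and therefore no HIGHMEM region in the default 3G/1G configuration; reproducing the HIGHMEM case requires a board with more memory or a kernel configured with a smaller LOWMEM region (for example via a larger vmalloc= reservation).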

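  6. Perform More Extensive Memory Testing: Memory testing with memtester has not revealed any issues, but memtester runs as a user process and only exercises the physical pages the kernel happens to allocate to it, so it may not cover the HIGHMEM region thoroughly. A bootloader-level test over the full DRAM range (for example U-Boot's mtest command, if it is enabled in the U-Boot build) covers all physical memory, including the area Linux treats as HIGHMEM, and can help rule out subtle memory faults contributing to the hang.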

Taken together, these steps may mitigate the hang or narrow down its cause. Disabling HIGHMEM is the only change reported to prevent it; the others reduce redundant SCTLR writes or isolate hardware, errata, and memory factors, and their effect on the interaction between HIGHMEM memory management and the Cortex-A8's CP15 coprocessor still has to be confirmed on the failing boards.
