ARM Trusted Firmware Boot Sequence Failure During BL2 to BL31 Transition
The issue involves a boot failure in ARM Trusted Firmware (ATF) on an Intel Agilex board. The intended boot flow goes from BL2 (Boot Loader stage 2) to BL31 (EL3 runtime firmware) and then directly to Linux, bypassing U-Boot and UEFI. Instead, the boot fails during the transition from BL2 to BL31. The UART log shows an ECC scrubbing error, an S2F bridge enable timeout, and an assertion failure in bl31_plat_setup.c at line 53. The failure occurs despite following the official ATF documentation for booting a preloaded kernel image, which is written for the Base FVP (Fixed Virtual Platform) rather than for Agilex hardware.
The key symptoms observed in the UART log include:
- DDR Calibration Success: DRAM calibration completes successfully, so the DDR interface itself was trained. This alone does not prove that ECC is configured correctly.
- ECC Scrubbing Errors: The log shows "INFO: Scrubbing ECC" followed by "INFO: Error in response: 1f0002ff", suggesting a problem with ECC (Error-Correcting Code) scrubbing.
- S2F Bridge Enable Timeout: The log reports "ERROR: S2F bridge enable: Timeout waiting for idle ack", indicating that the S2F (SoC-to-FPGA) bridge could not be enabled.
- Assertion Failure: The boot process fails with an assertion error in bl31_plat_setup.c at line 53, which is likely related to platform-specific initialization.
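The symptoms listed above can be picked out of a captured UART log automatically. The following is a minimal sketch, not part of ATF: the function name and symptom labels are hypothetical, the first three patterns mirror the messages quoted above, and the assertion pattern is an assumption because the exact format of the ASSERT line is not shown here.

```python
import re

# Patterns mirror the messages quoted in the symptom list. The assertion
# pattern is an assumption: the exact ASSERT line format is not shown here.
SYMPTOM_PATTERNS = {
    "ecc_scrub_started": re.compile(r"Scrubbing ECC"),
    "ecc_error_response": re.compile(r"Error in response:\s*([0-9a-fA-F]+)"),
    "s2f_bridge_timeout": re.compile(
        r"S2F bridge enable: Timeout waiting for idle ack"
    ),
    "bl31_assert": re.compile(r"ASSERT.*bl31_plat_setup\.c"),
}


def classify_uart_log(text):
    """Return {symptom label: [1-based line numbers]} for each symptom found."""
    hits = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(lineno)
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt assembled only from the messages quoted above.
    sample = "\n".join([
        "INFO: Scrubbing ECC",
        "INFO: Error in response: 1f0002ff",
        "ERROR: S2F bridge enable: Timeout waiting for idle ack",
    ])
    for name, lines in classify_uart_log(sample).items():
        print(name, lines)
```

Recording the line number of each symptom makes it easy to see which message appears first, which is usually the most useful starting point.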
The root cause is traced to the use of an Agilex-specific ATF source tree that does not support booting Linux directly from BL31. The Agilex ATF source lacks the configuration and support for the RESET_TO_BL31 and ARM_LINUX_AS_BL33 build options, which are required for this boot flow.
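To make the two build options concrete, here is a small sketch that composes a make command line of the kind used in the preloaded-kernel flow on the Base FVP. The helper name is hypothetical. PRELOADED_BL33_BASE and ARM_PRELOADED_DTB_BASE are the companion options that this flow normally needs, and the platform name and addresses in the usage example are placeholder values for an FVP-style setup, not values verified for Agilex.

```python
import shlex


def tfa_linux_as_bl33_make_args(plat, kernel_base, dtb_base, debug=True):
    """Compose a make command line for booting Linux directly from BL31.

    plat, kernel_base and dtb_base are caller-supplied assumptions; this
    helper only formats them and does not check that the platform
    actually supports these options.
    """
    return [
        "make",
        f"PLAT={plat}",
        f"DEBUG={1 if debug else 0}",
        "RESET_TO_BL31=1",        # start execution at BL31, skipping BL1/BL2
        "ARM_LINUX_AS_BL33=1",    # hand over to a Linux kernel as BL33
        f"PRELOADED_BL33_BASE={kernel_base:#x}",
        f"ARM_PRELOADED_DTB_BASE={dtb_base:#x}",
        "all",
    ]


if __name__ == "__main__":
    # Placeholder platform and addresses, for illustration only.
    args = tfa_linux_as_bl33_make_args("fvp", 0x80080000, 0x82000000)
    print(shlex.join(args))
```

If the platform makefiles do not handle these options, as described for the Agilex source tree, passing them on the command line does not by itself produce a working image.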
ECC Scrubbing Errors and S2F Bridge Enable Timeout
The ECC scrubbing errors and the S2F bridge enable timeout are critical indicators of underlying issues in the boot process. ECC memory stores check bits alongside the data so that single-bit errors can be corrected when the data is read. Scrubbing at boot typically means writing the whole DRAM once so that the check bits are valid before the first read. An error response during scrubbing therefore suggests a problem with memory initialization or ECC configuration. The S2F bridge connects the SoC side (the hard processor system) to the FPGA fabric. The timeout means that the firmware polled for an idle acknowledgment from the bridge and did not receive one within the allowed time, so the bridge was not enabled.
The ECC scrubbing errors could be caused by several factors:
- Incorrect ECC Configuration: The ECC settings in the memory controller may not be properly configured, leading to errors during scrubbing.
- Memory Initialization Issues: The memory initialization process may not have completed successfully, causing ECC errors during scrubbing.
- Hardware Faults: There could be a hardware fault in the memory subsystem, leading to ECC errors.
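As a toy illustration of how check bits allow a single-bit error to be located and corrected, the sketch below implements a Hamming(7,4) code. This is not the code used by the Agilex memory controller; real DDR ECC uses a wider code over much larger words. It also shows why memory whose check bits were never written consistently reports errors on read, which is the reason for scrubbing at boot.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword.

    Codeword positions are numbered 1..7; parity bits sit at 1, 2 and 4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position); 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # parity check over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # parity check over positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # parity check over positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1        # flip the bit the syndrome points at
    return c, position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1                 # inject a single-bit error at position 5
    repaired, position = hamming74_correct(damaged)
    print("error position:", position, "repaired:", repaired == word)

    # Data written without matching check bits looks like an error on read.
    stale = [0, 0, 1, 0, 0, 1, 1]
    print("stale word error position:", hamming74_correct(stale)[1])
```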
The S2F bridge enable timeout could be caused by:
- Firmware Bugs: There may be bugs in the firmware that prevent the S2F bridge from being enabled correctly.
- Timing Issues: The timing requirements for enabling the S2F bridge may not be met, leading to a timeout.
- Configuration Errors: The configuration of the S2F bridge may be incorrect, preventing it from being enabled.
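A message such as "Timeout waiting for idle ack" is the typical result of a bounded polling loop. The sketch below models that pattern in isolation. It is not the ATF implementation: the function, the fake bridge class, the mask value, and the poll limit are all hypothetical, and whether the real hardware signals the acknowledgment by setting or clearing bits is not covered here.

```python
def wait_for_idle_ack(read_status, ack_mask, max_polls):
    """Poll read_status() until every bit in ack_mask is set.

    Returns the number of polls used, or raises TimeoutError if the
    acknowledgment never arrives within max_polls reads.
    """
    for attempt in range(1, max_polls + 1):
        if (read_status() & ack_mask) == ack_mask:
            return attempt
    raise TimeoutError("Timeout waiting for idle ack")


class FakeBridge:
    """Hypothetical stand-in for a bridge status register."""

    def __init__(self, ready_after):
        self.ready_after = ready_after  # None models a bridge that never acks
        self.reads = 0

    def read_status(self):
        self.reads += 1
        if self.ready_after is not None and self.reads >= self.ready_after:
            return 0x1
        return 0x0


if __name__ == "__main__":
    print("acked after", wait_for_idle_ack(FakeBridge(3).read_status, 0x1, 10), "polls")
    try:
        wait_for_idle_ack(FakeBridge(None).read_status, 0x1, 10)
    except TimeoutError as err:
        print("failed:", err)
```

The model separates two of the causes listed above: a bridge that acknowledges later than the poll limit allows is a timing issue, while one that never acknowledges points to a configuration or firmware problem.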
The assertion failure in bl31_plat_setup.c at line 53 is likely related to platform-specific initialization. This file is responsible for setting up the platform-specific hardware and software environment for BL31. The assertion failure suggests that there is an issue with the platform initialization code, which could be due to incorrect configuration or missing support for certain features.
Resolving ECC Scrubbing Errors, S2F Bridge Timeout, and Platform Initialization Issues
To resolve the ECC scrubbing errors, S2F bridge enable timeout, and platform initialization issues, the following steps should be taken:
- Verify ECC Configuration: Ensure that the ECC settings in the memory controller are correctly configured. This includes checking the ECC enable/disable settings, ECC scrubbing interval, and ECC error correction capabilities. The memory controller registers should be reviewed to confirm that the ECC settings match the hardware requirements.
- Check Memory Initialization: Verify that the memory initialization process has completed successfully. This includes checking the DDR calibration results, memory timing parameters, and memory controller configuration. The UART log indicates that DDR calibration was successful, but further verification is needed to ensure that the memory is fully initialized and ready for use.
- Debug S2F Bridge Enable: Investigate the S2F bridge enable process to identify the cause of the timeout. This includes reviewing the firmware code responsible for enabling the S2F bridge, checking the timing requirements, and verifying the configuration of the S2F bridge. A JTAG debugger can be used to step through the firmware code and identify the exact point of failure.
- Review Platform Initialization Code: Examine the platform initialization code in bl31_plat_setup.c to identify the cause of the assertion failure. This includes reviewing the platform-specific hardware and software setup, checking for missing or incorrect configurations, and ensuring that all required features are supported. The assertion at line 53 should be analyzed to determine the specific condition that is not being met.
- Use Common ARM Platform Source Code: The Agilex-specific ATF source code does not support booting Linux directly from BL31. Switching to the common ARM platform source code, which supports the RESET_TO_BL31 and ARM_LINUX_AS_BL33 build options, is recommended. This ensures that the configuration and support needed for this boot flow are available.
- Enable Debugging and Logging: Enable additional debugging and logging in the ATF code to gather more information about the boot process. This includes enabling verbose logging, adding debug prints, and using hardware debugging tools to trace the execution flow. The additional information will help in identifying the root cause of the issues and verifying the fixes.
- Test with UEFI: As a diagnostic step, test the boot process with UEFI in place of the direct Linux boot. This helps determine whether the problem is specific to the Linux boot configuration or is a more general issue with the boot sequence. If the boot succeeds with UEFI, the issue is likely related to the Linux boot configuration.
- Consult ARM Trusted Firmware Documentation: Refer to the official ARM Trusted Firmware documentation for guidance on configuring and debugging the boot process. The documentation covers the boot sequence, platform initialization, and debugging techniques, and includes examples for different platforms.
- Engage with the ARM Community: Use the ARM community forums to seek assistance and share findings. Sharing the UART log, debugging information, and steps taken so far will help in getting targeted assistance from others with similar experiences.
- Update Firmware and Tools: Ensure that the latest versions of ARM Trusted Firmware, the development tools, and the hardware firmware are in use. Updates may include bug fixes that resolve the issues. Check for known issues or errata related to the Agilex platform and apply the necessary patches or workarounds.
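When comparing two boot attempts, for example the direct Linux boot against the UEFI test from the steps above, or a run before and after enabling extra logging, it helps to find the first point where the two UART logs diverge. The following is a minimal sketch with a hypothetical helper name. It assumes the logs carry no timestamps or other per-run noise, and the sample lines are illustrative only.

```python
from itertools import zip_longest


def first_divergence(log_a, log_b):
    """Return (1-based line number, line_a, line_b) at the first difference.

    Trailing whitespace is ignored. A missing line is reported as None.
    Returns None when both logs are identical.
    """
    lines_a = [line.rstrip() for line in log_a.splitlines()]
    lines_b = [line.rstrip() for line in log_b.splitlines()]
    pairs = zip_longest(lines_a, lines_b)
    for lineno, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return lineno, a, b
    return None


if __name__ == "__main__":
    # Hypothetical logs; only the first two lines come from the quoted log.
    failing = "INFO: Scrubbing ECC\nINFO: Error in response: 1f0002ff\n"
    other = "INFO: Scrubbing ECC\n(next boot message)\n"
    print(first_divergence(failing, other))
```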
Working through these steps systematically should isolate the ECC scrubbing errors, the S2F bridge enable timeout, and the platform initialization failure, and should allow the boot sequence to proceed from BL2 to BL31 and then to Linux on the Intel Agilex board.