Multiprocessor Boot Failure in ARM FVP Due to Incorrect Startup Parameters
This article covers a failure to boot multiple processors in an ARM Fixed Virtual Platform (FVP) environment. The user could not get any processor other than the primary core to execute code. The problem was traced to an incorrect startup parameter, pctl.startup, which controls which processors are initialized at startup. The default value of 0.0.0.* only initializes the first core in the first cluster, leaving the remaining cores in a reset state. By changing the parameter to 0.*.*.*, the user was able to initialize all cores across all clusters. However, this exposed a secondary issue: the L2 caches were not coherent, which points to a problem with the cache coherency mechanism in the multiprocessor system.
The ARM FVP is a simulation environment that emulates ARM-based SoCs, allowing developers to test and debug software and hardware designs before deploying them on physical hardware. In a multiprocessor system, proper initialization of all cores is critical for achieving symmetric multiprocessing (SMP) behavior. The pctl.startup parameter is a key configuration setting that dictates which cores are brought out of reset and initialized during the boot process. Misconfiguration of this parameter can lead to partial or complete failure in booting the system, as observed in this case.
The secondary issue of L2 cache incoherence is a common challenge in multiprocessor systems, particularly those with a shared L2 cache. Cache coherency ensures that all processors in the system have a consistent view of memory. When caches are not coherent, different processors may observe different values for the same memory location, leading to unpredictable behavior and system crashes. ARM-based systems provide hardware to maintain coherency, such as the Snoop Control Unit (SCU) within a multicore cluster and a cache-coherent interconnect between clusters, but this hardware must be properly configured and enabled during the boot process.
Misconfigured Startup Parameters and Cache Coherency Mechanism Discrepancies
The root cause of the multiprocessor boot failure lies in the misconfiguration of the pctl.startup parameter. The default value of 0.0.0.* is designed to initialize only the first core in the first cluster, which is sufficient for single-core testing but inadequate for SMP systems. By changing the parameter to 0.*.*.*, the user effectively instructed the FVP to initialize all cores across all clusters. However, this change alone does not guarantee proper cache coherency, as the cache coherency mechanisms must also be correctly configured.
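To make the effect of the wildcard concrete, the following is a minimal sketch of how a dotted pattern could select cores. It assumes that each dot-separated field is compared on its own and that * matches any value in its field; the FVP's real parser may behave differently, and the helper name startup_matches is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if a core id such as "0.1.2.0" is
 * selected by a dotted pattern such as "0.*.*.*". Each field is compared
 * separately and "*" matches any value. This only models the behaviour
 * described in the text; it is not the FVP's own parser. */
bool startup_matches(const char *pattern, const char *core)
{
    assert(pattern != NULL && core != NULL);

    while (*pattern != '\0' && *core != '\0') {
        /* Length of the current field in each string. */
        size_t plen = strcspn(pattern, ".");
        size_t clen = strcspn(core, ".");

        bool wildcard = (plen == 1 && pattern[0] == '*');
        if (!wildcard &&
            (plen != clen || strncmp(pattern, core, plen) != 0)) {
            return false;
        }

        pattern += plen;
        core += clen;

        /* Either both strings continue with a separator or neither does. */
        if ((*pattern == '.') != (*core == '.')) {
            return false;
        }
        if (*pattern == '.') {
            pattern++;
            core++;
        }
    }
    /* A match requires both strings to be fully consumed. */
    return *pattern == '\0' && *core == '\0';
}
```

Under these assumptions, the pattern 0.0.0.* leaves a core written as 0.1.0.0 unselected, while 0.*.*.* selects it, which mirrors the behaviour the user observed.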
The L2 cache incoherence issue arises from the improper initialization or configuration of the cache coherency mechanisms. In ARM-based systems, the L2 cache is typically shared among multiple cores within a cluster. The SCU is responsible for maintaining coherency between the L1 caches of individual cores and the shared L2 cache. If the SCU is not properly enabled or configured, the L2 cache may not be coherent, leading to data inconsistencies.
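As an illustration of what enabling and verifying the SCU involves, here is a minimal sketch of a read-modify-write on a memory-mapped control register. It assumes the enable bit is bit 0, as in the Cortex-A9 example given later in this article; the function name scu_enable is hypothetical, and the register address must come from the TRM of the actual platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the enable bit is bit 0 of the SCU control register. */
#define SCU_CTRL_ENABLE (1u << 0)

/* Sets the enable bit in an SCU control register and reads it back.
 * `scu_ctrl` points at the control register; on real hardware this is the
 * platform-specific address taken from the TRM. Returns true if the bit
 * reads back as set. */
bool scu_enable(volatile uint32_t *scu_ctrl)
{
    assert(scu_ctrl != 0);

    uint32_t value = *scu_ctrl;   /* keep the other control bits unchanged */
    value |= SCU_CTRL_ENABLE;
    *scu_ctrl = value;

    /* On an ARM target, a barrier such as DSB is typically placed here
     * before relying on the new setting. */
    return (*scu_ctrl & SCU_CTRL_ENABLE) != 0;
}
```

Taking the register as a pointer parameter, rather than hard-coding an address, lets the same routine be exercised against an ordinary variable on a host before it is pointed at the real register.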
Another potential cause of the cache incoherence issue is the absence of cache maintenance operations during the boot process. Many ARM processors require explicit cache maintenance operations to put the caches in a known state before the caches and the coherency mechanisms are enabled. These operations include invalidating the caches and, where a cache may already hold modified data, cleaning the dirty lines to memory first. If these steps are omitted or performed in the wrong order, the caches may be left holding stale or undefined contents, leading to coherency issues.
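Invalidating or cleaning a whole cache during boot is usually done one line at a time with set/way operations, each of which takes an operand that names the cache level, the way, and the set. The sketch below shows how that operand is commonly composed. The function names are hypothetical, and the cache geometry is passed in as plain arguments; on real hardware it would be read from the cache ID registers.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest n such that (1 << n) >= x, for 1 <= x <= 2^31. */
static unsigned ceil_log2(uint32_t x)
{
    unsigned n = 0;
    while ((1u << n) < x) {
        n++;
    }
    return n;
}

/* Builds the operand for a set/way cache maintenance operation.
 *   level      : cache level, 1 for L1, 2 for L2
 *   way, set   : zero-based indices of the line to operate on
 *   num_ways   : associativity of that cache
 *   line_bytes : cache line size in bytes (a power of two)
 * Layout: way in the top bits, set starting at bit log2(line_bytes),
 * and (level - 1) in bits [3:1]. */
uint32_t setway_operand(unsigned level, uint32_t way, uint32_t set,
                        uint32_t num_ways, uint32_t line_bytes)
{
    assert(level >= 1 && level <= 7);
    assert(num_ways >= 1 && num_ways <= 1024 && way < num_ways);
    assert(line_bytes >= 4 && (line_bytes & (line_bytes - 1)) == 0);

    unsigned way_bits = ceil_log2(num_ways);
    unsigned set_shift = ceil_log2(line_bytes);

    uint32_t operand = (uint32_t)(level - 1) << 1;
    operand |= set << set_shift;
    if (way_bits > 0) {   /* a direct-mapped cache has no way field */
        operand |= way << (32 - way_bits);
    }
    return operand;
}
```

A boot routine would loop over every way and set of a level, pass each operand to the invalidate or clean operation, and finish with a barrier. The loop and the inline assembly are left out here because they depend on the core and the toolchain.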
Additionally, the boot code running on the secondary cores may not be properly synchronized with the primary core. In an SMP system, the primary core is responsible for initializing the system and bringing the secondary cores out of reset. The secondary cores must then execute their boot code, which typically involves setting up their private caches and enabling coherency mechanisms. If the boot code on the secondary cores is not properly synchronized with the primary core, the caches may not be initialized correctly, leading to incoherence.
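One common way to provide this synchronization is a flag in shared memory that the primary core sets once system-wide initialization is done. The following is a minimal sketch using C11 atomics; the structure and function names are hypothetical, and real boot code would usually implement the wait loop in assembly with WFE and SEV.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot mailbox placed in memory that every core can access. */
struct boot_sync {
    atomic_bool primary_ready;   /* set once by the primary core */
};

/* Primary core: put the mailbox in a known state before any core uses it. */
void boot_sync_init(struct boot_sync *s)
{
    assert(s != NULL);
    atomic_init(&s->primary_ready, false);
}

/* Primary core: call after shared initialization is complete. The release
 * store orders the earlier initialization writes before the flag. */
void primary_release_secondaries(struct boot_sync *s)
{
    atomic_store_explicit(&s->primary_ready, true, memory_order_release);
}

/* Secondary core: one polling step. Returns true once the primary has
 * signalled; boot code would loop on this before touching shared state. */
bool secondary_may_proceed(struct boot_sync *s)
{
    return atomic_load_explicit(&s->primary_ready, memory_order_acquire);
}
```

If the primary core runs with its caches enabled while the secondary cores poll with theirs still disabled, the flag may also need to be cleaned to memory before the secondaries can observe it.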
Correcting Startup Parameters and Enabling Cache Coherency Mechanisms
To resolve the multiprocessor boot failure and ensure proper cache coherency, the following steps should be taken:
- Correcting the Startup Parameters: The pctl.startup parameter should be set to 0.*.*.* to initialize all cores across all clusters. This ensures that all processors are brought out of reset and begin executing their boot code. The parameter can be specified on the FVP command line as follows:
  --parameter pctl.startup=0.*.*.*
  When the model is launched from a shell, quoting the value (for example, "0.*.*.*") keeps the shell from treating the asterisks as filename wildcards. This change alone will enable all cores to boot, but additional steps are required to ensure cache coherency.
- Enabling the Snoop Control Unit (SCU): The SCU must be enabled to maintain coherency between the L1 caches of individual cores and the shared L2 cache. This is typically done in the boot code of the primary core. The following steps outline the process:
  - Locate the SCU Base Address: The SCU is accessed through a memory-mapped interface. Its base address is platform-specific and can be found in the Technical Reference Manual (TRM) for the processor together with the memory map of the platform being used. Not every core exposes a memory-mapped SCU; where the coherency logic is integrated into the cluster, the TRM describes the equivalent enable control.
  - Enable the SCU: Write to the SCU control register to enable the SCU. The exact register and bit fields can be found in the TRM. For example, on a Cortex-A9 MPCore platform whose private peripheral base address is 0x1E000000, the SCU control register is located at 0x1E000000, and the enable bit is bit 0.
  - Verify SCU Enablement: After enabling the SCU, it is good practice to read back the control register to verify that the SCU has been successfully enabled.
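- Performing Cache Maintenance Operations: Before enabling the SCU, the caches must be placed in a known state. This involves invalidating the caches and, where they may hold modified data, cleaning any dirty cache lines first. The following steps outline the process:
  - Invalidate the L1 Caches: Use the DCISW (Data Cache Invalidate by Set/Way) operation, written DC ISW in AArch64, on every set and way of the L1 data cache. This ensures that the cache does not contain any stale data. Immediately after reset the caches hold no valid program data, so they are invalidated without being cleaned.
  - Clean the L1 Caches: If a cache has already been enabled and may hold modified data, use the DCCSW (Data Cache Clean by Set/Way) operation, written DC CSW in AArch64, before invalidating it. This ensures that any modified data is written back to memory rather than discarded.
  - Invalidate the L2 Cache: The method depends on the L2 implementation. An L2 cache integrated into the processor cluster is typically invalidated with the same set/way operations, with the level field of the operand selecting L2. An external L2 cache controller is invalidated through its own memory-mapped registers, as described in its TRM. This ensures that the L2 cache does not contain any stale data.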
- Synchronizing Secondary Core Boot Code: The boot code running on the secondary cores must be properly synchronized with the primary core. This can be achieved using synchronization primitives such as barriers or semaphores. The following steps outline the process:
  - Initialize Synchronization Variables: Before bringing the secondary cores out of reset, initialize synchronization variables in shared memory. These variables will be used to coordinate the boot process between the primary and secondary cores.
  - Bring Secondary Cores Out of Reset: Use the appropriate mechanism (e.g., writing to a reset control register) to bring the secondary cores out of reset. When pctl.startup is set to 0.*.*.*, all cores leave reset together, so the boot code must instead identify the primary core (for example, from the core's affinity register, MPIDR) and hold the other cores in a wait loop.
  - Wait for Synchronization: In the boot code of the secondary cores, wait for a signal from the primary core before proceeding with cache initialization and enabling coherency mechanisms. This ensures that the secondary cores do not attempt to access shared resources before they are properly initialized.
- Verifying Cache Coherency: After enabling the SCU and performing cache maintenance operations, it is important to verify that the caches are coherent. This can be done by running a simple test program that writes to shared memory from one core and reads from another core. If the caches are coherent, the value read should match the value written. If the caches are not coherent, the value read may be stale or incorrect.
- Debugging Cache Coherency Issues: If cache coherency issues persist, additional debugging steps may be required. These steps include:
  - Enabling Cache Coherency Debugging Features: Some ARM processors and simulation models provide features that can help identify cache coherency issues. For example, the performance monitoring unit on many cores can count cache and coherency events, and simulation models can typically emit a trace of memory transactions.
  - Analyzing Cache Behavior: Use a debugger or simulation environment to analyze the behavior of the caches. Look for discrepancies in cache line states (e.g., dirty, valid, invalid) and identify any transactions that violate coherency rules.
  - Reviewing Boot Code: Carefully review the boot code to ensure that all cache maintenance operations and coherency mechanisms are correctly implemented. Pay particular attention to the order of operations and the use of synchronization primitives.
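The coherency check described in the verification step can be sketched as a writer routine for one core and a checker routine for another. This is a minimal illustration with hypothetical names and an arbitrary buffer size. Run on a single host thread it only exercises the logic; the meaningful test is to call the two routines from different cores of the target with caches enabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TEST_WORDS 16u   /* arbitrary size chosen for this sketch */

/* Hypothetical test area placed in cacheable shared memory. */
struct coherency_test {
    uint32_t data[TEST_WORDS];   /* buffer written by one core */
    atomic_uint seq;             /* 0 means nothing published yet */
};

void coherency_init(struct coherency_test *t)
{
    assert(t != NULL);
    atomic_init(&t->seq, 0u);
}

/* Writer core: fill the buffer with values derived from `seed`, then
 * publish the round number with a release store. */
void coherency_write(struct coherency_test *t, uint32_t seed, unsigned seq)
{
    assert(t != NULL && seq != 0u);
    for (size_t i = 0; i < TEST_WORDS; i++) {
        t->data[i] = seed ^ (uint32_t)i;
    }
    atomic_store_explicit(&t->seq, seq, memory_order_release);
}

/* Reader core: returns -1 if round `seq` has not been published yet,
 * otherwise the number of words that differ from the expected values.
 * A result of 0 means the reader saw exactly what the writer wrote. */
int coherency_check(struct coherency_test *t, uint32_t seed, unsigned seq)
{
    assert(t != NULL);
    if (atomic_load_explicit(&t->seq, memory_order_acquire) != seq) {
        return -1;
    }
    int mismatches = 0;
    for (size_t i = 0; i < TEST_WORDS; i++) {
        if (t->data[i] != (seed ^ (uint32_t)i)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

On the target, a nonzero mismatch count suggests the reader observed stale data. A reader that never sees the round published is itself a symptom worth investigating, since the flag travels through the same caches as the data.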
By following these steps, the multiprocessor boot failure and cache coherency issues can be resolved, ensuring that the ARM FVP environment operates correctly in an SMP configuration. Proper initialization of the startup parameters, enabling of the SCU, and careful management of cache maintenance operations are critical to achieving a stable and coherent multiprocessor system.