ARM Core PMU and DSU_PMU: Differences, Interactions, and Best Practices

ARM Core PMU and DSU_PMU: Separate Hardware Components with Distinct Roles
The ARM Core Performance Monitoring Unit (Core PMU) and the DynamIQ Shared Unit Performance Monitoring Unit (DSU_PMU) are two distinct hardware components that monitor different aspects of system performance. The Core PMU is integrated within each ARM Cortex core and is responsible for…

ARM Cortex-M7 vs Cortex-M85 FP32 Multiply-Add Throughput Analysis and Optimization

ARM Cortex-M7 and Cortex-M85 FP32 Multiply-Add Throughput Discrepancies
The ARM Cortex-M7 and Cortex-M85 are both high-performance processor cores for embedded microcontrollers, but they differ significantly in single-precision floating-point (FP32) multiply-add throughput. The Cortex-M7, while capable, needs approximately 5-6 clock cycles per fused multiply-add (FMA) operation when using ARM libraries. This…

Optimizing NEON Performance on ARM Cortex-A35: Data Loading, Parallel Execution, and Cache Management

NEON Data Loading Strategies for Cortex-A35’s In-Order Execution
The ARM Cortex-A35 is a power-efficient processor with in-order execution: instructions are executed in the order they are fetched, without dynamic reordering. This characteristic has significant implications for using the NEON SIMD (Single Instruction, Multiple Data) engine efficiently. NEON is designed to accelerate multimedia and…

Missing MBIF File for ARM Cortex-R5F MBIST Implementation

ARM Cortex-R5F MBIST Configuration Challenges in SoC Designs
The ARM Cortex-R5F is a widely used real-time processor core in System-on-Chip (SoC) designs, particularly in applications requiring high reliability and deterministic performance. One critical aspect of ensuring reliability in such designs is the implementation of Memory Built-In Self-Test (MBIST) mechanisms. MBIST is essential for detecting memory…

Secure and Non-Secure World Integration in TF-M on STM32U5

Accessing Secure Modules from Non-Secure World in TF-M
Integrating the secure and non-secure worlds in Trusted Firmware-M (TF-M) on the STM32U5 microcontroller relies on Arm TrustZone technology, which provides hardware-enforced isolation between secure and non-secure states. The STM32U5 microcontroller, based on the Arm Cortex-M33 processor, implements TrustZone to partition resources such…

AXI Burst Transactions: Understanding ARLEN, ARSIZE, WSTRB, and Protocol Compliance

AXI Burst Transaction Requirements and Response Handling
The Advanced eXtensible Interface (AXI) protocol is a critical component in modern System-on-Chip (SoC) designs, particularly when dealing with ARM-based architectures. AXI defines a high-performance, high-frequency interface between managers (initiators) and subordinates (targets). One of the most complex aspects of AXI is its handling of burst transactions, where…
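
As a rough illustration of the signals named in the title, the sketch below decodes ARLEN and ARSIZE into beat and byte counts and derives a write strobe for an aligned narrow transfer. The function names are hypothetical and not taken from the article; the strobe helper assumes a naturally aligned address and a data bus of at most 8 bytes, and it ignores rules such as the 4 KB boundary restriction.

```c
#include <stdint.h>

/* AxLEN encodes (number of beats - 1), so ARLEN = 0 means a single beat. */
static inline uint32_t axi_burst_beats(uint8_t arlen)
{
    return (uint32_t)arlen + 1u;
}

/* AxSIZE encodes log2(bytes per beat): 0..7 maps to 1..128 bytes. */
static inline uint32_t axi_beat_bytes(uint8_t arsize)
{
    return 1u << (arsize & 0x7u);
}

/* Total bytes moved by one burst. */
static inline uint32_t axi_burst_bytes(uint8_t arlen, uint8_t arsize)
{
    return axi_burst_beats(arlen) * axi_beat_bytes(arsize);
}

/*
 * WSTRB for one beat of an ALIGNED transfer on a bus of bus_bytes (<= 8).
 * Hypothetical helper: the active byte lanes start at (addr mod bus width)
 * and span the transfer size.
 */
static inline uint8_t axi_wstrb_aligned(uint64_t addr, uint8_t awsize,
                                        uint32_t bus_bytes)
{
    uint32_t nbytes = axi_beat_bytes(awsize);
    uint32_t lane   = (uint32_t)(addr & (bus_bytes - 1u));
    return (uint8_t)(((1u << nbytes) - 1u) << lane);
}
```

For example, ARLEN = 3 with ARSIZE = 2 describes four beats of four bytes each, 16 bytes in total.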

ARM Cortex-A76 STP Instruction Latency Anomalies and Optimization

ARM Cortex-A76 STP Instruction Latency Discrepancy
The ARM Cortex-A76 processor, a high-performance CPU core designed for mobile and embedded applications, exhibits an unexpected latency anomaly in the Store Pair (STP) instruction when benchmarked using the MegPeak tool. The observed latency for the STP instruction is significantly higher than the values documented in the ARM Cortex-A76…

ARMv8-A CurrentEL Register Value Retrieval and Debugging on Android

ARMv8-A CurrentEL Register Access and Exception Level Detection
The ARMv8-A architecture defines a hierarchical exception model with four exception levels (EL0 to EL3), each serving a specific purpose in the system’s security and privilege model. The CurrentEL register is a system register that holds the exception level of the currently executing code. Retrieving the…
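
A minimal sketch of decoding the register, assuming only the architectural layout in which the exception level sits in bits [3:2] of CurrentEL. The helper names are hypothetical. The `mrs` wrapper is compiled only for AArch64 and is meant for code running at EL1 or higher; CurrentEL is not accessible from EL0, so calling it from an ordinary Android user-space process is expected to fault rather than return a value.

```c
#include <stdint.h>

/* The EL field occupies bits [3:2] of CurrentEL; the other bits are reserved. */
static inline unsigned current_el_from_raw(uint64_t raw)
{
    return (unsigned)((raw >> 2) & 0x3u);
}

#if defined(__aarch64__)
/* Raw register read. Only valid at EL1 or above; UNDEFINED at EL0. */
static inline uint64_t read_current_el_raw(void)
{
    uint64_t value;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(value));
    return value;
}
#endif
```

A kernel module or hypervisor component could pass the result of `read_current_el_raw()` to `current_el_from_raw()`; a raw value of 0x4 decodes to EL1 and 0x8 to EL2.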

ARM Cortex-A53 Core Lockup Due to Device-GRE Memory Attributes and AXI Interconnect Desynchronization

ARM Cortex-A53 Core Lockup During FPGA Memory Writes with Device-GRE Attributes
The ARM Cortex-A53 core, when configured to use Device-GRE (Gathering, Reordering, Early Write Acknowledgment) memory attributes for FPGA memory accesses, can experience core lockups and interconnect desynchronization. This issue arises when the A53 core attempts to combine multiple writes into larger AXI transactions to…
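
For context, Device memory types are selected through 8-bit attribute fields in MAIR_ELx. The sketch below lists the architectural encodings of the four Device types and shows a hypothetical helper that places one of them in a MAIR slot. Whether switching the FPGA region to a stricter type such as Device-nGnRE avoids the lockup is an assumption for illustration, not something the excerpt establishes.

```c
#include <stdint.h>

/* Architectural MAIR_ELx attribute encodings for Device memory types. */
enum {
    MAIR_ATTR_DEVICE_nGnRnE = 0x00, /* no Gathering, no Reordering, no Early ack */
    MAIR_ATTR_DEVICE_nGnRE  = 0x04, /* Early write acknowledgment allowed */
    MAIR_ATTR_DEVICE_nGRE   = 0x08, /* Reordering and Early ack allowed */
    MAIR_ATTR_DEVICE_GRE    = 0x0C  /* Gathering, Reordering, Early ack allowed */
};

/*
 * Hypothetical helper: return a copy of a MAIR value with the 8-bit
 * attribute at slot `index` (0..7) replaced by `attr`.
 */
static inline uint64_t mair_set_attr(uint64_t mair, unsigned index, uint8_t attr)
{
    unsigned shift = (index & 7u) * 8u;
    return (mair & ~(0xFFull << shift)) | ((uint64_t)attr << shift);
}
```

Page table entries then refer to the slot through their AttrIndx field, so changing the attribute in one slot changes the memory type for every mapping that uses it.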

RAS Error Injection and Containment Issues on Cortex-A with FEAT_RASv1p1

ARM Cortex-A RAS Error Injection: SError Exception Not Triggering for CE/DE Errors
ARM Cortex-A cores that implement the FEAT_RASv1p1 (Reliability, Availability, and Serviceability) extension provide mechanisms for error injection and containment. However, a common issue arises when attempting to inject Corrected Errors (CE) and Deferred Errors (DE) using the Pseudo-fault Generation Control Register…