ARMv6-M MOV Instruction T1 Encoding Behavior and Documentation Ambiguity
The ARMv6-M architecture, a subset of the ARMv6 architecture designed for microcontrollers, defines a register-to-register encoding of the MOV instruction labelled T1. This encoding is documented in the ARMv6-M Architecture Reference Manual (ARM DDI 0419D) under section A6.7.40. The availability note for the encoding states: "ARMv6-M, ARMv7-M, if <Rd> and <Rm> both from R0-R7. Otherwise all versions of the Thumb instruction set." This statement has caused confusion about where the T1 encoding applies and how it behaves, particularly for low registers (R0-R7) versus high registers (R8-R15).
The core issue is how to read that availability note. Does it mean that the T1 encoding can only be used with low registers (R0-R7) on ARMv6-M and ARMv7-M, or can it also be used with high registers? A second ambiguity is what the assembler should emit for a MOV between two low registers, where both a flag-setting and a non-flag-setting copy are possible.
The T1 encoding is a 16-bit Thumb instruction that copies one register to another without updating the condition flags. Whether it may be used depends on the architecture version and the registers involved. One reading of the documentation is that the T1 encoding is valid on ARMv6-M and ARMv7-M only when both the destination register (Rd) and the source register (Rm) are low registers. That reading conflicts with the practical observation that the T1 encoding can be used with high registers on ARMv6-M.
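To make the register fields concrete, here is a minimal Python sketch of how the T1 halfword is laid out: the fixed pattern 0100 0110 in bits [15:8], a D bit (the top bit of Rd) in bit 7, Rm in bits [6:3], and the low three bits of Rd in bits [2:0]. The function name `encode_mov_t1` is a hypothetical helper for illustration and is not part of any toolchain.

```python
def encode_mov_t1(rd: int, rm: int) -> int:
    """Return the 16-bit MOV (register) T1 halfword for 'mov Rd, Rm'."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register numbers must be in 0..15")
    d = (rd >> 3) & 1  # top bit of Rd is stored separately in bit 7
    return 0x4600 | (d << 7) | (rm << 3) | (rd & 0x7)


# Low/low, high destination, high source, and both high.
for rd, rm in [(0, 1), (8, 0), (0, 8), (8, 8)]:
    print(f"mov r{rd}, r{rm} -> {encode_mov_t1(rd, rm):#06x}")
# mov r0, r1 -> 0x4608
# mov r8, r0 -> 0x4680
# mov r0, r8 -> 0x4640
# mov r8, r8 -> 0x46c0
```

The same bit layout covers all four register combinations. The only thing that differs between them is which architecture versions accept the resulting halfword.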
Misinterpretation of T1 Encoding Constraints and Assembler Behavior
The confusion stems from reading the note as a restriction of the T1 encoding to low registers on ARMv6-M and ARMv7-M. That is not what it says. The note splits the encoding into two cases: when both Rd and Rm are in R0-R7, the encoding is available only on ARMv6-M and ARMv7-M; when at least one of them is a high register, it is available in all versions of the Thumb instruction set. The T1 encoding can therefore be used with high registers on ARMv6-M, and the remaining point of contention is what the assembler generates when both Rd and Rm are low registers.
For architectures prior to ARMv6 (e.g., ARMv4T, ARMv5T), the T1 encoding with two low registers is declared UNPREDICTABLE. If a CPU implementing an architecture older than ARMv6 encounters that encoding, the architecture makes no guarantee about the result, so software cannot rely on it. This is documented in the ARM Architecture Reference Manual (ARM DDI 0100I), which states that the T1 encoding with low registers should not be used on architectures older than ARMv6.
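As an illustration of the case that is unpredictable before ARMv6, the following sketch decodes a halfword and reports whether it is a T1 MOV with both registers in R0-R7. The helper names `mov_t1_fields` and `is_low_low_mov_t1` are hypothetical, and the check assumes the T1 layout described above (pattern 0x46 in the top byte, D in bit 7, Rm in bits [6:3]).

```python
def mov_t1_fields(halfword: int):
    """Return (rd, rm) if halfword is a MOV (register) T1 encoding, else None."""
    if (halfword & 0xFF00) != 0x4600:
        return None
    d = (halfword >> 7) & 1
    rm = (halfword >> 3) & 0xF
    rd = (d << 3) | (halfword & 0x7)
    return rd, rm


def is_low_low_mov_t1(halfword: int) -> bool:
    """True for the T1 form that pre-ARMv6 cores treat as unpredictable."""
    fields = mov_t1_fields(halfword)
    if fields is None:
        return False
    rd, rm = fields
    return rd < 8 and rm < 8


for hw in (0x4608, 0x4680, 0x46C0, 0x1C08):
    print(f"{hw:#06x}: low/low T1 MOV = {is_low_low_mov_t1(hw)}")
```

A checker along these lines could flag halfwords such as 0x4608 in a binary intended for an ARMv4T or ARMv5T core, while leaving forms that involve a high register alone.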
On ARMv6-M and ARMv7-M, the T1 encoding with two low registers is valid, but the assembler does not necessarily emit it for a plain MOV. Under the traditional (pre-UAL) Thumb syntax, a MOV between two low registers is assembled as a flag-setting copy, using the encoding for "adds Rd, Rm, #0", rather than as the non-flag-setting T1 encoding. The result is predictable on every Thumb-capable core, including those older than ARMv6, at the cost of updating the condition flags.
The mnemonic "cpy" was introduced with ARMv6 to request the non-flag-setting T1 encoding explicitly, including when both Rd and Rm are low registers. It is not available in architectures older than ARMv6, so an assembler has to treat the same register copy differently depending on the target architecture.
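The difference between the two copies can be seen in their halfwords. The sketch below assumes the 3-bit-immediate ADDS layout (0001110 in bits [15:9], imm3 in bits [8:6], Rn in bits [5:3], Rd in bits [2:0]) and treats "cpy" as producing the MOV T1 halfword. Both helper names are hypothetical and exist only for this example.

```python
def encode_adds_imm3(rd: int, rn: int, imm3: int = 0) -> int:
    """Flag-setting 'adds Rd, Rn, #imm3' (low registers only)."""
    if not (0 <= rd <= 7 and 0 <= rn <= 7 and 0 <= imm3 <= 7):
        raise ValueError("adds with a 3-bit immediate needs R0-R7 and imm3 in 0..7")
    return 0x1C00 | (imm3 << 6) | (rn << 3) | rd


def encode_cpy(rd: int, rm: int) -> int:
    """Non-flag-setting copy: same halfword as MOV (register) T1."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register numbers must be in 0..15")
    return 0x4600 | (((rd >> 3) & 1) << 7) | (rm << 3) | (rd & 0x7)


print(f"adds r0, r1, #0 -> {encode_adds_imm3(0, 1):#06x}")  # 0x1c08, updates flags
print(f"cpy  r0, r1     -> {encode_cpy(0, 1):#06x}")        # 0x4608, flags untouched
```

Both instructions leave the same value in Rd. They differ only in whether the condition flags are updated, which matters when the copy sits between a comparison and a conditional branch.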
Correct Usage of T1 Encoding and Assembler Implementation Guidelines
To correctly use the T1 encoding and ensure proper assembler behavior, the following guidelines should be followed:
- Understanding the T1 Encoding’s Applicability: The T1 encoding can be used with both low and high registers on ARMv6-M and ARMv7-M. When both Rd and Rm are low registers, however, an assembler that follows the traditional Thumb behavior generates a flag-setting copy using the "adds Rd, Rm, #0" encoding for a plain MOV, which keeps the output predictable on every Thumb-capable core.
- Using the "cpy" Mnemonic for Non-Flag-Setting Copies: On ARMv6-M and ARMv7-M, use the "cpy" mnemonic to generate the non-flag-setting T1 encoding explicitly when both Rd and Rm are low registers. The mnemonic is not available in older architectures, so use it only when targeting ARMv6 or later.
- Handling Different Architectures: When assembling code for architectures older than ARMv6, the assembler must not generate the T1 encoding with two low registers. Instead, it should generate the flag-setting "adds Rd, Rm, #0" encoding for MOV instructions involving only low registers, so that the code remains predictable on the target.
- Toolchain Behavior: In its traditional (divided) syntax mode, the GNU assembler used by the ARM GCC toolchain follows this approach and generates the flag-setting "adds Rd, Rm, #0" encoding for a MOV between two low registers, whatever the target architecture. This keeps the output compatible across ARM architectures and avoids unpredictable results.
- Documentation Clarity: The documentation should state plainly that the T1 encoding can be used with high registers on ARMv6-M and ARMv7-M, and that only the form with two low registers is limited to ARMv6 and later. This would help developers understand the correct usage of the T1 encoding.
By following these guidelines, developers can keep their code compatible with different ARM architectures. The key takeaway is that the T1 encoding is not restricted to low registers on ARMv6-M and ARMv7-M. It is the form with two low registers that the assembler must handle specially to avoid unpredictable results on older cores.
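The guidelines above can be sketched as a small selection function. This is an illustrative model of the behavior described in this section and not the logic of any real assembler. The function name `select_copy_encoding`, the architecture strings, and the returned labels are all assumptions made for the example.

```python
PRE_V6 = {"armv4t", "armv5t"}
V6_AND_LATER = {"armv6-m", "armv7-m"}


def select_copy_encoding(arch: str, mnemonic: str, rd: int, rm: int):
    """Return (label, halfword) for a register copy, per the guidelines above."""
    if arch not in PRE_V6 | V6_AND_LATER:
        raise ValueError(f"unknown architecture: {arch}")
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register numbers must be in 0..15")

    # MOV (register) T1: 0100 0110 D Rm[3:0] Rd[2:0]
    mov_t1 = 0x4600 | ((rd >> 3) << 7) | (rm << 3) | (rd & 0x7)

    if mnemonic == "cpy":
        # "cpy" always requests the non-flag-setting T1 encoding.
        if arch in PRE_V6:
            raise ValueError("cpy is not available before ARMv6")
        return ("mov_t1", mov_t1)

    if mnemonic != "mov":
        raise ValueError(f"unsupported mnemonic: {mnemonic}")

    if rd < 8 and rm < 8:
        # Two low registers: emit the flag-setting 'adds Rd, Rm, #0'.
        return ("adds_imm0", 0x1C00 | (rm << 3) | rd)

    # At least one high register: T1 is valid on all Thumb versions.
    return ("mov_t1", mov_t1)


print(select_copy_encoding("armv6-m", "mov", 0, 1))  # ('adds_imm0', 7176)
print(select_copy_encoding("armv6-m", "cpy", 0, 1))  # ('mov_t1', 17928)
print(select_copy_encoding("armv4t", "mov", 8, 0))   # ('mov_t1', 18048)
```

Keeping the low-register check separate from the architecture check mirrors the two questions an assembler has to answer: which registers are involved, and whether the target accepts the non-flag-setting form for them.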
Summary of T1 Encoding Behavior Across Architectures
| Architecture | T1 Encoding with Low Registers | T1 Encoding with High Registers | Assembler Behavior for Low Registers |
|---|---|---|---|
| ARMv4T | Unpredictable | Valid | Generates "adds Rd, Rm, #0" |
| ARMv5T | Unpredictable | Valid | Generates "adds Rd, Rm, #0" |
| ARMv6-M | Valid (use "cpy" for non-flag-setting) | Valid | Generates "adds Rd, Rm, #0" for MOV, "cpy" for non-flag-setting |
| ARMv7-M | Valid (use "cpy" for non-flag-setting) | Valid | Generates "adds Rd, Rm, #0" for MOV, "cpy" for non-flag-setting |
This table summarizes the behavior of the T1 encoding across ARM architectures and how the assembler should handle low registers in each case.
In conclusion, the T1 encoding of the MOV instruction on ARMv6-M is an efficient way to move data between registers, but using it correctly requires attention to the target architecture and the registers involved. Developers who understand the encoding’s constraints and the assembler’s behavior can avoid unpredictable results and keep their code compatible across ARM architectures.