GlobalISel: Non-Commutative Tablegen Combines Explained

by SLV Team 56 views
GlobalISel: Non-Commutative Tablegen Combines Explained

Have you ever wondered why certain code optimizations in LLVM's GlobalISel don't always work as expected when you change the order of operations? Specifically, why Tablegen combines sometimes fail to produce the same result when operands are swapped in seemingly equivalent expressions? This article dives deep into a fascinating issue within LLVM's GlobalISel framework, focusing on the non-commutative nature of Tablegen combines. Let's break down the problem, explore the underlying concepts, and understand why this behavior occurs. We'll use a real-world example to illustrate the issue and discuss the implications for compiler optimization.

Understanding the Problem

The core issue revolves around the non-commutative behavior observed in GlobalISel Tablegen combines. In simpler terms, this means that the order in which operations are performed can affect the outcome of the optimization process. This is particularly evident when dealing with arithmetic operations like addition (G_ADD) and subtraction (G_SUB). To illustrate, consider the following scenario: we have an expression of the form (A + (B - (A + C))). Our goal is to simplify this expression to (B - C). Sounds straightforward, right? However, when we commute the operands in the final addition, changing the expression to ((B - (A + C)) + A), the optimization might fail. This discrepancy arises because the pattern matching rules defined in Tablegen are sensitive to the precise order and structure of the input code.

The problem stems from how GlobalISel's combiner rules are defined using Tablegen. Tablegen is a powerful tool for describing patterns and transformations in LLVM. However, the matching process is inherently order-dependent. When a combiner rule is defined, it specifies a particular pattern that the code must match exactly for the transformation to be applied. If the code deviates from this pattern, even in ways that might seem mathematically equivalent, the rule might not trigger.

The example provided in the original discussion highlights this issue perfectly. The combiner rule APlusBMinusAplusC is designed to fold (A + (B - (A + C))) into (B - C). However, this rule is structured to specifically recognize the pattern where A is added to the result of (B - (A + C)). If we reverse the order of operands in the final addition, presenting the expression as ((B - (A + C)) + A), the pattern matching fails because the rule isn't defined to recognize this alternative arrangement. This non-commutative behavior can lead to missed optimization opportunities and potentially impact the performance of the generated code.

To fully grasp the implications, it's essential to understand how GlobalISel operates. GlobalISel is LLVM's next-generation instruction selection framework, aiming to provide a more flexible and target-independent approach to code generation. It relies heavily on combiners to simplify and optimize the intermediate representation (IR) of the code before generating machine code. These combiners work by matching specific patterns in the IR and applying predefined transformations. The effectiveness of these combiners directly affects the quality of the generated code, making it crucial to address issues like non-commutativity.

Diving into the Code Example

Let's dissect the provided LLVM IR code snippets to understand the practical implications of this non-commutative behavior. We have two test cases, test and test_rev, which are subtly different but highlight the core issue.

Test Case: test

---
name: test
body: |
  bb.1:
    liveins: $vgpr0, $vgpr1, $vgpr2

    %A:_(s32) = COPY $vgpr0
    %B:_(s32) = COPY $vgpr1
    %C:_(s32) = COPY $vgpr2
    %add1:_(s32) = G_ADD %A, %C
    %sub1:_(s32) = G_SUB %B, %add1
    %root:_(s32) = G_ADD %A, %sub1
    $vgpr0 = COPY %root(s32)
    SI_RETURN implicit $vgpr0
...

In the test case, the code performs the operations in the order expected by the APlusBMinusAplusC combiner rule. It first calculates %add1 as A + C, then %sub1 as B - %add1, and finally %root as A + %sub1. This arrangement perfectly matches the pattern defined in the Tablegen rule, allowing the combiner to successfully transform the expression into B - C.

Test Case: test_rev

---
name: test_rev
body: |
  bb.1:
    liveins: $vgpr0, $vgpr1, $vgpr2

    %A:_(s32) = COPY $vgpr0
    %B:_(s32) = COPY $vgpr1
    %C:_(s32) = COPY $vgpr2
    %add1:_(s32) = G_ADD %A, %C
    %sub1:_(s32) = G_SUB %B, %add1
    %root:_(s32) = G_ADD %sub1, %A
    $vgpr0 = COPY %root(s32)
    SI_RETURN implicit $vgpr0
...

The test_rev case introduces a subtle but crucial change. The final addition is performed as %root = G_ADD %sub1, %A, which is mathematically equivalent to A + %sub1. However, this seemingly minor alteration causes the APlusBMinusAplusC combiner rule to fail. The rule, as defined, expects the pattern where A is the first operand in the final addition, not the second. This failure highlights the non-commutative nature of the Tablegen combines and demonstrates how sensitive they are to the order of operations.

By running these test cases with the specified command (llc -march=amdgcn -mcpu=gfx1200 -o - -run-pass=amdgpu-prelegalizer-combiner test.mir), you can observe this behavior firsthand. The test case will be optimized as expected, while test_rev will not, illustrating the impact of operand order on the combiner's effectiveness.

Why Does This Happen?

The reason for this non-commutative behavior lies in the way Tablegen patterns are matched. Tablegen uses a declarative approach, where you define patterns and transformations, and the system automatically generates the necessary matching code. However, the underlying matching algorithm typically performs a structural comparison of the code against the defined patterns. This comparison is order-sensitive, meaning that it considers the precise arrangement of operations and operands.

In the case of the APlusBMinusAplusC rule, the pattern explicitly specifies the order of operands in the final G_ADD instruction. When the operands are swapped, the structural comparison fails, and the rule doesn't trigger. This is a deliberate design choice in Tablegen to provide fine-grained control over the transformations. However, it also introduces the potential for non-commutative behavior, as demonstrated by our example.

This behavior is not necessarily a bug but rather a consequence of the design. Tablegen's pattern matching is designed to be precise and predictable, allowing developers to create highly specific optimization rules. However, this precision comes at the cost of flexibility. To handle commutative operations effectively, developers need to explicitly define rules for all possible operand orders, which can lead to code duplication and increased complexity.

Implications and Solutions

The non-commutative nature of Tablegen combines has several implications for compiler optimization. First, it can lead to missed optimization opportunities, as seen in the test_rev example. If the code is not arranged in the exact order expected by the combiner rule, the optimization will not be applied, potentially resulting in less efficient code.

Second, it can make the optimization process less predictable. Developers need to be aware of the order-sensitivity of the combiner rules and ensure that the code is structured appropriately to trigger the desired transformations. This can add complexity to the development and maintenance of compiler optimization passes.

So, what are the potential solutions to this issue? Several approaches can be taken to address the non-commutative behavior of Tablegen combines:

  1. Explicitly Define Commutative Rules: The most straightforward solution is to define separate combiner rules for each possible operand order. For example, in addition to the APlusBMinusAplusC rule, we could define a APlusBMinusAplusC_Commuted rule that handles the case where the operands in the final addition are swapped. This approach ensures that the optimization is applied regardless of the operand order. However, it can lead to code duplication and make the Tablegen definitions more verbose.

  2. Use Generic Matching Techniques: Another approach is to use more generic pattern matching techniques that are less sensitive to operand order. For example, we could use wildcards or other matching mechanisms to ignore the order of operands in certain instructions. This approach can reduce code duplication but might also make the pattern matching less precise, potentially leading to unintended transformations.

  3. Introduce Canonicalization Passes: A third approach is to introduce canonicalization passes that transform the code into a standard form before applying the combiner rules. For example, a canonicalization pass could reorder the operands in commutative operations to a consistent order, ensuring that the combiner rules always see the expected pattern. This approach can improve the predictability of the optimization process but might also add overhead to the compilation process.

  4. Enhance Tablegen with Commutativity Awareness: A more advanced solution would be to enhance Tablegen itself to be aware of commutative operations. This could involve adding new features to Tablegen that allow developers to specify that an operation is commutative and automatically generate the necessary matching code for all possible operand orders. This approach would provide the best of both worlds: precise pattern matching and automatic handling of commutative operations. However, it would also require significant changes to Tablegen and the GlobalISel framework.

Conclusion

The non-commutative behavior of Tablegen combines in GlobalISel is a fascinating issue that highlights the challenges of compiler optimization. While it can lead to missed optimization opportunities, it is also a consequence of the design choices made in Tablegen to provide precise and predictable pattern matching. Understanding this behavior is crucial for developers working on LLVM's GlobalISel framework. By carefully designing combiner rules and considering the potential for non-commutativity, we can ensure that the generated code is as efficient as possible.

Guys, this exploration into GlobalISel's Tablegen combines should give you a solid understanding of why operand order matters in seemingly simple arithmetic optimizations. Remember, compiler optimization is a complex dance between mathematical equivalence and the precise patterns that the compiler is programmed to recognize. Keep this in mind as you delve deeper into the world of compilers and code optimization!