
Conversation

@LU-JOHN (Contributor) commented Nov 13, 2025

Ensure SCC is not live before shrinking s_and*/s_or* instructions to s_bitset*.

Signed-off-by: John Lu <John.Lu@amd.com>
@LU-JOHN (Contributor, Author) commented Nov 13, 2025

This fixes an issue exposed by #164201.

@llvmbot (Member) commented Nov 13, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: None (LU-JOHN)

Changes

Ensure SCC is not live before shrinking s_and*/s_or* instructions to s_bitset*.


Full diff: https://github.com/llvm/llvm-project/pull/167907.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp (+8-2)
  • (modified) llvm/test/CodeGen/AMDGPU/s_cmp_0.ll (+42-5)
diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index 179ecbad5239f..490c8392de8dd 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -535,6 +535,8 @@ void SIShrinkInstructions::shrinkMadFma(MachineInstr &MI) const {
 /// XNOR (as a ^ b == ~(a ^ ~b)).
 /// \returns true if the caller should continue the machine function iterator
 bool SIShrinkInstructions::shrinkScalarLogicOp(MachineInstr &MI) const {
+  MachineBasicBlock &MBB = *MI.getParent();
+  MachineBasicBlock::iterator MBBI = MI.getIterator();
   unsigned Opc = MI.getOpcode();
   const MachineOperand *Dest = &MI.getOperand(0);
   MachineOperand *Src0 = &MI.getOperand(1);
@@ -550,7 +552,9 @@ bool SIShrinkInstructions::shrinkScalarLogicOp(MachineInstr &MI) const {
   uint32_t NewImm = 0;
 
   if (Opc == AMDGPU::S_AND_B32) {
-    if (isPowerOf2_32(~Imm)) {
+    if (isPowerOf2_32(~Imm) &&
+        MBB.computeRegisterLiveness(TRI, AMDGPU::SCC, std::next(MBBI), 2) ==
+            MachineBasicBlock::LQR_Dead) {
       NewImm = llvm::countr_one(Imm);
       Opc = AMDGPU::S_BITSET0_B32;
     } else if (AMDGPU::isInlinableLiteral32(~Imm, ST->hasInv2PiInlineImm())) {
@@ -558,7 +562,9 @@ bool SIShrinkInstructions::shrinkScalarLogicOp(MachineInstr &MI) const {
       Opc = AMDGPU::S_ANDN2_B32;
     }
   } else if (Opc == AMDGPU::S_OR_B32) {
-    if (isPowerOf2_32(Imm)) {
+    if (isPowerOf2_32(Imm) &&
+        MBB.computeRegisterLiveness(TRI, AMDGPU::SCC, std::next(MBBI), 2) ==
+            MachineBasicBlock::LQR_Dead) {
       NewImm = llvm::countr_zero(Imm);
       Opc = AMDGPU::S_BITSET1_B32;
     } else if (AMDGPU::isInlinableLiteral32(~Imm, ST->hasInv2PiInlineImm())) {
diff --git a/llvm/test/CodeGen/AMDGPU/s_cmp_0.ll b/llvm/test/CodeGen/AMDGPU/s_cmp_0.ll
index 0166d7ac7ddc2..b5228e3054f0a 100644
--- a/llvm/test/CodeGen/AMDGPU/s_cmp_0.ll
+++ b/llvm/test/CodeGen/AMDGPU/s_cmp_0.ll
@@ -153,6 +153,43 @@ define amdgpu_ps i32 @and64(i64 inreg %val0, i64 inreg %val1) {
   ret i32 %zext
 }
 
+define amdgpu_ps i32 @and32_clear_one_bit(i32 inreg %val0) {
+; CHECK-LABEL: and32_clear_one_bit:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_and_b32 s0, s0, 0x7fffffff
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use s0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_cselect_b64 s[0:1], -1, 0
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[0:1]
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:    ; return to shader part epilog
+  %result = and i32 %val0, 2147483647
+  call void asm "; use $0", "s"(i32 %result)
+  %cmp = icmp ne i32 %result, 0
+  %zext = zext i1 %cmp to i32
+  ret i32 %zext
+}
+
+define amdgpu_ps i32 @and64_clear_one_bit(i64 inreg %val0) {
+; CHECK-LABEL: and64_clear_one_bit:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_bitset0_b32 s0, 31
+; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use s[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_cselect_b64 s[0:1], -1, 0
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[0:1]
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:    ; return to shader part epilog
+  %result = and i64 %val0, -2147483649
+  call void asm "; use $0", "s"(i64 %result)
+  %cmp = icmp ne i64 %result, 0
+  %zext = zext i1 %cmp to i32
+  ret i32 %zext
+}
+
 define amdgpu_ps i32 @or32(i32 inreg %val0, i32 inreg %val1) {
 ; CHECK-LABEL: or32:
 ; CHECK:       ; %bb.0:
@@ -623,14 +660,14 @@ define amdgpu_ps i32 @si_pc_add_rel_offset_must_not_optimize() {
 ; CHECK-NEXT:    s_add_u32 s0, s0, __unnamed_1@rel32@lo+4
 ; CHECK-NEXT:    s_addc_u32 s1, s1, __unnamed_1@rel32@hi+12
 ; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB36_2
+; CHECK-NEXT:    s_cbranch_scc0 .LBB38_2
 ; CHECK-NEXT:  ; %bb.1: ; %endif
 ; CHECK-NEXT:    s_mov_b32 s0, 1
-; CHECK-NEXT:    s_branch .LBB36_3
-; CHECK-NEXT:  .LBB36_2: ; %if
+; CHECK-NEXT:    s_branch .LBB38_3
+; CHECK-NEXT:  .LBB38_2: ; %if
 ; CHECK-NEXT:    s_mov_b32 s0, 0
-; CHECK-NEXT:    s_branch .LBB36_3
-; CHECK-NEXT:  .LBB36_3:
+; CHECK-NEXT:    s_branch .LBB38_3
+; CHECK-NEXT:  .LBB38_3:
   %cmp = icmp ne ptr addrspace(4) @1, null
   br i1 %cmp, label %endif, label %if
 

@dstutt (Collaborator) left a comment
This looks reasonable.
I haven't verified it against the original regressions we saw yet, but I can do that if you need it before this is merged. (Probably not required?)

@LU-JOHN (Contributor, Author) commented Nov 13, 2025

> This looks reasonable. I haven't verified it against the original regressions we saw yet, but I can do that if you need it before this is merged. (Probably not required?)

Please verify whether it fixes your regressions. That will give us more confidence that it addresses the underlying issue rather than only some of the regressions.

@LU-JOHN requested a review from jayfoad, November 14, 2025 00:05
  if (Opc == AMDGPU::S_AND_B32) {
-    if (isPowerOf2_32(~Imm)) {
+    if (isPowerOf2_32(~Imm) &&
+        MBB.computeRegisterLiveness(TRI, AMDGPU::SCC, std::next(MBBI), 2) ==
Contributor:
I think it would be better to just check the dead flag on the SCC def.
A neighborhood search of 2 seems too small; the dead flag is absolute.

Contributor Author:
Use isDead.

Signed-off-by: John Lu <John.Lu@amd.com>
@github-actions bot commented Nov 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: John Lu <John.Lu@amd.com>
-    if (isPowerOf2_32(~Imm)) {
+    MachineOperand *SccDef =
+        MI.findRegisterDefOperand(AMDGPU::SCC, /*TRI=*/nullptr);
+    if (isPowerOf2_32(~Imm) && SccDef->isDead()) {
Contributor:
Suggested change:
-    if (isPowerOf2_32(~Imm) && SccDef->isDead()) {
+    if (SccDef->isDead() && isPowerOf2_32(~Imm)) {

-    if (isPowerOf2_32(Imm)) {
+    MachineOperand *SccDef =
+        MI.findRegisterDefOperand(AMDGPU::SCC, /*TRI=*/nullptr);
+    if (isPowerOf2_32(Imm) && SccDef->isDead()) {
Contributor:
Suggested change:
-    if (isPowerOf2_32(Imm) && SccDef->isDead()) {
+    if (SccDef->isDead() && isPowerOf2_32(Imm)) {

Contributor:
Alternatively, short-circuit the call to findRegisterDefOperand in the pretty common case that Imm is not a power of two?

Suggested change:
-    if (isPowerOf2_32(Imm) && SccDef->isDead()) {
+    if (isPowerOf2_32(Imm) && MI.findRegisterDefOperand(AMDGPU::SCC, /*TRI=*/nullptr)->isDead()) {

@dstutt (Collaborator) left a comment
Confirmed, this fixes the regressions we observed. Thanks.

6 participants