[CodeGen][ExpandReductions] assertion on scalable vector_reduce_add with AArch64 SVE2 udot reduction

The following bug summary was produced during a bug investigation with OpenAI Codex (GPT 5.4). I (Alex Reinking) have reviewed all of the machine-generated code and text below for accuracy. This issue was discovered in Halide's CI here: https://github.com/halide/Halide/pull/9073

If you are interested in using Halide's LLVM builds, they can be easily installed via `uv`:

```console
$ mkdir halide-repro-env && cd halide-repro-env
$ uv init
Initialized project `halide-repro-env`
$ uv add halide-llvm --prerelease=allow --index https://pypi.halide-lang.org/simple
Using CPython 3.14.0
Creating virtual environment at: .venv
Resolved 2 packages in 108ms
Installed 1 package in 106ms
 + halide-llvm==23.0.0.dev86417+gf014202d
```

Note that these builds contain close to the minimum necessary to build Halide.

---

**Summary**

We are seeing an LLVM assertion failure when compiling AArch64 SVE2 IR that combines:

```llvm
llvm.aarch64.sve.udot.nxv2i64
llvm.vector.reduce.add.nxv2i64
```

The crash occurs in LLVM's `ExpandReductions` pass, which appears to assume that `vector_reduce_add` operands are fixed-width vectors.

This was observed while compiling Halide-generated IR, but we reduced it to a standalone LLVM reproducer.

We have now confirmed the standalone reproducer against both:

- the packaged assertions-on LLVM used by Halide in `.venv/.../halide_llvm/data`
- a full local LLVM build at `f014202dac32`

**Regression Window**

For us, the regression appeared after updating LLVM 23 from:

- `69780be1`

to:

- `f014202d`

The most likely exposing change in that range is:

- 221d2f57eccd `[AArch64] Add partial reduce patterns for new sve dot variants (#184649)`

This is only a suspicion. The actual assertion is in generic LLVM `ExpandReductions`, not in AArch64-specific code.

**Observed Failure**

Assertion:

```text
Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"),
function cast, file Casting.h, line 572.
```

Standalone reproducer stack:

```text
Running pass 'Function Pass Manager' on module '/tmp/reduce_udot_min.ll'.
Running pass 'Expand reduction intrinsics' on function '@f'
...
(anonymous namespace)::expandReductions(llvm::Function&, llvm::TargetTransformInfo const*)
llvm::FPPassManager::runOnFunction
llvm::FPPassManager::runOnModule
llvm::legacy::PassManagerImpl::run
```

**Why This Looks Wrong**

In [`llvm/lib/CodeGen/ExpandReductions.cpp`](https://github.com/llvm/llvm-project/blob/f014202dac325c576addd857b558f7d9d2b28905/llvm/lib/CodeGen/ExpandReductions.cpp#L127-L132), `vector_reduce_add` expansion does:

```c++
cast<FixedVectorType>(Vec->getType())
```

However, in this reproducer, `Vec` has type:

```llvm
<vscale x 2 x i64>
```

That makes the cast invalid for scalable-vector reductions.
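
To make the failure mode concrete, here is a minimal self-contained sketch of the checked-cast pattern. It is a toy model, not LLVM's implementation: everything in the `toy` namespace (`VectorType`, `FixedVectorType`, `ScalableVectorType`, `isa`, `cast`, `dyn_cast`) is a simplified stand-in assumed for illustration, and only the assertion text is taken from the report above.

```cpp
#include <cassert>

namespace toy {

// Toy stand-in for a vector type with a kind tag (NOT llvm::VectorType).
struct VectorType {
  enum Kind { Fixed, Scalable };
  Kind kind;
  unsigned minElts;  // element count, or minimum element count if scalable
  VectorType(Kind k, unsigned n) : kind(k), minElts(n) {}
};

// Models a fixed-width vector such as <4 x i32>.
struct FixedVectorType : VectorType {
  explicit FixedVectorType(unsigned n) : VectorType(Fixed, n) {}
  static bool classof(const VectorType *t) { return t->kind == Fixed; }
};

// Models a scalable vector such as <vscale x 2 x i64>.
struct ScalableVectorType : VectorType {
  explicit ScalableVectorType(unsigned n) : VectorType(Scalable, n) {}
  static bool classof(const VectorType *t) { return t->kind == Scalable; }
};

// Type query: true if the dynamic kind of t matches To.
template <typename To> bool isa(const VectorType *t) { return To::classof(t); }

// Checked cast: asserts (in assertion-enabled builds) on a kind mismatch.
template <typename To> const To *cast(const VectorType *t) {
  assert(isa<To>(t) && "cast<Ty>() argument of incompatible type!");
  return static_cast<const To *>(t);
}

// Checking cast: returns nullptr on a kind mismatch instead of asserting.
template <typename To> const To *dyn_cast(const VectorType *t) {
  return isa<To>(t) ? static_cast<const To *>(t) : nullptr;
}

}  // namespace toy
```

In this model, `toy::dyn_cast<toy::FixedVectorType>` applied to a `toy::ScalableVectorType` yields `nullptr`, while `toy::cast<toy::FixedVectorType>` on the same pointer trips the assertion. That mirrors what the report describes for `<vscale x 2 x i64>`.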

**Minimal IR Reproducer**

This `.ll` is sufficient to reproduce the assertion in a standalone LLVM API driver:

```llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-gnueabihf"

declare <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64>, <vscale x 8 x i16>, <vscale x 8 x i16>)
declare i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64>)

define i64 @f(ptr %p) {
entry:
  %a = load <vscale x 8 x i16>, ptr %p, align 16
  %p2 = getelementptr i8, ptr %p, i64 64
  %b = load <vscale x 8 x i16>, ptr %p2, align 16
  %d = call <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
  %r = call i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64> %d)
  ret i64 %r
}
```

**Standalone Reproducer**

I do not currently have a pure `llc file.ll` reproducer.

I do have a standalone reproducer using LLVM's own APIs. It reproduces both with the packaged assertions-on LLVM used by Halide and with a full local build of LLVM at `f014202dac32`.

Driver source:

```cpp
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/CodeGen/CommandFlags.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Pass.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"

#include <memory>
#include <optional>
#include <string>

using namespace llvm;

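// Returns the string value of module flag Name, or an empty string if the
// flag is absent or is not an MDString.
static std::string getModuleFlagString(Module &M, StringRef Name) {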
  if (auto *MD = M.getModuleFlag(Name)) {
    if (auto *MDS = dyn_cast<MDString>(MD)) {
      return MDS->getString().str();
    }
  }
  return "";
}

static std::unique_ptr<TargetMachine> makeTM(Module &M) {
  Triple TT = M.getTargetTriple();
  std::string TripleStr = TT.getTriple();
  std::string Error;
  const Target *T = TargetRegistry::lookupTarget(TT, Error);
  if (!T) {
    errs() << "lookupTarget failed for " << TripleStr << ": " << Error << "\n";
    return nullptr;
  }

  // CPU and feature strings come from module flags; both are empty when the
  // flags are absent.
  std::string CPU = getModuleFlagString(M, "halide_mcpu_target");
  std::string Features = getModuleFlagString(M, "halide_mattrs");
  TargetOptions Options;
  auto RM = std::optional<Reloc::Model>(Reloc::PIC_);
  return std::unique_ptr<TargetMachine>(
      T->createTargetMachine(TT, CPU, Features, Options, RM));
}

static bool emitOne(StringRef InPath, StringRef OutPath,
                    CodeGenFileType FileType) {
  LLVMContext Ctx;
  SMDiagnostic Err;
  auto M = parseIRFile(InPath, Err, Ctx);
  if (!M) {
    Err.print("llvm_emit_repro", errs());
    return false;
  }

  auto TM = makeTM(*M);
  if (!TM) {
    return false;
  }
  M->setDataLayout(TM->createDataLayout());

  std::error_code EC;
  auto Out = std::make_unique<ToolOutputFile>(OutPath, EC, sys::fs::OF_None);
  if (EC) {
    errs() << "open failed for " << OutPath << ": " << EC.message() << "\n";
    return false;
  }

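  legacy::PassManager PM;
  PM.add(new TargetLibraryInfoWrapperPass(Triple(M->getTargetTriple())));
  PM.add(createAlwaysInlinerLegacyPass());
  TM->Options.MCOptions.AsmVerbose = true;
  // addPassesToEmitFile returns true if the target cannot emit this file type.
  if (TM->addPassesToEmitFile(PM, Out->os(), nullptr, FileType)) {
    errs() << "target cannot emit the requested file type\n";
    return false;
  }
  // The codegen pipeline, including 'Expand reduction intrinsics', runs here.
  PM.run(*M);
  Out->keep();
  return true;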
}

int main(int argc, char **argv) {
  InitLLVM X(argc, argv);
  if (argc < 3) {
    errs() << "usage: llvm_emit_repro outdir file1.ll [file2.ll ...]\n";
    return 2;
  }

  LLVMInitializeAArch64TargetInfo();
  LLVMInitializeAArch64Target();
  LLVMInitializeAArch64TargetMC();
  LLVMInitializeAArch64AsmPrinter();
  LLVMInitializeAArch64AsmParser();

  std::string OutDir = argv[1];
  for (int i = 2; i < argc; i++) {
    std::string InPath = argv[i];
    std::string Base = sys::path::filename(InPath).str();
    std::string OutPath = OutDir + "/" + Base + ".s";
    errs() << "emitting " << InPath << " -> " << OutPath << "\n";
    if (!emitOne(InPath, OutPath, CodeGenFileType::AssemblyFile)) {
      return 1;
    }
  }
  return 0;
}
```

**How I Built the Reproducer**

I confirmed this with two different LLVM builds:

- an assertions-on packaged LLVM used by Halide
- a full local LLVM build at `f014202dac32`

The same reproducer source and the same minimal `.ll` reproduced the assertion with both.

For a generic local LLVM build rooted at `$LLVM_ROOT` with build directory `$LLVM_BUILD`, the build command is:

```bash
cat > /tmp/llvm_emit_repro.cpp <<'EOF'
// paste the reproducer source from this issue here
EOF

cat > /tmp/reduce_udot_min.ll <<'EOF'
; paste the minimal IR from this issue here
EOF

/usr/bin/clang++ \
  -O0 /tmp/llvm_emit_repro.cpp \
  -I"$LLVM_ROOT/llvm/include" \
  -I"$LLVM_BUILD/include" \
  -std=c++17 \
  $(test -d /opt/homebrew/lib && echo -L/opt/homebrew/lib) \
  $("$LLVM_BUILD/bin/llvm-config" --ldflags --system-libs --libs all) \
  -o /tmp/llvm_emit_repro
```

Run command:

```bash
mkdir -p /tmp/llvm_emit_repro_out
/tmp/llvm_emit_repro /tmp/llvm_emit_repro_out /tmp/reduce_udot_min.ll
```

Observed output:

```text
emitting /tmp/reduce_udot_min.ll -> /tmp/llvm_emit_repro_out/reduce_udot_min.ll.s
Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 572.
...
Running pass 'Expand reduction intrinsics' on function '@f'
```

For a packaged LLVM installation rooted at `$LLVM_PKG`, an equivalent build command is:

```bash
SDK=$(xcrun --show-sdk-path)
"$LLVM_PKG/bin/clang++" \
  -isysroot "$SDK" \
  -O0 /tmp/llvm_emit_repro.cpp \
  -I"$LLVM_PKG/include" \
  -std=c++17 \
  -Wl,-rpath,"$LLVM_PKG/lib" \
  $("$LLVM_PKG/bin/llvm-config" --ldflags --system-libs --libs all) \
  -o /tmp/llvm_emit_repro_pkg
```

Run command:

```bash
mkdir -p /tmp/llvm_emit_repro_out
/tmp/llvm_emit_repro_pkg /tmp/llvm_emit_repro_out /tmp/reduce_udot_min.ll
```

Observed output was the same.

**`llc` Status**

For completeness: `llc` from the same assertions-enabled local LLVM build did **not** reproduce the failure on this minimal `.ll`:

```bash
llc -mtriple=aarch64-unknown-linux-gnu -mattr=+sve2 -o /tmp/reduce_udot_min.s /tmp/reduce_udot_min.ll
```

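That command succeeded.

So the issue is currently confirmed with the standalone LLVM API driver, but not yet with a standalone `llc` command line. One difference between the two setups: the `llc` command passes `-mattr=+sve2`, whereas the driver takes its CPU and feature strings from the `halide_mcpu_target` and `halide_mattrs` module flags, which the minimal `.ll` does not define, so the driver creates its `TargetMachine` with empty CPU and feature strings. Whether that accounts for the divergence is unverified.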

**Workaround**

Disabling `ExpandReductions` avoids the crash in Halide:

```bash
HL_LLVM_ARGS='-disable-expand-reductions'
```

**Expected Behavior**

LLVM should not assert here. It should either:

- support scalable `vector_reduce_add` in `ExpandReductions`, or
- avoid expanding such reductions when only fixed-width handling exists.

**Suggested Direction**

The immediate problem seems to be that `ExpandReductions` assumes fixed-width vector operands for `vector_reduce_add` and the other reduction intrinsics it handles. Checking for a scalable operand type before any `cast<FixedVectorType>` would avoid the assertion and leave such reductions to target-specific lowering, which is presumably the intended path.
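
As a rough illustration of that guard, the sketch below models a pass worklist in plain C++. It is not a patch against `ExpandReductions.cpp`: the `guard_sketch` namespace, the `ReductionCall` struct, and `expandFixedWidthOnly` are hypothetical names invented for this example, and the "expansion" is reduced to setting a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace guard_sketch {

// Toy description of one reduction call's vector operand (NOT an LLVM type).
struct ReductionCall {
  bool operandIsScalable;  // true for operands like <vscale x 2 x i64>
  unsigned minElements;    // element count, or the minimum if scalable
  bool expanded;           // set once the generic expansion has run
};

// Hypothetical pass body: expand fixed-width reductions only and leave
// scalable ones untouched so a later, target-specific stage can lower them.
// Returns the number of reductions that were expanded.
inline std::size_t expandFixedWidthOnly(std::vector<ReductionCall> &worklist) {
  std::size_t numExpanded = 0;
  for (ReductionCall &call : worklist) {
    assert(call.minElements > 0 && "vector operand must have elements");
    if (call.operandIsScalable)
      continue;  // the guard: never enter the fixed-width-only code path
    call.expanded = true;  // stand-in for the real expansion
    ++numExpanded;
  }
  return numExpanded;
}

}  // namespace guard_sketch
```

Given a worklist holding one fixed-width and one scalable reduction, `expandFixedWidthOnly` expands only the first and leaves the second unchanged. In the real pass, the equivalent check would presumably sit before the first `cast<FixedVectorType>`.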

