
[CodeGen][ExpandReductions] assertion on scalable vector_reduce_add with AArch64 SVE2 udot reduction #188024

@alexreinking

The following bug summary was produced during a bug investigation with OpenAI Codex (GPT 5.4). I (Alex Reinking) have reviewed all of the machine-generated code and text below for accuracy. This issue was discovered in Halide's CI here: halide/Halide#9073

If you are interested in using Halide's LLVM builds, they can be easily installed via uv:

$ mkdir halide-repro-env && cd halide-repro-env
$ uv init
Initialized project `halide-repro-env`
$ uv add halide-llvm --prerelease=allow --index https://pypi.halide-lang.org/simple
Using CPython 3.14.0
Creating virtual environment at: .venv
Resolved 2 packages in 108ms
Installed 1 package in 106ms
 + halide-llvm==23.0.0.dev86417+gf014202d

Note that these builds contain close to the minimum necessary to build Halide.


Summary

We are seeing an LLVM assertion failure when compiling AArch64 SVE2 IR that combines:

llvm.aarch64.sve.udot.nxv2i64
llvm.vector.reduce.add.nxv2i64

The crash occurs in LLVM's ExpandReductions pass, which appears to assume that vector_reduce_add operands are fixed-width vectors.

This was observed while compiling Halide-generated IR, but we reduced it to a standalone LLVM reproducer.

We have now confirmed the standalone reproducer against both:

  • the packaged assertions-on LLVM used by Halide in .venv/.../halide_llvm/data
  • a full local LLVM build at f014202dac32

Regression Window

For us, the regression appeared after updating LLVM 23 from:

  • 69780be1

to:

  • f014202d

The most likely exposing change in that range is:

  • 221d2f5 [AArch64] Add partial reduce patterns for new sve dot variants (#184649)

This is only a suspicion. The actual assertion is in generic LLVM ExpandReductions, not in AArch64-specific code.

Observed Failure

Assertion:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"),
function cast, file Casting.h, line 572.

Standalone reproducer stack:

Running pass 'Function Pass Manager' on module '/tmp/reduce_udot_min.ll'.
Running pass 'Expand reduction intrinsics' on function '@f'
...
(anonymous namespace)::expandReductions(llvm::Function&, llvm::TargetTransformInfo const*)
llvm::FPPassManager::runOnFunction
llvm::FPPassManager::runOnModule
llvm::legacy::PassManagerImpl::run

Why This Looks Wrong

In llvm/lib/CodeGen/ExpandReductions.cpp, vector_reduce_add expansion does:

cast<FixedVectorType>(Vec->getType())

However, in this reproducer, Vec has type:

<vscale x 2 x i64>

That makes the cast invalid for scalable-vector reductions.
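
One plausible reason for the fixed-width assumption, offered as an illustration rather than a description of the pass's actual code: a shuffle-style expansion folds the vector in half log2(N) times, so it needs N as a known constant, which `<vscale x 2 x i64>` cannot supply. The sketch below models that halving on a plain `std::vector`. The name `treeReduceAdd` is hypothetical, and nothing here uses LLVM's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a shuffle-style reduction expansion.
// Returns std::nullopt when the lane count is zero or not a power of two,
// modelling an expansion that only works for a known, fixed lane count.
std::optional<std::int64_t> treeReduceAdd(std::vector<std::int64_t> lanes) {
  const std::size_t n = lanes.size();
  if (n == 0 || (n & (n - 1)) != 0) {
    return std::nullopt;
  }
  // Each step adds the upper half onto the lower half, then halves the width.
  for (std::size_t width = n / 2; width >= 1; width /= 2) {
    for (std::size_t i = 0; i < width; ++i) {
      lanes[i] += lanes[i + width];
    }
  }
  return lanes[0];
}
```

A scalable vector has vscale x 2 lanes with vscale unknown until run time, so there is no fixed n to drive a loop like this at compile time.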

Minimal IR Reproducer

This .ll is sufficient to reproduce the assertion in a standalone LLVM API driver:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-gnueabihf"

declare <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64>, <vscale x 8 x i16>, <vscale x 8 x i16>)
declare i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64>)

define i64 @f(ptr %p) {
entry:
  %a = load <vscale x 8 x i16>, ptr %p, align 16
  %p2 = getelementptr i8, ptr %p, i64 64
  %b = load <vscale x 8 x i16>, ptr %p2, align 16
  %d = call <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
  %r = call i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64> %d)
  ret i64 %r
}

Standalone Reproducer

I do not currently have a pure llc file.ll reproducer.

I do have a standalone reproducer using LLVM's own APIs. It reproduces both with the packaged assertions-on LLVM used by Halide and with a full local build of LLVM at f014202dac32.

Driver source:

#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/CodeGen/CommandFlags.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Pass.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"

#include <memory>
#include <optional>
#include <string>

using namespace llvm;

static std::string getModuleFlagString(Module &M, StringRef Name) {
  if (auto *MD = M.getModuleFlag(Name)) {
    if (auto *MDS = dyn_cast<MDString>(MD)) {
      return MDS->getString().str();
    }
  }
  return "";
}

static std::unique_ptr<TargetMachine> makeTM(Module &M) {
  Triple TT = M.getTargetTriple();
  std::string TripleStr = TT.getTriple();
  std::string Error;
  const Target *T = TargetRegistry::lookupTarget(TT, Error);
  if (!T) {
    errs() << "lookupTarget failed for " << TripleStr << ": " << Error << "\n";
    return nullptr;
  }

  std::string CPU = getModuleFlagString(M, "halide_mcpu_target");
  std::string Features = getModuleFlagString(M, "halide_mattrs");
  TargetOptions Options;
  auto RM = std::optional<Reloc::Model>(Reloc::PIC_);
  return std::unique_ptr<TargetMachine>(
      T->createTargetMachine(TT, CPU, Features, Options, RM));
}

static bool emitOne(StringRef InPath, StringRef OutPath,
                    CodeGenFileType FileType) {
  LLVMContext Ctx;
  SMDiagnostic Err;
  auto M = parseIRFile(InPath, Err, Ctx);
  if (!M) {
    Err.print("llvm_emit_repro", errs());
    return false;
  }

  auto TM = makeTM(*M);
  if (!TM) {
    return false;
  }
  M->setDataLayout(TM->createDataLayout());

  std::error_code EC;
  auto Out = std::make_unique<ToolOutputFile>(OutPath, EC, sys::fs::OF_None);
  if (EC) {
    errs() << "open failed for " << OutPath << ": " << EC.message() << "\n";
    return false;
  }

  legacy::PassManager PM;
  PM.add(new TargetLibraryInfoWrapperPass(Triple(M->getTargetTriple())));
  PM.add(createAlwaysInlinerLegacyPass());
  TM->Options.MCOptions.AsmVerbose = true;
  TM->addPassesToEmitFile(PM, Out->os(), nullptr, FileType);
  PM.run(*M);
  Out->keep();
  return true;
}

int main(int argc, char **argv) {
  InitLLVM X(argc, argv);
  if (argc < 3) {
    errs() << "usage: llvm_emit_repro outdir file1.ll [file2.ll ...]\n";
    return 2;
  }

  LLVMInitializeAArch64TargetInfo();
  LLVMInitializeAArch64Target();
  LLVMInitializeAArch64TargetMC();
  LLVMInitializeAArch64AsmPrinter();
  LLVMInitializeAArch64AsmParser();

  std::string OutDir = argv[1];
  for (int i = 2; i < argc; i++) {
    std::string InPath = argv[i];
    std::string Base = sys::path::filename(InPath).str();
    std::string OutPath = OutDir + "/" + Base + ".s";
    errs() << "emitting " << InPath << " -> " << OutPath << "\n";
    if (!emitOne(InPath, OutPath, CodeGenFileType::AssemblyFile)) {
      return 1;
    }
  }
  return 0;
}

How I Built the Reproducer

I confirmed this with two different LLVM builds:

  • an assertions-on packaged LLVM used by Halide
  • a full local LLVM build at f014202dac32

The same reproducer source and the same minimal .ll reproduced the assertion with both builds.

For a generic local LLVM build rooted at $LLVM_ROOT with build directory $LLVM_BUILD, the build command is:

cat > /tmp/llvm_emit_repro.cpp <<'EOF'
// paste the reproducer source from this issue here
EOF

cat > /tmp/reduce_udot_min.ll <<'EOF'
; paste the minimal IR from this issue here
EOF

/usr/bin/clang++ \
  -O0 /tmp/llvm_emit_repro.cpp \
  -I"$LLVM_ROOT/llvm/include" \
  -I"$LLVM_BUILD/include" \
  -std=c++17 \
  $(test -d /opt/homebrew/lib && echo -L/opt/homebrew/lib) \
  $("$LLVM_BUILD/bin/llvm-config" --ldflags --system-libs --libs all) \
  -o /tmp/llvm_emit_repro

Run command:

mkdir -p /tmp/llvm_emit_repro_out
/tmp/llvm_emit_repro /tmp/llvm_emit_repro_out /tmp/reduce_udot_min.ll

Observed output:

emitting /tmp/reduce_udot_min.ll -> /tmp/llvm_emit_repro_out/reduce_udot_min.ll.s
Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 572.
...
Running pass 'Expand reduction intrinsics' on function '@f'

For a packaged LLVM installation rooted at $LLVM_PKG, an equivalent build command is:

SDK=$(xcrun --show-sdk-path)
"$LLVM_PKG/bin/clang++" \
  -isysroot "$SDK" \
  -O0 /tmp/llvm_emit_repro.cpp \
  -I"$LLVM_PKG/include" \
  -std=c++17 \
  -Wl,-rpath,"$LLVM_PKG/lib" \
  $("$LLVM_PKG/bin/llvm-config" --ldflags --system-libs --libs all) \
  -o /tmp/llvm_emit_repro_pkg

Run command:

mkdir -p /tmp/llvm_emit_repro_out
/tmp/llvm_emit_repro_pkg /tmp/llvm_emit_repro_out /tmp/reduce_udot_min.ll

Observed output was the same.

llc Status

For completeness: an assertions-enabled local llc build from the same local LLVM build did not reproduce for me on this minimal .ll:

llc -mtriple=aarch64-unknown-linux-gnu -mattr=+sve2 -o /tmp/reduce_udot_min.s /tmp/reduce_udot_min.ll

That succeeded.

So the issue is currently confirmed as a standalone LLVM API reproducer, but not yet as a standalone llc command-line reproducer.
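
One difference between the two setups may be relevant. The llc command passes -mattr=+sve2, while the driver's makeTM reads its CPU and feature strings from the halide_mcpu_target and halide_mattrs module flags. The minimal .ll defines neither flag, so both strings are empty in the driver. Whether this explains the discrepancy is unverified. One way to check would be to let the driver accept an explicit feature override. The helper below is a hypothetical sketch: the name `chooseFeatures` and the override parameter are not part of the original driver.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for the driver: prefer an explicit override (for example
// one taken from a command-line argument), else fall back to the value read
// from the module flag.
std::string chooseFeatures(const std::string &overrideAttrs,
                           const std::string &moduleFlagAttrs) {
  return overrideAttrs.empty() ? moduleFlagAttrs : overrideAttrs;
}
```

In makeTM, the result would take the place of the Features string passed to createTargetMachine. Running the driver once with "+sve2" and once with an empty override would show whether the assertion depends on the feature set.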

Workaround

Disabling ExpandReductions avoids the crash in Halide:

HL_LLVM_ARGS='-disable-expand-reductions'

Expected Behavior

LLVM should not assert here. It should either:

  • support scalable vector_reduce_add in ExpandReductions, or
  • avoid expanding such reductions when only fixed-width handling exists.

Suggested Direction

The immediate problem seems to be that ExpandReductions assumes fixed-width vector operands for vector_reduce_add and the other reduction intrinsics it handles. A guard for scalable vectors before any cast<FixedVectorType> would avoid the assertion and would presumably leave such reductions to target-specific lowering.
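
The shape of such a guard can be sketched with stand-in types. This is a model under stated assumptions, not a patch: `VecTy`, `fixedNumElts`, and `canExpandReduction` are hypothetical names, and the real pass would query LLVM's own type classes instead.

```cpp
#include <cassert>
#include <optional>

// Minimal stand-in for a vector type; this is not LLVM's VectorType.
struct VecTy {
  unsigned minElts;  // known minimum number of elements
  bool scalable;     // true for <vscale x N x T>
};

// Checked query, similar in spirit to a dyn_cast: yields the fixed element
// count, or std::nullopt for a scalable vector instead of asserting.
std::optional<unsigned> fixedNumElts(const VecTy &ty) {
  if (ty.scalable) {
    return std::nullopt;
  }
  return ty.minElts;
}

// Hypothetical guard: only attempt the generic expansion when the operand
// has a fixed element count; scalable operands are skipped.
bool canExpandReduction(const VecTy &ty) {
  return fixedNumElts(ty).has_value();
}
```

With this model, `<4 x i64>` is eligible for expansion, while `<vscale x 2 x i64>` is skipped rather than reaching an unchecked cast.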
