Discussion: SIMD strategy for pandas C/C++ code #64884

Everything below this line was written by Claude; this issue was opened on request in #64515.

## Summary

SIMD intrinsics have come up in multiple PRs now (#64515, and @Alvaro-Kothe's work). Before merging any of these, we should align on an approach. This issue collects the tradeoffs discussed so far.

## Current state

- #64515 uses hand-written SSE2 (x86-64) and NEON (arm64) intrinsics in the C tokenizer to scan for special characters 16 bytes at a time
- These are **baseline** instruction sets on x86-64 and arm64: GCC and Clang predefine `__SSE2__` and `__ARM_NEON` respectively, so no special flags are needed (MSVC does not define `__SSE2__`; it identifies x86-64 via `_M_X64`/`_M_AMD64` instead)
- CI exercises both paths: x86-64 (Linux, macOS Intel, Windows) and arm64 (Linux, macOS Apple Silicon, Windows ARM64)
- `cpp_std=c++17` is now in the build (from fast_float), so C++ is available

## Options

### 1. Hand-written intrinsics (status quo in #64515)

- Pros: no new dependencies, minimal code (~100 lines for 4 functions), compile-time selection via `#ifdef`
- Cons: must handle compiler portability ourselves (e.g. `__builtin_ctz` vs MSVC `_BitScanForward`), duplicated logic per architecture
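
The `__builtin_ctz` / `_BitScanForward` difference is usually wrapped once in a small helper, sketched below with the hypothetical name `first_set_bit`. Neither intrinsic gives a meaningful result for a zero input, so callers must check `mask != 0` first.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

// Hypothetical helper: index of the lowest set bit of a nonzero 32-bit mask.
// Precondition: mask != 0.
inline unsigned first_set_bit(std::uint32_t mask) {
#if defined(_MSC_VER) && !defined(__clang__)
  // MSVC: _BitScanForward writes the bit index through its first argument.
  unsigned long index;
  _BitScanForward(&index, mask);
  return static_cast<unsigned>(index);
#else
  // GCC and Clang (including clang-cl, which also defines _MSC_VER).
  return static_cast<unsigned>(__builtin_ctz(mask));
#endif
}
```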

### 2. xsimd (used by Arrow C++)

- Header-only C++14 library (~4.7 MB), would need vendoring or a build dependency
- Provides a unified API across SSE2/AVX2/AVX-512/NEON/SVE/etc.
- Would require extracting SIMD code into `.cpp` files with `extern "C"` linkage (since `tokenizer.c` is C)
- Tested across many architectures by the xsimd project itself
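
A minimal sketch of the `extern "C"` boundary this would require, assuming a hypothetical header/implementation pair and a hypothetical function `pd_find_byte`. The body is a scalar placeholder rather than xsimd code, so it compiles without the dependency; an xsimd version would sit behind the same C-facing signature.

```cpp
// --- Hypothetical shared header (e.g. "simd_scan.h"), includable from tokenizer.c ---
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

size_t pd_find_byte(const char *buf, size_t len, char target);

#ifdef __cplusplus
}  // extern "C"
#endif

// --- Hypothetical implementation file (e.g. "simd_scan.cpp") ---
#include <algorithm>

// C linkage keeps the symbol name unmangled so the C tokenizer can call it.
// The function must not throw: exceptions cannot propagate into C callers.
// Scalar placeholder; a SIMD implementation would replace only this body.
extern "C" size_t pd_find_byte(const char *buf, size_t len, char target) {
  const char *end = buf + len;
  const char *hit = std::find(buf, end, target);
  return static_cast<size_t>(hit - buf);  // equals len when not found
}
```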

### 3. Google Highway (used by NumPy)

- C++17 library, not header-only (needs ~10 compiled source files, ~31 MB repo)
- NumPy vendors it as a git submodule
- Designed for runtime dispatch across many ISA levels — more machinery than we currently need
- Heavier integration cost

### 4. Compiler vector extensions / autovectorization

- GCC/Clang support `__attribute__((vector_size(N)))` portable vector types
- Compiler does the architecture mapping, no library needed
- Less control over generated code; may not handle the "find first matching byte" pattern well
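
To make the last point concrete, here is a sketch of the same find-first-byte scan using GCC/Clang `vector_size` types (hypothetical function `find_byte_vec`; assumes a little-endian target). The element-wise compare is portable, but there is no portable equivalent of `movemask`, so the 16-byte result is reduced by hand through two 64-bit words, and whether that lowers to an efficient instruction sequence is up to the compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// GCC/Clang generic vector: 16 unsigned bytes, mapped to SSE2/NEON by the compiler.
typedef std::uint8_t u8x16 __attribute__((vector_size(16)));

// Hypothetical helper: index of the first byte equal to target, or len if none.
// Assumes little-endian byte order when interpreting the 64-bit words.
std::size_t find_byte_vec(const char *buf, std::size_t len, char target) {
  u8x16 vt;
  std::memset(&vt, static_cast<unsigned char>(target), sizeof vt);  // broadcast
  std::size_t i = 0;
  for (; i + 16 <= len; i += 16) {
    u8x16 chunk;
    std::memcpy(&chunk, buf + i, sizeof chunk);  // unaligned load
    // Each lane of eq is all-ones on a match and zero otherwise.
    const auto eq = (chunk == vt);
    std::uint64_t halves[2];
    static_assert(sizeof(eq) == sizeof(halves), "unexpected vector size");
    std::memcpy(halves, &eq, sizeof(halves));
    for (std::size_t h = 0; h < 2; ++h) {
      if (halves[h] != 0) {
        // 8 bits per byte lane, so divide the trailing-zero count by 8.
        return i + 8 * h + static_cast<std::size_t>(__builtin_ctzll(halves[h]) / 8);
      }
    }
  }
  for (; i < len; ++i) {  // scalar tail
    if (buf[i] == target) {
      return i;
    }
  }
  return len;
}
```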

### 5. Meson SIMD module

- Designed for compiling separate source files with non-baseline flags (e.g. `-mavx2`) and runtime dispatch
- Not applicable to the current use case (SSE2/NEON are baseline, no runtime dispatch needed)
- Could become relevant if we wanted optional AVX2/AVX-512 paths in the future

## Questions to resolve

1. Is the scope of SIMD usage in pandas likely to grow beyond the tokenizer, or is this a one-off?
2. If one-off, do hand-written intrinsics suffice? The maintenance burden so far has been one portability fix (`__builtin_ctz` on MSVC).
3. If we expect growth, is xsimd the right choice given that C++17 is already in the build?
4. Should we block #64515 on this decision, or merge the hand-written version and migrate later if needed?
