Everything below this line was written by Claude, opened on request in #64515.
Summary
SIMD intrinsics have now come up in multiple PRs (#64515 and @Alvaro-Kothe's work). Before merging any of these, we should align on an approach. This issue collects the tradeoffs discussed so far.
Current state
- #64515 (PERF: Use SIMD for read_csv C tokenizer) uses hand-written SSE2 (x86-64) and NEON (arm64) intrinsics in the C tokenizer to scan for special characters 16 bytes at a time
- These are baseline instruction sets: `__SSE2__` and `__ARM_NEON` are predefined by the compiler on their respective architectures, so no special flags are needed
- CI exercises both paths: x86-64 (Linux, macOS Intel, Windows) and arm64 (Linux, macOS Apple Silicon, Windows ARM64)
- `cpp_std=c++17` is now in the build (from fast_float), so C++ is available
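To make the "no special flags" point concrete, here is a minimal sketch of the compile-time selection those predefined macros allow. `simd_path` is a hypothetical helper that only reports which branch the preprocessor keeps. The extra `_M_X64` / `_M_ARM64` checks are an assumption added to cover MSVC, in case it does not define the GCC-style macros.

```c
/* Hypothetical helper: reports which code path the preprocessor selects.
 * No -m flags are needed: SSE2 is part of the x86-64 baseline and NEON is
 * part of the arm64 baseline. _M_X64 / _M_ARM64 are checked as well in case
 * MSVC does not predefine __SSE2__ / __ARM_NEON (an assumption). */
static const char *simd_path(void) {
#if defined(__SSE2__) || defined(_M_X64)
    return "sse2";
#elif defined(__ARM_NEON) || defined(_M_ARM64)
    return "neon";
#else
    return "scalar"; /* any other target falls back to plain C */
#endif
}
```

Because the choice is made entirely at compile time, each CI architecture builds and tests exactly one SIMD branch plus the scalar fallback.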
Options
1. Hand-written intrinsics (status quo in #64515)
- Pros: no new dependencies, minimal code (~100 lines for 4 functions), compile-time selection via `#ifdef`
- Cons: must handle compiler portability ourselves (e.g. `__builtin_ctz` vs MSVC `_BitScanForward`), duplicated logic per architecture
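A minimal sketch of what option 1 looks like, assuming the scan looks for the delimiter, the quote character, `\n` and `\r` (the actual special-character set in #64515 may differ). `find_special` and `ctz64` are hypothetical names. It shows both baseline paths behind `#ifdef`, the scalar tail, and the kind of `__builtin_ctz*` / `_BitScanForward*` shim the Cons bullet refers to. NEON has no `movemask`, so the sketch uses a narrowing shift that packs the comparison result into 64 bits, four bits per byte lane.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define USE_SSE2 1
#elif defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>
#define USE_NEON 1
#endif

/* Count trailing zeros; the caller guarantees x != 0 in both versions. */
#if defined(_MSC_VER)
#include <intrin.h>
static unsigned ctz64(uint64_t x) {
    unsigned long idx;
    _BitScanForward64(&idx, x); /* MSVC has no __builtin_ctzll */
    return (unsigned)idx;
}
#else
static unsigned ctz64(uint64_t x) { return (unsigned)__builtin_ctzll(x); }
#endif

/* Hypothetical kernel: index of the first byte equal to delim, quote,
 * '\n' or '\r', or len if there is none. */
static size_t find_special(const char *buf, size_t len, char delim, char quote) {
    size_t i = 0;
#if defined(USE_SSE2)
    const __m128i vd = _mm_set1_epi8(delim), vq = _mm_set1_epi8(quote);
    const __m128i vn = _mm_set1_epi8('\n'), vr = _mm_set1_epi8('\r');
    for (; i + 16 <= len; i += 16) {
        __m128i c = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i hit = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(c, vd), _mm_cmpeq_epi8(c, vq)),
            _mm_or_si128(_mm_cmpeq_epi8(c, vn), _mm_cmpeq_epi8(c, vr)));
        /* One bit per byte lane: bit k set means byte k matched. */
        unsigned mask = (unsigned)_mm_movemask_epi8(hit);
        if (mask != 0) return i + ctz64(mask);
    }
#elif defined(USE_NEON)
    const uint8x16_t vd = vdupq_n_u8((uint8_t)delim), vq = vdupq_n_u8((uint8_t)quote);
    const uint8x16_t vn = vdupq_n_u8('\n'), vr = vdupq_n_u8('\r');
    for (; i + 16 <= len; i += 16) {
        uint8x16_t c = vld1q_u8((const uint8_t *)(buf + i));
        uint8x16_t hit = vorrq_u8(vorrq_u8(vceqq_u8(c, vd), vceqq_u8(c, vq)),
                                  vorrq_u8(vceqq_u8(c, vn), vceqq_u8(c, vr)));
        /* Shift-right-narrow keeps 4 bits per byte lane (little-endian target). */
        uint8x8_t packed = vshrn_n_u16(vreinterpretq_u16_u8(hit), 4);
        uint64_t mask = vget_lane_u64(vreinterpret_u64_u8(packed), 0);
        if (mask != 0) return i + ctz64(mask) / 4;
    }
#endif
    /* Scalar tail, and the whole input on targets without SSE2/NEON. */
    for (; i < len; i++) {
        char ch = buf[i];
        if (ch == delim || ch == quote || ch == '\n' || ch == '\r') return i;
    }
    return len;
}
```

For example, `find_special("a,b", 3, ',', '"')` returns 1, found by the scalar tail because the input is shorter than one 16-byte chunk.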
2. xsimd (used by Arrow C++)
- Header-only C++14 library (~4.7 MB); would need vendoring or a build dependency
- Provides a unified API across SSE2/AVX2/AVX-512/NEON/SVE/etc.
- Would require extracting SIMD code into `.cpp` files with `extern "C"` linkage (since `tokenizer.c` is C)
- Tested across many architectures by the xsimd project itself
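For option 2, the C side of the boundary could look like the sketch below: one header that both `tokenizer.c` and the C++ translation unit include, so the function has C linkage either way. The header name and `tokenizer_find_special` are hypothetical, and the scalar body is only a stand-in for an xsimd implementation that would live in the `.cpp` file.

```c
/* ---- tokenizer_simd.h (hypothetical) ---- */
#include <stddef.h>

#ifdef __cplusplus
extern "C" { /* seen only by the C++ compiler: no name mangling */
#endif

/* Defined in a .cpp file (e.g. using xsimd), called from tokenizer.c. */
size_t tokenizer_find_special(const char *buf, size_t len, char delim,
                              char quote);

#ifdef __cplusplus
}
#endif

/* ---- scalar stand-in for the C++ definition, so this sketch links ---- */
size_t tokenizer_find_special(const char *buf, size_t len, char delim,
                              char quote) {
    for (size_t i = 0; i < len; i++) {
        char ch = buf[i];
        if (ch == delim || ch == quote || ch == '\n' || ch == '\r') return i;
    }
    return len;
}
```

In the real layout, the `.cpp` file includes this header before defining the function, so the definition inherits C linkage and `tokenizer.c` links against it unchanged.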
3. Google Highway (used by NumPy)
- C++17 library, not header-only (needs ~10 compiled source files, ~31 MB repo)
- NumPy vendors it as a git submodule
- Designed for runtime dispatch across many ISA levels, which is more machinery than we currently need
- Heavier integration cost
4. Compiler vector extensions / autovectorization
- GCC/Clang support `__attribute__((vector_size(N)))` portable vector types
- Compiler does the architecture mapping, no library needed
- Less control over generated code; may not handle the "find first matching byte" pattern well
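A sketch of option 4 using GCC/Clang vector extensions, with the same assumed special-character set (delimiter, quote, `\n`, `\r`) and a hypothetical `find_special_vec` name. The lane-wise `==` and `|` are lowered by the compiler to whatever 16-byte SIMD the target has, with no intrinsics in the source. There is no portable equivalent of `movemask`, though, so once a chunk contains a match the sketch falls back to a lane-by-lane search; that is the "find first matching byte" weakness noted above. MSVC does not support `vector_size`, so there it gets the scalar loop only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) || defined(__clang__)
/* 16 lanes of signed char (GCC/Clang extension; comparisons yield -1 or 0). */
typedef signed char v16i8 __attribute__((vector_size(16)));

static v16i8 splat(char c) {
    v16i8 v;
    for (int k = 0; k < 16; k++) v[k] = (signed char)c; /* broadcast c */
    return v;
}
#endif

static size_t find_special_vec(const char *buf, size_t len, char delim, char quote) {
    size_t i = 0;
#if defined(__GNUC__) || defined(__clang__)
    const v16i8 vd = splat(delim), vq = splat(quote);
    const v16i8 vn = splat('\n'), vr = splat('\r');
    for (; i + 16 <= len; i += 16) {
        v16i8 c;
        memcpy(&c, buf + i, 16); /* unaligned load */
        v16i8 hit = (c == vd) | (c == vq) | (c == vn) | (c == vr);
        uint64_t lo, hi;
        memcpy(&lo, &hit, 8);
        memcpy(&hi, (const char *)&hit + 8, 8);
        if ((lo | hi) != 0) {
            /* No portable movemask: locate the first set lane one by one. */
            for (size_t k = 0; k < 16; k++)
                if (hit[k]) return i + k;
        }
    }
#endif
    for (; i < len; i++) { /* scalar tail */
        char ch = buf[i];
        if (ch == delim || ch == quote || ch == '\n' || ch == '\r') return i;
    }
    return len;
}
```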
5. Meson SIMD module
- Designed for compiling separate source files with non-baseline flags (e.g. `-mavx2`) and runtime dispatch
- Not applicable to the current use case (SSE2/NEON are baseline, so no runtime dispatch is needed)
- Could become relevant if we wanted optional AVX2/AVX-512 paths in the future
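If optional AVX2/AVX-512 paths are ever added, the runtime side usually comes down to choosing a function pointer once. A minimal sketch, assuming GCC/Clang on x86 for CPU detection via `__builtin_cpu_supports`; all function names are hypothetical. The `avx2` variant is a placeholder that forwards to the scalar loop, since its real body would sit in a separate file compiled with `-mavx2`, which is the part a Meson SIMD setup would handle.

```c
#include <stddef.h>

/* Hypothetical kernel signature shared by every ISA variant. */
typedef size_t (*find_special_fn)(const char *, size_t, char, char);

static size_t find_special_scalar(const char *buf, size_t len, char delim, char quote) {
    for (size_t i = 0; i < len; i++) {
        char ch = buf[i];
        if (ch == delim || ch == quote || ch == '\n' || ch == '\r') return i;
    }
    return len;
}

/* Placeholder: a real variant would be compiled separately with -mavx2. */
static size_t find_special_avx2(const char *buf, size_t len, char delim, char quote) {
    return find_special_scalar(buf, len, delim, quote);
}

static find_special_fn resolve_find_special(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return find_special_avx2;
#endif
    return find_special_scalar; /* baseline path on every other target */
}

/* Resolve once and cache. Not thread-safe as written; resolving at module
 * import time would avoid the race. */
size_t find_special_dispatch(const char *buf, size_t len, char delim, char quote) {
    static find_special_fn impl = NULL;
    if (impl == NULL) impl = resolve_find_special();
    return impl(buf, len, delim, quote);
}
```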
Questions to resolve
- Is the scope of SIMD usage in pandas likely to grow beyond the tokenizer, or is this a one-off?
- If one-off, do hand-written intrinsics suffice? The maintenance burden so far has been one portability fix (`__builtin_ctz` on MSVC).
- If we expect growth, is xsimd the right choice given that C++17 is already in the build?
- Should we block #64515 on this decision, or merge the hand-written version and migrate later if needed?