perf: optimize FASTQ reader with memchr, skipLine, and readExact #671

Open
KimBioInfoStudio wants to merge 1 commit into OpenGene:master from KimBioInfoStudio:perf/fastq-reader-optimizations

@KimBioInfoStudio
Member

Summary

  • Replace byte-by-byte \r/\n scanning in getLine() with memchr (SIMD-accelerated in libc)
  • Add skipLine() to skip the + strand line without string allocation
  • Add readExact() to read quality line by known length (== sequence length), no newline scan needed
  • Fix minor off-by-one in \r\n skip check at buffer boundary
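
The three reader changes above can be sketched as follows. This is a minimal illustration and not the code in this PR: `BufReader`, `FastqRecord`, and `readRecord` are hypothetical names, the whole input is assumed to sit in one in-memory buffer (no refill logic), and lines are assumed to end in `\n` or `\r\n`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory reader; names are illustrative, not fastp's classes.
struct BufReader {
    const char* buf;
    size_t len;
    size_t pos = 0;

    BufReader(const char* data, size_t n) : buf(data), len(n) {}

    // Index one past the last content byte of the current line.
    // memchr finds '\n'; a preceding '\r' is excluded from the content.
    // Callers must ensure pos < len.
    size_t lineEnd() const {
        const void* hit = std::memchr(buf + pos, '\n', len - pos);
        size_t end = hit ? static_cast<size_t>(static_cast<const char*>(hit) - buf)
                         : len;
        if (end > pos && buf[end - 1] == '\r') --end;
        return end;
    }

    // Step over "\n" or "\r\n", checking bounds before each byte is read.
    void skipTerminator() {
        if (pos < len && buf[pos] == '\r') ++pos;
        if (pos < len && buf[pos] == '\n') ++pos;
    }

    // Copy the current line (without terminator) into out.
    bool getLine(std::string& out) {
        if (pos >= len) return false;
        size_t end = lineEnd();
        out.assign(buf + pos, end - pos);
        pos = end;
        skipTerminator();
        return true;
    }

    // Advance past the current line without building a string.
    bool skipLine() {
        if (pos >= len) return false;
        pos = lineEnd();
        skipTerminator();
        return true;
    }

    // Copy exactly n bytes, then step over the terminator; no newline search.
    bool readExact(size_t n, std::string& out) {
        if (len - pos < n) return false;
        out.assign(buf + pos, n);
        pos += n;
        skipTerminator();
        return true;
    }
};

struct FastqRecord {
    std::string name, seq, qual;
};

// Read one 4-line FASTQ record: name, sequence, '+' line, quality.
bool readRecord(BufReader& r, FastqRecord& rec) {
    return r.getLine(rec.name)
        && r.getLine(rec.seq)
        && r.skipLine()                          // '+' line is never stored
        && r.readExact(rec.seq.size(), rec.qual); // quality length == sequence length
}
```

In this sketch, `readExact` trusts that the quality line is as long as the sequence line, so a malformed record would go undetected unless the caller adds a check.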

Benchmark (2M PE reads, Apple M4 Pro, -w 10)

| Mode | Master (s) | Opt (s) | Speedup |
|---|---|---|---|
| fq→fq | 1.07 | 0.98 | 1.08x |
| gz→fq | 2.42 | 2.23 | 1.08x |
| stdin→stdout | 0.66 | 0.46 | 1.44x |

Modes that write gz output show no change (the bottleneck there is compression, not parsing).

Verification

  • MD5 output matches master across all tested modes
  • All existing test data processes correctly

Test plan

  • Local benchmark (2M reads, fq-fq, gz-fq, stdin-stdout)
  • MD5 verification against master output
  • Remote server benchmark (ARM64 DGX)

🤖 Generated with Claude Code

Replace byte-by-byte newline scanning in getLine() with memchr for
SIMD-accelerated delimiter search. Add skipLine() to skip the '+' strand
line without string allocation, and readExact() to read quality by known
length (== sequence length) without scanning for newlines.

Benchmark (2M PE reads, M4 Pro, -w 10):
  fq→fq: 1.08x speedup
  gz→fq: 1.08x speedup
  stdin→stdout: 1.44x speedup (memchr alone)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>