Add integration test for MetaspacePreTokenizer fix #321

Merged
pcuenca merged 1 commit into huggingface:main from
beshkenadze:add-metaspace-integration-test
Feb 26, 2026

Conversation

@beshkenadze
Contributor

Summary

Test plan

  • swift test --filter kredorPunctuateAllTokenizer passes
  • Follows existing test patterns in TokenizerTests.swift

The test downloads the kredor/punctuate-all tokenizer and verifies that
XLM-RoBERTa tokenization is correct when prepend_scheme is set but
addPrefixSpace is not.

Copilot AI left a comment

Pull request overview

This PR adds an integration test to verify the fix for the MetaspacePreTokenizer issue where XLM-RoBERTa tokenization was broken when prepend_scheme was used without addPrefixSpace. The test validates that the fix from PR #319 correctly handles the kredor/punctuate-all tokenizer, which is an XLM-RoBERTa model that relies on prepend_scheme: "always" in its Metaspace configuration.

Changes:

  • Added kredorPunctuateAllTokenizer() integration test that downloads the kredor/punctuate-all model and verifies correct token IDs for the input "okay so lets get started"
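To make the bug concrete: a Metaspace pre-tokenizer replaces spaces with U+2581 ("▁"), and prepend_scheme decides whether a leading "▁" is also added to the input. The Swift sketch below is hypothetical: MetaspaceSketch, PrependScheme, shouldPrepend and preTokenize are illustrative names, not the swift-transformers API. It assumes that prepend_scheme takes precedence when present, that the legacy add_prefix_space flag is only a fallback, and that the prefix is added when neither field is set.

```swift
// Hypothetical sketch of Metaspace pre-tokenization for illustration only;
// MetaspaceSketch and PrependScheme are NOT the swift-transformers API.

/// Values of the `prepend_scheme` field in a Metaspace config.
enum PrependScheme: String {
    case always, first, never
}

struct MetaspaceSketch {
    /// The Metaspace replacement character, U+2581 ("▁").
    let replacement: Character = "\u{2581}"
    /// Legacy `add_prefix_space` flag; nil when absent from tokenizer.json.
    let addPrefixSpace: Bool?
    /// `prepend_scheme`; nil when absent from tokenizer.json.
    let prependScheme: PrependScheme?

    /// Decides whether a leading "▁" is added.
    /// Assumption: prepend_scheme wins when present; the legacy flag is only a fallback.
    func shouldPrepend(isFirstSection: Bool) -> Bool {
        if let scheme = prependScheme {
            switch scheme {
            case .always: return true
            case .first: return isFirstSection
            case .never: return false
            }
        }
        // Assumed default when neither field is present.
        return addPrefixSpace ?? true
    }

    /// Replaces spaces with "▁", optionally prepends "▁", then splits so that
    /// every "▁" starts a new piece.
    func preTokenize(_ text: String, isFirstSection: Bool = true) -> [String] {
        var normalized = String(text.map { $0 == " " ? replacement : $0 })
        if shouldPrepend(isFirstSection: isFirstSection) && normalized.first != replacement {
            normalized.insert(replacement, at: normalized.startIndex)
        }
        var pieces: [String] = []
        var current = ""
        for ch in normalized {
            // Start a new piece at each "▁", keeping the "▁" attached to the next word.
            if ch == replacement && !current.isEmpty {
                pieces.append(current)
                current = ""
            }
            current.append(ch)
        }
        if !current.isEmpty { pieces.append(current) }
        return pieces
    }
}

let sentence = "okay so lets get started"
// prepend_scheme "always" without add_prefix_space: the first word gets "▁".
print(MetaspaceSketch(addPrefixSpace: nil, prependScheme: .always).preTokenize(sentence))
// Legacy flag false and no prepend_scheme: the first word has no "▁".
print(MetaspaceSketch(addPrefixSpace: false, prependScheme: nil).preTokenize(sentence))
```

With prependScheme set to .always, the first piece is "▁okay", which matches the word-initial form that SentencePiece-style vocabularies such as XLM-RoBERTa's store. If the scheme is ignored, the first piece is "okay", and that would typically map to different token IDs. The integration test catches exactly this kind of difference.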

Member

@pcuenca pcuenca left a comment

Thank you!

@pcuenca pcuenca merged commit 67baef8 into huggingface:main Feb 26, 2026
6 of 7 checks passed
DePasqualeOrg pushed a commit to DePasqualeOrg/swift-tokenizers that referenced this pull request Mar 4, 2026
Cherry-picked from huggingface/swift-transformers#321 (67baef8).

Labels

None yet

Projects

None yet

3 participants