Skip to content

Fix yyjson parity with JSONSerialization#328

Merged
mattt merged 3 commits intohuggingface:mainfrom
ZachNagengast:reduce-error-scope-for-yyjson
Mar 9, 2026
Merged

Fix yyjson parity with JSONSerialization#328
mattt merged 3 commits intohuggingface:mainfrom
ZachNagengast:reduce-error-scope-for-yyjson

Conversation

@ZachNagengast
Copy link
Collaborator

@ZachNagengast ZachNagengast commented Mar 6, 2026

We noticed a breaking change from #304 with one of our tokenizers that had trailing commas in it.

yyjson read failed (code 7) at position 118: trailing comma is not allowed.

Luckily yyjson has a flag for this (expanding on @ronaldmannak's work in #324) so it can be brought back into parity with the previous JSONSerialization, but it requires a new release because protocolization was decided against.

This PR adds flags to reduce the validation error checks down to a level that JSONSerialization was allowing before.

Along with this, added some comprehensive tests to make sure we stay in parity with JSONSerialization in the future.
Through writing these tests, I also found another breaking change regarding NUL characters where String(cString: "before\u0000after") would stop parsing after the NUL part, returning before only instead of beforeafter, but this was on the swift side not related to yyjson.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the yyjson-backed YYJSONParser to better handle real-world tokenizer JSON files that previously loaded via Foundation-based parsing, and adds tests to lock in behavior (including edge cases like trailing commas and NULs in strings).

Changes:

  • Relax yyjson read options to allow trailing commas, BOM prefixes, and comments.
  • Fix string/key decoding to preserve embedded NUL (\u0000) by decoding using yyjson’s reported byte length.
  • Add expanded unit tests, including a new Foundation-vs-YYJSON comparison suite.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
Sources/Hub/YYJSONParser.swift Enables permissive yyjson read flags and fixes string/key decoding to preserve embedded NUL bytes.
Tests/TokenizersTests/YYJSONParserTests.swift Adds coverage for trailing commas/BOM/comments and introduces a Foundation comparison test suite.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

YYJSON_READ_ALLOW_INF_AND_NAN
| YYJSON_READ_ALLOW_TRAILING_COMMAS
| YYJSON_READ_ALLOW_BOM
| YYJSON_READ_ALLOW_COMMENTS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any examples in the wild of config files with comments? Maybe it's overly defensive, but I'd prefer to keep this narrowly tailored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this and the related tests with 63e0d32. Happy to revert/reapply that in a follow-up as needed.

Copy link
Collaborator

@mattt mattt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZachNagengast Thank you for this!

LGTM after removing allowance for comments (which aren't supported by JSONSerialization).

@mattt mattt merged commit 5926983 into huggingface:main Mar 9, 2026
3 checks passed
@ZachNagengast
Copy link
Collaborator Author

Good catch, appreciated the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants