
Use yyjson for significantly faster JSON parsing#304

Merged
pcuenca merged 12 commits into huggingface:main from DePasqualeOrg:json-optimization
Feb 10, 2026

Conversation

@DePasqualeOrg (Contributor) commented Dec 27, 2025

JSON parsing is one of the biggest performance bottlenecks for tokenizer loading, and yyjson, a high-performance C library, offers significant speed gains for large tokenizer files: it's 3.4x faster for raw JSON parsing and 2.1x faster for building the Config, saving around 600 ms in a typical tokenizer load.

Changes

  • Add yyjson 0.12.0 as a dependency
  • Add YYJSONParser with direct yyjson → Config conversion (no intermediate Foundation objects)
  • Update HubApi.configuration(fileURL:) to use yyjson
  • Remove JSONSerialization+BOM.swift (yyjson handles BOM correctly)
  • Add Benchmarks test target (run with RUN_BENCHMARKS=1 swift test --filter Benchmarks)

Performance

Tested with the 11.4 MB tokenizer.json from mlx-community/Qwen3-0.6B-Base-DQ5:

Benchmark           yyjson    JSONSerialization    Improvement
Raw JSON parsing    19 ms     66 ms                3.4x (47 ms saved)
JSON → Config       540 ms    1,160 ms             2.1x (620 ms saved)

This saves ~600 ms per tokenizer load on an M3 MacBook Pro.

All existing tests pass.

@DePasqualeOrg (Contributor, Author)

@mattt, @pcuenca, I think this PR would be a good one to start with whenever you're ready, since #303 is based on it. For that reason, #303 looks bigger than it actually is. I added some refinements to all three of my PRs in this repo today, and I think they're now all ready for review.

Comment on lines -328 to 337:

-        guard let parsed = try? JSONSerialization.bomPreservingJsonObject(with: data) else {
-            throw Hub.HubClientError.jsonSerialization(fileURL: fileURL, message: "JSON Serialization failed for \(fileURL). Please verify that you have set the HF_TOKEN environment variable.")
-        }
-        guard let dictionary = parsed as? [NSString: Any] else { throw Hub.HubClientError.parse }
-        return Config(dictionary)
+        do {
+            return try YYJSONParser.parseToConfig(data)
+        } catch {
+            throw Hub.HubClientError.jsonSerialization(
+                fileURL: fileURL,
+                message: "JSON parsing failed for \(fileURL): \(error.localizedDescription). If this is a private model, verify that HF_TOKEN is set."
+            )
+        }
     }
 }
Collaborator

My 2c on this:

I think there's an opportunity to protocolize JSON parsing, which would reduce the dependency footprint for this specific project while still enabling yyjson usage outside of it.

protocol JSONParser {
    func parseToConfig(_ data: Data) throws -> Config
}

Then

func configuration(fileURL: URL, parser: JSONParser = DefaultJSONParser()) throws -> Config {
    let data = try Data(contentsOf: fileURL)
    do {
        return try parser.parseToConfig(data)
    } catch {
        throw Hub.HubClientError.jsonSerialization(
            fileURL: fileURL,
            message: "JSON parsing failed for \(fileURL): \(error.localizedDescription). If this is a private model, verify that HF_TOKEN is set."
        )
    }
}

Then JSONParser could be passed to the HubApi init, or to an object that is passed into the configuration call.

let customParser = YYJSONParser()
let config = try hubApi.configuration(fileURL: someURL, parser: customParser)

Ideally this project would remain pure Swift with Swift dependencies but still allow fast implementations via protocols.
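As an illustration of the protocol idea, the two conformances might look like this (a sketch only; JSONParser is the protocol proposed above, YYJSONBackedParser is a hypothetical name, and the Foundation fallback mirrors the code this PR replaces):

```swift
import Foundation

// Sketch: assumes the JSONParser protocol proposed above.
struct DefaultJSONParser: JSONParser {
    // Foundation-only fallback, no extra dependencies.
    func parseToConfig(_ data: Data) throws -> Config {
        let parsed = try JSONSerialization.jsonObject(with: data)
        guard let dictionary = parsed as? [NSString: Any] else {
            throw Hub.HubClientError.parse
        }
        return Config(dictionary)
    }
}

// Sketch: wraps the yyjson-backed static parser added in this PR.
struct YYJSONBackedParser: JSONParser {
    func parseToConfig(_ data: Data) throws -> Config {
        try YYJSONParser.parseToConfig(data)
    }
}
```

With default arguments, callers who don't care get the default parser and pay no API cost, while performance-sensitive callers can inject the yyjson-backed one.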

Contributor Author

That's a nice idea, although the Python transformers library uses the Rust tokenizers library, which uses serde for JSON parsing. I think there is a good argument for just having a fast default like in the Python transformers, especially since what's available in Swift is so slow. People running MLX models in Swift are already using C++ libraries through C bridging. yyjson is in C, so Swift can call it directly with minimal overhead.

Collaborator

@DePasqualeOrg Amazing work! I just opened a PR demonstrating the effect of in-situ parsing on speed and memory here: DePasqualeOrg#2

@ZachNagengast I'm sympathetic to the idea of dependency injection, but in this case, it's hard to imagine a scenario in which an API consumer wouldn't opt-in to faster JSON parsing. Assuming the performance is consistently better, and barring segfaults or incorrect behavior, then this seems like a slam dunk.

If the additional dependency is a concern, I suppose we could compromise with a trait that's enabled by default and could be disabled on an opt-out basis.

Collaborator

A fast default would be great; on the other hand, Swift apps also have to consider compilation time and distributable binary size. Testing the build on this branch, it appears to add 1.2 MB of C code, which, to be fair, compresses well, to around 113 KB. Do you think this dependency could be transitioned via the protocol to the MLX repo, since that is already compiling C code?

Collaborator

Posted before reading your comment. The extra dependency is a concern, but it could be isolated with traits or simple compiler flags checking for canImport(yyjson), similar to this WIP branch that pulls jinja out of the compilation: main...ZachNagengast:swift-transformers:optional-jinja-import-for-hub-and-tokenizers

Something like this would allow the Transformers library to import the fast solution by default, while more targeted implementations that just want Hub and Tokenizers could keep an optimal dependency footprint.
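For reference, the canImport idea could look roughly like this (a hypothetical sketch; the linked WIP branch may structure it differently):

```swift
import Foundation
#if canImport(yyjson)
import yyjson
#endif

// Hypothetical sketch of an opt-out via conditional compilation:
// the fast path is compiled in only when yyjson is in the dependency graph.
func parseConfig(_ data: Data) throws -> Config {
    #if canImport(yyjson)
    return try YYJSONParser.parseToConfig(data)
    #else
    // Foundation-only fallback, mirroring the code this PR replaces.
    guard let dictionary = try JSONSerialization.jsonObject(with: data) as? [NSString: Any] else {
        throw Hub.HubClientError.parse
    }
    return Config(dictionary)
    #endif
}
```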

@DePasqualeOrg (Contributor, Author)

Thanks for this, @mattt. I dug into it, and it looks like both methods use identical memory (~68 MB) when measured in separate tests. The 0 KB measurement may have been due to memory reuse between sequential tests. Let me know what you think: https://github.com/DePasqualeOrg/swift-transformers/tree/benchmark-memory-use

@mattt (Collaborator) commented Jan 9, 2026

@DePasqualeOrg Running my own benchmarks, I found that YYJSON is actually ~8.7x faster than Foundation for parsing that ~10 MB tokenizer.json file:

Metric         Foundation    YYJSON    Improvement
Time (p50)     57.0 ms       6.5 ms    8.7x faster
Peak Memory    242 MB        52 MB     78% less

And according to Swift Benchmark, in-situ parsing correctly showed 0 allocations.

All the more reason for us to move forward, in my opinion.

@pcuenca Any strong feelings about how to proceed?

@DePasqualeOrg (Contributor, Author)

@mattt, I don't fully understand the implications of in-situ parsing, but I'm not sure there's a benefit. Here's the analysis from Claude Code, for the record:

The "0 allocations" result comes from measuring only the parse step, after the buffer is allocated and before the Config conversion. Since convertToConfig immediately copies all strings via String(cString:), the in-situ benefit is negated.
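Concretely, a conversion step like the following (hypothetical names, based on yyjson's C API as bridged into Swift) copies every string out of the parsed buffer, which is why the in-situ savings don't survive into the Config:

```swift
// Hypothetical sketch of the copy described above.
// yyjson_get_str returns a pointer into the parsed document's memory;
// String(cString:) then copies those bytes into a new Swift String,
// so the zero-allocation benefit of in-situ parsing ends at this point.
func stringValue(_ val: UnsafeMutablePointer<yyjson_val>?) -> String? {
    guard let cStr = yyjson_get_str(val) else { return nil }
    return String(cString: cStr)  // allocates and copies
}
```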

@pcuenca (Member) commented Feb 6, 2026

Sorry for the delay.

Tokenizer loading, unlike Jinja templates, is a core functionality that most clients of this library use. In my opinion it'd be useful to have the fast path enabled by default. Delegating this integration to a particular downstream project would negate the benefits for others.

On the use of a trait to opt out, I think it adds some maintenance burden to this project so I'd personally prefer not to do it. I'd be happy if we can remove the BOM handling workaround and provide a single and clear path for users of this code. When weighing the quantitative differences (slightly longer compilation times, 113KB of additional size) against clarity, I lean towards the latter.

So using the trait boils down, in my opinion, to the qualitative consequences of declaring a dependency from a new origin. If this is intractable for you, @ZachNagengast, then we could consider it. Would a huggingface fork alleviate this issue? (This would also have issues, such as replicating fixes from upstream; I'm just trying to consider what the options are.)

@pcuenca (Member) commented Feb 6, 2026

(I like the elegance of the in-situ approach but I think we can defer that decision to a new PR).

@pcuenca (Member) Feb 6, 2026

Will this run on every test pass, including CI? If so, I'd place it in a separate folder that is not triggered by default.

Contributor Author

The benchmarks only run when explicitly enabled:

run with RUN_BENCHMARKS=1 swift test --filter Benchmarks

Member

Ah, I missed the envvar, yes. Technically, it would also run with RUN_BENCHMARKS=0, but that's fine as long as it's not the default.

Contributor Author

Good point. With the latest commit, they will only run with RUN_BENCHMARKS=1.

@ZachNagengast (Collaborator)

@pcuenca Your points make sense to me. It also looks like @mattt has made some good progress here: https://github.com/mattt/swift-yyjson, which appears to bundle a version of the yyjson C code internally. In terms of supply chain, it would be great to have stronger pinning or isolation of yyjson like this. Here are a few options, roughly in order of effort:

  1. Change the yyjson dependency from "from:" to "exact:" to enforce a strict, validated version and prevent unexpected upstream changes.
  2. Depend on swift-yyjson instead (which bundles/pins yyjson); this may be more flexible than exact pinning if someone wants to import a newer version of yyjson separately.
  3. (Less preferred) Add an explicit opt-out for yyjson as the JSON parsing backend, while keeping it as the fast default.
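Option 1 would be a one-line change in Package.swift; a sketch, assuming the dependency is declared directly against the upstream yyjson repo (the actual manifest may differ):

```swift
// Before: resolves to any version compatible with 0.12.0.
.package(url: "https://github.com/ibireme/yyjson.git", from: "0.12.0"),

// After: pins exactly 0.12.0, so an upstream release can't be pulled in
// without an explicit, reviewed version bump.
.package(url: "https://github.com/ibireme/yyjson.git", exact: "0.12.0"),
```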

The recent Dependabot PR #309 is great to see; adding the swift package-ecosystem would extend that coverage to yyjson and other SPM dependencies automatically, AFAIK.

The goal is simply to limit the (admittedly rare but non-zero) risk of a supply-chain attack, particularly when importing C code. While unlikely, it's a risk that would need addressing for compliance-conscious importers when a new transitive dependency gets added (huggingface-vendored repos excluded, since they'd be covered by existing controls).

That said, yyjson itself looks quite reliable: the maintainer is responsive to CVEs (e.g., GHSA-q4m7-9pcm-fpxh), and it's a small, simple codebase, so it would likely pass an audit even with a single maintainer and limited fuzzing.

@pcuenca (Member) commented Feb 10, 2026

Thanks a lot @ZachNagengast, that's great feedback!

How do you feel about going for option 2, @DePasqualeOrg, @mattt? Otherwise, we can go for option 1.

@DePasqualeOrg (Contributor, Author)

I would lean toward not adding an extra level of dependencies unless there's a strong argument for it.

@pcuenca (Member) commented Feb 10, 2026

Thanks @DePasqualeOrg! 🙌

@pcuenca pcuenca merged commit 7ec2432 into huggingface:main Feb 10, 2026
2 checks passed
