Use yyjson for significantly faster JSON parsing #304
pcuenca merged 12 commits into huggingface:main
Conversation
```diff
-    guard let parsed = try? JSONSerialization.bomPreservingJsonObject(with: data) else {
-        throw Hub.HubClientError.jsonSerialization(fileURL: fileURL, message: "JSON Serialization failed for \(fileURL). Please verify that you have set the HF_TOKEN environment variable.")
+    do {
+        return try YYJSONParser.parseToConfig(data)
+    } catch {
+        throw Hub.HubClientError.jsonSerialization(
+            fileURL: fileURL,
+            message: "JSON parsing failed for \(fileURL): \(error.localizedDescription). If this is a private model, verify that HF_TOKEN is set."
+        )
+    }
-    guard let dictionary = parsed as? [NSString: Any] else { throw Hub.HubClientError.parse }
-    return Config(dictionary)
     }
 }
```
2c on this:

I think there's an opportunity to protocolize JSON parsing, which would reduce the dependency footprint for this specific project while still enabling yyjson usage outside of it.

```swift
protocol JSONParser {
    func parseToConfig(_ data: Data) throws -> Config
}
```

Then:

```swift
func configuration(fileURL: URL, parser: JSONParser = DefaultJSONParser()) throws -> Config {
    let data = try Data(contentsOf: fileURL)
    do {
        return try parser.parseToConfig(data)
    } catch {
        throw Hub.HubClientError.jsonSerialization(
            fileURL: fileURL,
            message: "JSON parsing failed for \(fileURL): \(error.localizedDescription). If this is a private model, verify that HF_TOKEN is set."
        )
    }
}
```

Then `JSONParser` could be passed to the `HubApi` init, or to an object that is passed into the `configuration` call:

```swift
let customParser = YYJSONParser()
let config = try hubApi.configuration(fileURL: someURL, parser: customParser)
```

Ideally this project would remain pure Swift with Swift dependencies, but still allow fast implementations via protocols.
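To make the injection idea concrete, the Foundation-backed default could be a small conformance along these lines. This is only a sketch: `DefaultJSONParser` is a name assumed by the proposal, and the `Config(dictionary)` initializer and `Hub.HubClientError.parse` are taken from the diff above.

```swift
import Foundation

// Sketch of the Foundation-backed fallback parser proposed above.
// Uses plain JSONSerialization (the BOM-handling workaround is removed
// by this PR, since yyjson handles BOMs itself).
struct DefaultJSONParser: JSONParser {
    func parseToConfig(_ data: Data) throws -> Config {
        let parsed = try JSONSerialization.jsonObject(with: data)
        guard let dictionary = parsed as? [NSString: Any] else {
            throw Hub.HubClientError.parse
        }
        return Config(dictionary)
    }
}
```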
That's a nice idea, although the Python transformers library uses the Rust tokenizers library, which uses serde for JSON parsing. I think there is a good argument for just having a fast default like in the Python transformers, especially since what's available in Swift is so slow. People running MLX models in Swift are already using C++ libraries through C bridging. yyjson is in C, so Swift can call it directly with minimal overhead.
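For illustration, calling the yyjson C API from Swift is fairly direct once the library is exposed as a C module. This is a hedged sketch, not code from the PR: it assumes a SwiftPM C target or module map named `yyjson`, and a top-level string field named `"version"` chosen only for the example.

```swift
import Foundation
import yyjson // assumed C module name

// Minimal sketch: parse a JSON document with the yyjson C API and read
// one top-level string field. Returns nil on any parse/lookup failure.
func readVersionField(from data: Data) -> String? {
    data.withUnsafeBytes { buffer -> String? in
        guard let base = buffer.bindMemory(to: CChar.self).baseAddress,
              let doc = yyjson_read(base, buffer.count, 0)
        else { return nil }
        defer { yyjson_doc_free(doc) } // release the parsed document

        let root = yyjson_doc_get_root(doc)
        guard let val = yyjson_obj_get(root, "version"),
              let cstr = yyjson_get_str(val)
        else { return nil }
        return String(cString: cstr)
    }
}
```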
@DePasqualeOrg Amazing work! I just opened a PR demonstrating the effect of in-situ parsing on speed and memory here: DePasqualeOrg#2
@ZachNagengast I'm sympathetic to the idea of dependency injection, but in this case it's hard to imagine a scenario in which an API consumer wouldn't opt in to faster JSON parsing. Assuming the performance is consistently better, and barring segfaults or incorrect behavior, this seems like a slam dunk.
If the additional dependency is a concern, I suppose we could compromise with a trait that's enabled by default and could be disabled on an opt-out basis.
A fast default would be great; on the other hand, Swift apps also have compilation time and distributable binary size to consider and optimize. Testing the build on this branch, it appears to add 1.2 MB of C code, which, to be fair, compresses well to around 113 KB. Do you think this dependency could be transitioned via the protocol to the MLX repo, since that is already compiling C code?
(Posted before reading your comment.) The extra dependency is a concern, but it could be isolated with traits or simple compiler flags checking for `canImport(yyjson)`, similar to this WIP branch that pulls Jinja out of the compilation: main...ZachNagengast:swift-transformers:optional-jinja-import-for-hub-and-tokenizers
Something like this would allow the Transformers library to import the fast solution by default, while more targeted implementations that just want Hub and Tokenizers could have an optimal dependency footprint.
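As a sketch of what that compiler-flag isolation could look like (illustrative only — `FoundationJSONParser` is a hypothetical fallback name, not something defined in this PR):

```swift
// If the yyjson module is part of the build, prefer the fast parser;
// otherwise fall back to a pure-Foundation implementation.
#if canImport(yyjson)
typealias PreferredJSONParser = YYJSONParser
#else
typealias PreferredJSONParser = FoundationJSONParser // hypothetical fallback
#endif
```

This keeps the fast path as the default for anyone who pulls in the dependency, without forcing it on targets that only build Hub and Tokenizers.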
Thanks for this, @mattt. I dug into it, and it looks like both methods use identical memory (~68 MB) when measured in separate tests. The 0 KB measurement may have been due to memory reuse between sequential tests. Let me know what you think: https://github.com/DePasqualeOrg/swift-transformers/tree/benchmark-memory-use
@DePasqualeOrg Running my own benchmarks, I found that YYJSON is actually ~8.7x faster than Foundation for parsing that ~10 MB file.

And according to Swift Benchmark, in-situ parsing correctly showed 0 allocations. All the more reason for us to move forward, in my opinion. @pcuenca Any strong feelings about how to proceed?
@mattt, I don't fully understand the implications of in-situ parsing, but I'm not sure there's a benefit. Here's the analysis from Claude Code, for the record:
Sorry for the delay. Tokenizer loading, unlike Jinja templates, is core functionality that most clients of this library use. In my opinion it'd be useful to have the fast path enabled by default; delegating this integration to a particular downstream project would negate the benefits for others. On the use of a trait to opt out, I think it adds some maintenance burden to this project, so I'd personally prefer not to do it. I'd be happy if we can remove the BOM handling workaround and provide a single, clear path for users of this code. When weighing the quantitative differences (slightly longer compilation times, 113 KB of additional size) against clarity, I lean towards the latter. So using the trait boils down, in my opinion, to the qualitative consequences of declaring a dependency from a new origin. If this is intractable for you @ZachNagengast, then we could consider it. Would a
(I like the elegance of the in-situ approach, but I think we can defer that decision to a new PR.)
Will this run on every test pass, including CI? If so, I'd place it in a separate folder that is not triggered by default.
The benchmarks only run when explicitly enabled. Run with:

```
RUN_BENCHMARKS=1 swift test --filter Benchmarks
```
Ah, I missed the env var, yes. Technically, it would also run with `RUN_BENCHMARKS=0`, but it's OK as long as it's not the default.
Good point. With the latest commit, they will only run with `RUN_BENCHMARKS=1`.
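A minimal sketch of this kind of gating in XCTest (names are illustrative; this is not the actual benchmark code from the PR): requiring the value to be exactly `"1"` means `RUN_BENCHMARKS=0`, or any other value, skips the test rather than running it.

```swift
import XCTest

final class Benchmarks: XCTestCase {
    func testTokenizerParsingBenchmark() throws {
        // Skip unless explicitly opted in; XCTSkipUnless marks the
        // test as skipped (not failed) when the condition is false.
        try XCTSkipUnless(
            ProcessInfo.processInfo.environment["RUN_BENCHMARKS"] == "1",
            "Set RUN_BENCHMARKS=1 to run benchmarks"
        )
        // ... benchmark body ...
    }
}
```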
@pcuenca Your points make sense to me. It also looks like @mattt has made some good progress here: https://github.com/mattt/swift-yyjson, which appears to bundle a version of the yyjson C code internally. In terms of supply chain, it would be great to have stronger pinning or isolation of yyjson like this. Here are a few options, roughly in order of effort:

The recent dependabot PR #309 is great to see. The goal is simply to limit the (admittedly rare but non-zero) risk of a supply-chain attack, particularly when importing C code. While unlikely, it's a risk that would need addressing for compliance-conscious importers when a new transitive dependency gets added (Hugging Face vendored repos excluded, since they'd be covered by existing controls). That said, yyjson itself looks quite reliable, the maintainer is responsive to CVEs (GHSA-q4m7-9pcm-fpxh), and it's a small, simple codebase, so it would likely pass an audit even with a single maintainer and limited fuzzing.
Thanks a lot @ZachNagengast, that's great feedback! How do you feel about going for
I would lean toward not adding an extra level of dependencies unless there's a strong argument for it.
Thanks @DePasqualeOrg! 🙌
JSON parsing is one of the biggest performance bottlenecks for tokenizer loading, and yyjson, a high-performance C library, offers significant speed gains for large tokenizer files: it's 3.4x faster for raw JSON parsing and 2.1x faster for building the `Config`, saving around 600 ms in a typical tokenizer load.

Changes

- `YYJSONParser` with direct yyjson → `Config` conversion (no intermediate Foundation objects)
- `HubApi.configuration(fileURL:)` updated to use yyjson
- `JSONSerialization+BOM.swift` removed (yyjson handles BOM correctly)
- `Benchmarks` test target (run with `RUN_BENCHMARKS=1 swift test --filter Benchmarks`)

Performance

Tested with the 11.4 MB `tokenizer.json` from `mlx-community/Qwen3-0.6B-Base-DQ5`:

This saves ~600 ms per tokenizer load on an M3 MacBook Pro.
All existing tests pass.