Use swift-huggingface to interact with Hub API#297
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
pcuenca
left a comment
There was a problem hiding this comment.
Looks good so far! Haven't had a chance to test it yet.
Do you think we should write a transition guide of some sort? Happy to help with that.
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
08cc81c to
5ac8511
Compare
There was a problem hiding this comment.
Pull request overview
This PR migrates the Hub module from a custom Downloader implementation to using HubClient from the swift-huggingface package. This simplifies the codebase by delegating download functionality to a dedicated, well-maintained library while maintaining backward compatibility with the existing HubApi interface.
Changes:
- Added
swift-huggingfacedependency (v0.7.0+) and integratedHubClientfor file downloads - Removed custom
Downloaderclass andDownloaderTests(533 and 190 lines respectively) - Updated test model references to use canonical namespaced repository IDs (e.g., "google-t5/t5-base" instead of "t5-base")
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| Package.swift | Adds swift-huggingface dependency and links HuggingFace product to Hub target |
| Sources/Hub/Hub.swift | Adds type conversion extensions for mapping between Hub and HuggingFace types |
| Sources/Hub/HubApi.swift | Replaces custom download logic with HubClient, exposes client property, updates imports |
| Sources/Hub/Downloader.swift | Removed - custom download implementation replaced by HubClient |
| Tests/HubTests/DownloaderTests.swift | Removed - test coverage now provided by HubApiTests with HubClient |
| Tests/HubTests/HubApiTests.swift | Updates resume test to work with HubClient's cache structure, reduces validation scope |
| Tests/HubTests/HubTests.swift | Updates model references to canonical namespaced IDs |
| Tests/TokenizersTests/TokenizerTests.swift | Updates model references to canonical namespaced IDs |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dc2a7f0 to
30fe942
Compare
… build failures on Linux
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Keep HubClient progress isolated from parent-linked fileProgress to avoid | ||
| // totalUnitCount mutations affecting snapshot's aggregate progress tree. | ||
| let downloadProgress = Progress(totalUnitCount: 1) | ||
| let downloadStartTime = Date() | ||
|
|
||
| // etag is a sha256 | ||
| // => means it's an LFS file (large) | ||
| // => let's compute local hash and compare | ||
| // => if match, update metadata and return file | ||
| if hub.isValidHash(hash: remoteEtag, pattern: hub.sha256Pattern) { | ||
| let fileHash = try hub.computeFileHash(file: destination) | ||
| if fileHash == remoteEtag { | ||
| try hub.writeDownloadMetadata(commitHash: remoteCommitHash, etag: remoteEtag, metadataPath: metadataDestination) | ||
| return destination | ||
| } | ||
| } | ||
| } | ||
| // If the file exists locally but metadata check failed, force a re-download | ||
| // to skip HubClient's cache (which may have the old/wrong version) | ||
| let forceDownload = downloaded | ||
|
|
There was a problem hiding this comment.
When downloading via HubClient (the non-background path), this function never calls progressHandler until the download finishes, and it never maps downloadProgress.fractionCompleted into fileProgress. This is a behavior regression versus the previous per-byte progress updates and will break callers/tests that expect intermediate progress (e.g. cancellation after the first update). Consider observing downloadProgress and forwarding updates into fileProgress + progressHandler, or using a Progress child relationship that doesn’t mutate the parent’s totalUnitCount.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Linux Foundation may not provide String(localized:comment:), so keep call sites portable. | ||
| private extension String { | ||
| init(localized key: String, comment: String? = nil) { | ||
| self = key | ||
| } |
There was a problem hiding this comment.
The Linux String(localized:comment:) shim is duplicated across multiple files. Consider centralizing it into a single #if !canImport(Darwin) helper file (module-wide/internal) and removing per-file copies so updates don't have to be replicated everywhere.
| // Linux: let URLSession handle redirects with default behavior. | ||
| (_, response) = try await URLSession.shared.data(for: request) | ||
| #else | ||
| // Apple platforms: use shared session with custom relative-redirect handling. |
There was a problem hiding this comment.
In the FoundationNetworking (Linux) branch, httpHead uses URLSession.shared default redirect handling. This bypasses the custom redirect policy described above (relative-only redirects) and may allow absolute redirects while still attaching the Authorization header. Consider using the same RedirectDelegate-backed session (or otherwise disabling/validating redirects) on Linux as well, or update the doc comment to accurately describe the platform-specific behavior.
| // Linux: let URLSession handle redirects with default behavior. | |
| (_, response) = try await URLSession.shared.data(for: request) | |
| #else | |
| // Apple platforms: use shared session with custom relative-redirect handling. | |
| // Linux: use URLSession.shared with default redirect handling. This may follow | |
| // absolute redirects while still sending the Authorization header. | |
| (_, response) = try await URLSession.shared.data(for: request) | |
| #else | |
| // Apple platforms: use a RedirectDelegate-backed session that enforces the | |
| // custom redirect policy (for example, allowing only relative redirects). |
| let hubApi = HubApi(downloadBase: downloadDestination) | ||
| var lastProgress: Progress? = nil | ||
| var downloadedTo = FileManager.default.homeDirectoryForCurrentUser | ||
| .appendingPathComponent("Library/Caches/huggingface-tests/models/coreml-projects/Llama-2-7b-chat-coreml") | ||
|
|
||
| let metadataDestination = downloadedTo.appending(component: ".cache/huggingface/download") | ||
|
|
||
| // Get the etag from the file metadata | ||
| let url = URL(string: "https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml/resolve/main/config.json")! | ||
| let etag = try await Hub.getFileMetadata(fileURL: url).etag! | ||
|
|
||
| try FileManager.default.createDirectory(at: metadataDestination, withIntermediateDirectories: true, attributes: nil) | ||
| try "X".write(to: metadataDestination.appendingPathComponent("config.json.\(etag).incomplete"), atomically: true, encoding: .utf8) | ||
| downloadedTo = try await hubApi.snapshot(from: repo, matching: "config.json") { progress in | ||
| // This simulates a previous interrupted download. | ||
| // Use a temporary HF_HOME so we don't write into the real user home directory. | ||
| let tempHFHome = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString) | ||
| try FileManager.default.createDirectory(at: tempHFHome, withIntermediateDirectories: true, attributes: nil) | ||
| let previousHFHome = ProcessInfo.processInfo.environment["HF_HOME"] | ||
| setenv("HF_HOME", tempHFHome.path, 1) |
There was a problem hiding this comment.
This test mutates the process-wide HF_HOME environment variable to control HubClient caching. That can leak across concurrently running tests and can also be order-dependent (the HubApi / HubClient is instantiated before HF_HOME is set). Consider setting HF_HOME before constructing HubApi, and/or adding a test-only way to inject a cache root into HubApi/HubClient so tests don't rely on global environment mutation.
| let state = BackgroundDownloadState() | ||
| let delegate = BackgroundDownloadDelegate( | ||
| state: state, | ||
| expectedSize: expectedSize, | ||
| fileProgress: fileProgress, | ||
| parentProgress: parentProgress, | ||
| progressHandler: progressHandler |
There was a problem hiding this comment.
progressHandler is called with the file marked as complete before writeDownloadMetadata(...) runs. If writing metadata fails, callers can observe 100% progress followed by an error, and offline mode metadata may be missing even though the UI saw completion. Consider writing metadata first (or only reporting completion after metadata write succeeds) so progress reflects the final committed state.
| let state = BackgroundDownloadState() | |
| let delegate = BackgroundDownloadDelegate( | |
| state: state, | |
| expectedSize: expectedSize, | |
| fileProgress: fileProgress, | |
| parentProgress: parentProgress, | |
| progressHandler: progressHandler | |
| // Write metadata for offline mode support before reporting completion | |
| try writeDownloadMetadata(commitHash: remoteCommitHash, etag: remoteEtag, metadataPath: metadataDestination) | |
| // When a file is re-downloaded, ensure the local file timestamp reflects this write. | |
| try? FileManager.default.setAttributes([.modificationDate: Date()], ofItemAtPath: destination.path) | |
| fileProgress.completedUnitCount = fileProgress.totalUnitCount | |
| progressHandler(parentProgress) |
|
Attempting to rebase and reconcile broken CI for Linux created too much noise for this to be legible. Closing in favor of #315. |
This PR updates
HubApito useHubClientfrom the swift-huggingface package.