Skip to content

Use swift-huggingface to interact with Hub API#297

Closed
mattt wants to merge 29 commits intomainfrom
mattt/hubclient
Closed

Use swift-huggingface to interact with Hub API#297
mattt wants to merge 29 commits intomainfrom
mattt/hubclient

Conversation

@mattt
Copy link
Collaborator

@mattt mattt commented Dec 5, 2025

This PR updates HubApi to use HubClient from the swift-huggingface package.

@ronaldmannak

This comment was marked as resolved.

Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! Haven't had a chance to test it yet.

Do you think we should write a transition guide of some sort? Happy to help with that.

@mattt mattt marked this pull request as ready for review February 19, 2026 11:08
@mattt mattt requested a review from Copilot February 19, 2026 11:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the Hub module from a custom Downloader implementation to using HubClient from the swift-huggingface package. This simplifies the codebase by delegating download functionality to a dedicated, well-maintained library while maintaining backward compatibility with the existing HubApi interface.

Changes:

  • Added swift-huggingface dependency (v0.7.0+) and integrated HubClient for file downloads
  • Removed custom Downloader class and DownloaderTests (533 and 190 lines respectively)
  • Updated test model references to use canonical namespaced repository IDs (e.g., "google-t5/t5-base" instead of "t5-base")

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
Package.swift Adds swift-huggingface dependency and links HuggingFace product to Hub target
Sources/Hub/Hub.swift Adds type conversion extensions for mapping between Hub and HuggingFace types
Sources/Hub/HubApi.swift Replaces custom download logic with HubClient, exposes client property, updates imports
Sources/Hub/Downloader.swift Removed - custom download implementation replaced by HubClient
Tests/HubTests/DownloaderTests.swift Removed - test coverage now provided by HubApiTests with HubClient
Tests/HubTests/HubApiTests.swift Updates resume test to work with HubClient's cache structure, reduces validation scope
Tests/HubTests/HubTests.swift Updates model references to canonical namespaced IDs
Tests/TokenizersTests/TokenizerTests.swift Updates model references to canonical namespaced IDs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +587 to 595
// Keep HubClient progress isolated from parent-linked fileProgress to avoid
// totalUnitCount mutations affecting snapshot's aggregate progress tree.
let downloadProgress = Progress(totalUnitCount: 1)
let downloadStartTime = Date()

// etag is a sha256
// => means it's an LFS file (large)
// => let's compute local hash and compare
// => if match, update metadata and return file
if hub.isValidHash(hash: remoteEtag, pattern: hub.sha256Pattern) {
let fileHash = try hub.computeFileHash(file: destination)
if fileHash == remoteEtag {
try hub.writeDownloadMetadata(commitHash: remoteCommitHash, etag: remoteEtag, metadataPath: metadataDestination)
return destination
}
}
}
// If the file exists locally but metadata check failed, force a re-download
// to skip HubClient's cache (which may have the old/wrong version)
let forceDownload = downloaded

Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When downloading via HubClient (the non-background path), this function never calls progressHandler until the download finishes, and it never maps downloadProgress.fractionCompleted into fileProgress. This is a behavior regression versus the previous per-byte progress updates and will break callers/tests that expect intermediate progress (e.g. cancellation after the first update). Consider observing downloadProgress and forwarding updates into fileProgress + progressHandler, or using a Progress child relationship that doesn’t mutate the parent’s totalUnitCount.

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +585 to +589
// Linux Foundation may not provide String(localized:comment:), so keep call sites portable.
private extension String {
init(localized key: String, comment: String? = nil) {
self = key
}
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Linux String(localized:comment:) shim is duplicated across multiple files. Consider centralizing it into a single #if !canImport(Darwin) helper file (module-wide/internal) and removing per-file copies so updates don't have to be replicated everywhere.

Copilot uses AI. Check for mistakes.
Comment on lines +281 to +284
// Linux: let URLSession handle redirects with default behavior.
(_, response) = try await URLSession.shared.data(for: request)
#else
// Apple platforms: use shared session with custom relative-redirect handling.
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the FoundationNetworking (Linux) branch, httpHead uses URLSession.shared default redirect handling. This bypasses the custom redirect policy described above (relative-only redirects) and may allow absolute redirects while still attaching the Authorization header. Consider using the same RedirectDelegate-backed session (or otherwise disabling/validating redirects) on Linux as well, or update the doc comment to accurately describe the platform-specific behavior.

Suggested change
// Linux: let URLSession handle redirects with default behavior.
(_, response) = try await URLSession.shared.data(for: request)
#else
// Apple platforms: use shared session with custom relative-redirect handling.
// Linux: use URLSession.shared with default redirect handling. This may follow
// absolute redirects while still sending the Authorization header.
(_, response) = try await URLSession.shared.data(for: request)
#else
// Apple platforms: use a RedirectDelegate-backed session that enforces the
// custom redirect policy (for example, allowing only relative redirects).

Copilot uses AI. Check for mistakes.
Comment on lines 1058 to +1070
let hubApi = HubApi(downloadBase: downloadDestination)
var lastProgress: Progress? = nil
var downloadedTo = FileManager.default.homeDirectoryForCurrentUser
.appendingPathComponent("Library/Caches/huggingface-tests/models/coreml-projects/Llama-2-7b-chat-coreml")

let metadataDestination = downloadedTo.appending(component: ".cache/huggingface/download")

// Get the etag from the file metadata
let url = URL(string: "https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml/resolve/main/config.json")!
let etag = try await Hub.getFileMetadata(fileURL: url).etag!

try FileManager.default.createDirectory(at: metadataDestination, withIntermediateDirectories: true, attributes: nil)
try "X".write(to: metadataDestination.appendingPathComponent("config.json.\(etag).incomplete"), atomically: true, encoding: .utf8)
downloadedTo = try await hubApi.snapshot(from: repo, matching: "config.json") { progress in
// This simulates a previous interrupted download.
// Use a temporary HF_HOME so we don't write into the real user home directory.
let tempHFHome = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: tempHFHome, withIntermediateDirectories: true, attributes: nil)
let previousHFHome = ProcessInfo.processInfo.environment["HF_HOME"]
setenv("HF_HOME", tempHFHome.path, 1)
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test mutates the process-wide HF_HOME environment variable to control HubClient caching. That can leak across concurrently running tests and can also be order-dependent (the HubApi / HubClient is instantiated before HF_HOME is set). Consider setting HF_HOME before constructing HubApi, and/or adding a test-only way to inject a cache root into HubApi/HubClient so tests don't rely on global environment mutation.

Copilot uses AI. Check for mistakes.
Comment on lines +677 to +683
let state = BackgroundDownloadState()
let delegate = BackgroundDownloadDelegate(
state: state,
expectedSize: expectedSize,
fileProgress: fileProgress,
parentProgress: parentProgress,
progressHandler: progressHandler
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

progressHandler is called with the file marked as complete before writeDownloadMetadata(...) runs. If writing metadata fails, callers can observe 100% progress followed by an error, and offline mode metadata may be missing even though the UI saw completion. Consider writing metadata first (or only reporting completion after metadata write succeeds) so progress reflects the final committed state.

Suggested change
let state = BackgroundDownloadState()
let delegate = BackgroundDownloadDelegate(
state: state,
expectedSize: expectedSize,
fileProgress: fileProgress,
parentProgress: parentProgress,
progressHandler: progressHandler
// Write metadata for offline mode support before reporting completion
try writeDownloadMetadata(commitHash: remoteCommitHash, etag: remoteEtag, metadataPath: metadataDestination)
// When a file is re-downloaded, ensure the local file timestamp reflects this write.
try? FileManager.default.setAttributes([.modificationDate: Date()], ofItemAtPath: destination.path)
fileProgress.completedUnitCount = fileProgress.totalUnitCount
progressHandler(parentProgress)

Copilot uses AI. Check for mistakes.
@mattt
Copy link
Collaborator Author

mattt commented Feb 22, 2026

Attempting to rebase and reconcile broken CI for Linux created too much noise for this to be legible. Closing in favor of #315.

@mattt mattt closed this Feb 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants