Align download and caching with huggingface_hub for improved performance #21
Open
DePasqualeOrg wants to merge 34 commits into
Force-pushed from 8cb1eb3 to 49ff955.
This was referenced Dec 27, 2025
Force-pushed from 49ff955 to 007d780.
Force-pushed from d4a7c1a to ce5642b.
Force-pushed from a427698 to e8c01a2.
Force-pushed from dc4029b to 7f3a9e9.
The downloadSnapshot implementation calls getRepoInfo (model info API) rather than listFiles (tree API). Update the mock responses to return model info JSON with sha and siblings instead of a tree entry array.
The Xet PR added a downloadFile(Git.TreeEntry) overload that skips unnecessary HEAD requests for small files. downloadSnapshot was bypassing this by using its own FileEntry type and calling the path-based overload. Replace FileEntry with Git.TreeEntry so downloadSnapshot uses the same size-based transport selection.
Align with Python's huggingface_hub: downloadSnapshot and downloadFile now return the snapshot cache path (containing symlinks to blobs) instead of copying files to a separate destination. This eliminates redundant file copies and disk duplication. Remove copyCachedFiles, downloadToDestinationWithoutCache, filesSameSize, destination fast path, and the redundant downloadContentsOfFile -> URL overload. Rename copyBlobToDestination to createCacheEntries (symlink only). Require a cache to be configured (throw cacheNotConfigured otherwise). Propagate the main session configuration to metadataSession so mock tests can intercept metadata HEAD requests.
Match huggingface_hub's try/except pattern: when getRepoInfo fails (network error, server 500, timeout), try serving from the local cache before re-throwing the error. This makes downloadSnapshot resilient to transient network issues when files are already cached.
Matches Python's local_files_only parameter. When true, returns cached files without making any network requests, resolving branch names via the local refs file.
Force-pushed from 7f3a9e9 to f2d301c.
Adds optional `to destination:` parameter to downloadSnapshot and downloadFile, matching Python's local_dir. When set, files are copied from the cache to the specified directory. Also adds localFilesOnly to downloadFile for consistency with downloadSnapshot.
Python's snapshot_download acknowledges it can't check if all files are present when returning a cached snapshot. We improve on this by caching the repo info response after the first download and verifying each file's presence on subsequent calls. This detects incomplete snapshots caused by interrupted downloads or different glob patterns.
The cache-first download architecture (blob storage, symlinks, resume support) inherently requires a cache. This aligns with huggingface_hub, which also always requires a cache directory.
Replace the byte-by-byte AsyncBytes iteration with URLSession.download(), which handles streaming to disk at the OS level. Resume support is preserved: when an .incomplete file exists, a Range header is sent and the downloaded remainder is appended via chunked FileHandle copy. Also fixes progress reporting during resume by accounting for the resume offset in DownloadProgressDelegate. Remove deprecated resumeDownloadFile, which used opaque URLSession resume data. The new downloadFile handles resume automatically via Range headers.
Force-pushed from f2d301c to b983324.
Author
@mattt, it looks like some of your commits in this repo from the past two days are derived from my work in this PR as well as my swift-filelock package, but you did not engage with this PR or credit me.
This was referenced Feb 24, 2026
This PR includes significant improvements to download and cache performance, aligning swift-huggingface more closely with the Python huggingface_hub library. Several of the problems it solves originated in design decisions that diverged from the Python library.
Benchmark Results
I've added a separate benchmarks test target that will not run in CI and can be run with `RUN_BENCHMARKS=1 swift test --filter Benchmarks`.
Tested with `mlx-community/Qwen3-0.6B-Base-DQ5` (~11 MB tokenizer.json). Check out commit `84759be` to run the benchmarks before changes on this branch.
Cached file retrieval: Previously, every call copied files from the cache to a destination directory, even when nothing had changed. Now we return the snapshot cache path directly with no copy step. When the revision is a commit hash (as in this benchmark), we also skip the API call entirely and return immediately.
Fresh download: Previously, files were downloaded sequentially. Now they download concurrently via a task group (default concurrency: 8), and results are written directly to the cache with no extra copy step.
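The commit-hash fast path mentioned under cached file retrieval can be sketched roughly as follows. This is only an illustration, not the PR's code: `isCommitHash` and `missingFiles` are hypothetical names, and the sketch assumes a commit hash is exactly 40 lowercase hexadecimal characters.

```swift
// Hypothetical helper: treats a revision as an immutable commit hash
// when it is exactly 40 lowercase hex characters (an assumption).
func isCommitHash(_ revision: String) -> Bool {
    let digits = UInt8(ascii: "0")...UInt8(ascii: "9")
    let letters = UInt8(ascii: "a")...UInt8(ascii: "f")
    return revision.utf8.count == 40
        && revision.utf8.allSatisfy { digits.contains($0) || letters.contains($0) }
}

// Hypothetical helper: given the file list recorded from an earlier
// repo info response and the files actually present in the snapshot,
// returns the files that are still missing (in the recorded order).
func missingFiles(expected: [String], present: Set<String>) -> [String] {
    expected.filter { !present.contains($0) }
}

// The fast path applies only when the revision is immutable and
// nothing is missing; otherwise a normal download would proceed.
func canSkipNetwork(revision: String, expected: [String], present: Set<String>) -> Bool {
    isCommitHash(revision) && missingFiles(expected: expected, present: present).isEmpty
}
```

A branch name such as `main` never takes this path in the sketch, because the commit it points to can change between calls.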
Why this is a single PR
These changes form an interconnected rewrite of the download/cache subsystem. Parallel downloads, resume, cache-path return, progress, file locking, and offline mode all modify the same core functions (`downloadSnapshot`, `downloadFile`, `downloadToCache`) and depend on each other.
Changes
1. Return snapshot cache path instead of copying files
`downloadSnapshot` and `downloadFile` now return the snapshot cache path directly (containing symlinks to blobs), matching Python's `snapshot_download()` default behavior. The previous design always copied every file from the cache to a separate destination directory, which caused redundant disk I/O and duplication. An optional `to destination:` parameter (matching Python's `local_dir`) is available for callers that need files in a specific location.
Python equivalent: `_snapshot_download.py:462-465`
2. Skip API calls for cached files
When the revision is a commit hash (immutable), the API response is cached as `<commit>.json` in a separate `.metadata` directory at the cache root (mirroring how `.locks` is structured) after the first download. This keeps the snapshot directory clean so it only contains files from the repository. On subsequent calls, this cached response is used to verify that all files matching the requested globs are present in the snapshot. If all files are present, the snapshot is returned immediately with no API call. If any files are missing (e.g., from an interrupted download or different glob patterns), the fast path is skipped and a fresh download proceeds.
Python's `snapshot_download` acknowledges this limitation in its offline path: "we can't check if all the files are actually there." Since commit hashes are immutable, caching the API response is safe and allows per-file verification without a network round-trip. This improvement could also be added to the Python library.
Python equivalent: `file_download.py:1082-1095`
3. Parallel file downloads
Files are now downloaded concurrently using a task group with configurable concurrency (default: 8).
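One common way to cap concurrency with a task group is to seed it with a fixed number of tasks and start a new one each time a task finishes. The sketch below illustrates that pattern only. `mapWithBoundedConcurrency` is a hypothetical helper, not a function from this PR, and the real implementation may be structured differently.

```swift
// Hypothetical helper: applies `operation` to every item with at most
// `limit` child tasks in flight, returning results in input order.
func mapWithBoundedConcurrency<Input: Sendable, Output: Sendable>(
    _ items: [Input],
    limit: Int,
    operation: @escaping @Sendable (Input) async throws -> Output
) async throws -> [Output] {
    precondition(limit > 0, "limit must be positive")
    return try await withThrowingTaskGroup(of: (Int, Output).self, returning: [Output].self) { group in
        var results = [Output?](repeating: nil, count: items.count)
        var nextIndex = 0

        // Seed the group with up to `limit` tasks.
        while nextIndex < min(limit, items.count) {
            let index = nextIndex
            let item = items[index]
            group.addTask {
                let output = try await operation(item)
                return (index, output)
            }
            nextIndex += 1
        }

        // Each time a task finishes, record its result and start the next one.
        while let (index, output) = try await group.next() {
            results[index] = output
            if nextIndex < items.count {
                let index = nextIndex
                let item = items[index]
                group.addTask {
                    let output = try await operation(item)
                    return (index, output)
                }
                nextIndex += 1
            }
        }
        return results.compactMap { $0 }
    }
}
```

In a download setting, `items` would be the file entries and `operation` the per-file download, with `limit` playing the role of the configurable concurrency (8 by default in this PR).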
Python equivalent: `_snapshot_download.py:449-455`
4. Size-weighted download progress
Progress is weighted by file size instead of file count, providing accurate progress bars for downloads containing a mix of small config files and large model weights.
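To see why the weighting matters, the arithmetic can be sketched as below. This is a plain-Swift illustration with hypothetical names (`FileProgress`, `weightedFraction`, `countFraction`). It does not use Foundation's `Progress` hierarchy, which is what the PR describes using.

```swift
// Hypothetical per-file progress record, for illustration only.
struct FileProgress {
    let totalBytes: Int64
    var completedBytes: Int64
}

// Size-weighted progress: completed bytes over total bytes.
func weightedFraction(_ files: [FileProgress]) -> Double {
    let total = files.reduce(Int64(0)) { $0 + $1.totalBytes }
    guard total > 0 else { return 0 }
    let done = files.reduce(Int64(0)) { $0 + min($1.completedBytes, $1.totalBytes) }
    return Double(done) / Double(total)
}

// Count-based progress, shown only for comparison.
func countFraction(_ files: [FileProgress]) -> Double {
    guard !files.isEmpty else { return 0 }
    let finished = files.filter { $0.completedBytes >= $0.totalBytes }.count
    return Double(finished) / Double(files.count)
}
```

With a finished 1 KB config file and an untouched 999 KB weights file, the count-based figure is already 50%, while the size-weighted figure is 0.1%.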
5. Automatic resume for interrupted downloads
Downloads automatically resume from where they left off using HTTP Range headers with a cache-first approach matching huggingface_hub:
- `cache/blobs/{etag}.incomplete` and its size
- `cache/blobs/{etag}`
- `snapshots/{commit}/{filename}`
Uses `URLSession.download(for:delegate:)` for efficient OS-level streaming to disk. For resume, the downloaded remainder is appended to the incomplete file via chunked `FileHandle` copy. `DownloadProgressDelegate` accounts for the resume offset so that progress bars report accurate totals.
This enables cross-client resume: if a download starts in Python and gets interrupted, Swift can resume it (and vice versa), since incomplete files are stored in the same cache location with the same naming convention.
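The two pieces of arithmetic behind resume, the Range header value and the offset-aware progress, can be sketched as follows. Both function names are hypothetical and the sketch leaves out all networking and file handling.

```swift
// Hypothetical helper: the Range header value to send when an
// `.incomplete` file of `resumeOffset` bytes already exists.
// Returns nil when there is nothing to resume from.
func rangeHeaderValue(resumeOffset: Int64) -> String? {
    resumeOffset > 0 ? "bytes=\(resumeOffset)-" : nil
}

// Hypothetical helper: overall progress for a resumed download.
// `bytesReceived` and `remainingExpected` refer only to the ranged
// response, so the bytes already on disk are added to both sides.
func resumedFraction(resumeOffset: Int64, bytesReceived: Int64, remainingExpected: Int64) -> Double {
    let total = resumeOffset + remainingExpected
    guard total > 0 else { return 0 }
    return Double(resumeOffset + bytesReceived) / Double(total)
}
```

Without the offset, a download resumed at 600 of 1000 bytes would restart its progress bar at 0% instead of 60%.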
Python equivalent: `file_download.py:1850-1855` (incomplete file handling), `file_download.py:403-404` (Range header)
6. File locking
Concurrent downloads to the same blob are serialized using swift-filelock, a port of Python's filelock, which correctly handles contention across multiple lock instances for the same path (this was not the case with the file lock implementation that was previously included).
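For readers unfamiliar with advisory file locks, the general idea can be sketched with POSIX `flock`. This is not the swift-filelock API, whose interface is not shown here. `withExclusiveLock` and `LockError` are hypothetical names, and the sketch omits timeouts and retry on interruption.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum LockError: Error { case cannotOpen, wouldBlock }

// Hypothetical helper: holds an exclusive advisory lock on `lockPath`
// while `body` runs. With `blocking: false` it throws instead of
// waiting when another open file description already holds the lock.
func withExclusiveLock<T>(
    at lockPath: String,
    blocking: Bool = true,
    _ body: () throws -> T
) throws -> T {
    let fd = open(lockPath, O_CREAT | O_RDWR, 0o644)
    guard fd >= 0 else { throw LockError.cannotOpen }
    defer { _ = close(fd) }

    let flags = blocking ? LOCK_EX : (LOCK_EX | LOCK_NB)
    guard flock(fd, flags) == 0 else { throw LockError.wouldBlock }
    // Runs before the close above, releasing the lock explicitly.
    defer { _ = flock(fd, LOCK_UN) }

    return try body()
}
```

Because each call opens its own descriptor, two callers contend for the same path even inside one process, which is the multiple-lock-instance case described above.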
Python equivalent: `file_download.py:1239-1251`
7. Offline mode and cache fallback
Added a `localFilesOnly` parameter on both `downloadSnapshot` and `downloadFile` (matching Python's `local_files_only`), a `useOfflineMode` parameter, and automatic network detection via `NetworkMonitor`. When any of these are active, cached files are returned without making network requests.
Additionally, if the API call to fetch repo info fails (network error, server outage, etc.), `downloadSnapshot` falls back to the local cache before re-throwing the error. This matches Python's try/except pattern that catches errors during `repo_info()` and retries with `local_files_only=True`.
Python equivalent: `_snapshot_download.py:234-330`
8. Xet storage compatibility
Added `fetchFileMetadata` to capture `X-Linked-Etag` and `X-Repo-Commit` headers before CDN redirect. Uses same-host redirect handling matching huggingface_hub's `_httpx_follow_relative_redirects`. `downloadSnapshot` uses the `Git.TreeEntry` overload for size-based Xet transport selection, so small files use LFS transport directly and skip the unnecessary HEAD request to check for Xet support.
9. Linux support
Linux has full feature parity for caching (blob checks, file locking, cache structure) but lacks resume support due to API limitations. Fallback paths are included throughout.
10. Make HubCache required
Changed `cache: HubCache?` to `cache: HubCache` across all `HubClient` initializers. The cache-first download architecture (blob storage, symlinks, resume) inherently requires a cache, and this aligns with huggingface_hub, which also always requires a cache directory.
11. Dead code cleanup
Removed `FileProgressReporter` and `ProgressObservation` from the Xet PR, which became unused after `downloadSnapshot` was replaced with a parallel implementation using Foundation's `Progress` parent-child hierarchy. Also removed `copyCachedFiles`, `filesSameSize`, the destination fast path, and the deprecated `resumeDownloadFile` (which used opaque URLSession resume data, now replaced by automatic Range header resume). Fixed the snapshot progress tests to mock the correct API endpoint (`getRepoInfo` instead of `listFiles`).
Tests
Added a comprehensive test suite in `SnapshotDownloadTests.swift` covering cache, incomplete snapshot detection, offline mode, cache fallback on network errors, resume, 416 recovery, file locking, and concurrent downloads. Path traversal protection is tested with 12 cases.
Future work and alignment with Python
There are still many aspects in which the Swift client does not match the behavior or design of the Python client, which can result in critical issues. I won't address these in this PR, but in general I recommend closely following the Python implementation rather than trying out different designs, as the Python client has already solved many issues that different designs can introduce.