Align download and caching with huggingface_hub for improved performance#21

Open
DePasqualeOrg wants to merge 34 commits into
huggingface:mainfrom
DePasqualeOrg:optimizations


@DePasqualeOrg DePasqualeOrg commented Dec 26, 2025


This PR includes significant improvements to download and cache performance, aligning swift-huggingface more closely with the Python huggingface_hub library. Several of the problems it solves originated in design decisions that diverged from the Python library.

Benchmark Results

I've added a separate benchmarks test target that will not run in CI and can be run with RUN_BENCHMARKS=1 swift test --filter Benchmarks.

Tested with mlx-community/Qwen3-0.6B-Base-DQ5 (~11 MB tokenizer.json).

Check out commit 84759be to run the benchmarks before changes on this branch.

| Benchmark | Before | After | Improvement |
|---|---|---|---|
| Cached file retrieval | 678.5 ms | 0.7 ms | ~969x faster |
| Fresh download | 2749.3 ms | 1070.7 ms | ~2.6x faster |

Cached file retrieval: Previously, every call copied files from the cache to a destination directory, even when nothing had changed. Now we return the snapshot cache path directly with no copy step. When the revision is a commit hash (as in this benchmark), we also skip the API call entirely and return immediately.

Fresh download: Previously, files were downloaded sequentially. Now they download concurrently via a task group (default concurrency: 8), and results are written directly to the cache with no extra copy step.

Why this is a single PR

These changes form an interconnected rewrite of the download/cache subsystem. Parallel downloads, resume, cache-path return, progress, file locking, and offline mode all modify the same core functions (downloadSnapshot, downloadFile, downloadToCache) and depend on each other.

Changes

1. Return snapshot cache path instead of copying files

downloadSnapshot and downloadFile now return the snapshot cache path directly (containing symlinks to blobs), matching Python's snapshot_download() default behavior. The previous design always copied every file from the cache to a separate destination directory, which caused redundant disk I/O and duplication. An optional `to destination:` parameter (matching Python's local_dir) is available for callers that need files in a specific location.

Python equivalent: _snapshot_download.py:462-465
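
As a rough illustration of the layout described above, the following sketch stores a file once under blobs/ and exposes it in a snapshot directory through a symlink. The helper name `storeInSnapshot`, its parameters, and the use of an absolute symlink target are assumptions made for this example; they are not the PR's actual implementation.

```swift
import Foundation

/// Hypothetical helper (not from the PR): writes `data` as a blob named by its
/// etag, then links it into `snapshots/<commit>/<filename>`.
/// Returns the snapshot directory, which is what a caller would receive.
func storeInSnapshot(
    data: Data, etag: String, commit: String, filename: String, repoRoot: URL
) throws -> URL {
    let fm = FileManager.default
    let blob = repoRoot.appendingPathComponent("blobs").appendingPathComponent(etag)
    let snapshotDir = repoRoot.appendingPathComponent("snapshots")
        .appendingPathComponent(commit)
    let link = snapshotDir.appendingPathComponent(filename)

    try fm.createDirectory(
        at: blob.deletingLastPathComponent(), withIntermediateDirectories: true)
    try fm.createDirectory(
        at: link.deletingLastPathComponent(), withIntermediateDirectories: true)

    // The blob is written only once; later snapshots can link to the same file.
    if !fm.fileExists(atPath: blob.path) {
        try data.write(to: blob, options: .atomic)
    }

    // Replace a stale symlink (for example, one left by an earlier run).
    if (try? fm.destinationOfSymbolicLink(atPath: link.path)) != nil {
        try fm.removeItem(at: link)
    }
    // Assumption: an absolute target keeps the sketch short.
    try fm.createSymbolicLink(at: link, withDestinationURL: blob)
    return snapshotDir
}
```

Because the caller gets the snapshot directory back, reading a file goes through the symlink and no copy is made.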

2. Skip API calls for cached files

When the revision is a commit hash (immutable), the API response is cached as <commit>.json in a separate .metadata directory at the cache root (mirroring how .locks is structured) after the first download. This keeps the snapshot directory clean so it only contains files from the repository. On subsequent calls, this cached response is used to verify that all files matching the requested globs are present in the snapshot. If all files are present, the snapshot is returned immediately with no API call. If any files are missing (e.g., from an interrupted download or different glob patterns), the fast path is skipped and a fresh download proceeds.

Python's snapshot_download cannot verify that a cached snapshot is complete, and it acknowledges this limitation in its offline path: "we can't check if all the files are actually there." Since commit hashes are immutable, caching the API response is safe and allows per-file verification without a network round-trip. This improvement could also be added to the Python library.

Python equivalent: file_download.py:1082-1095
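
The fast path can be sketched as two checks: whether the revision is a commit hash, and whether every expected file is present in the snapshot. Both function names are hypothetical, the commit-hash test assumes a full 40-character lowercase hexadecimal SHA, and the expected file list stands in for the cached API response after glob filtering.

```swift
import Foundation

/// Assumption: a commit hash is exactly 40 lowercase hexadecimal characters.
func isCommitHash(_ revision: String) -> Bool {
    let bytes = Array(revision.utf8)
    return bytes.count == 40 && bytes.allSatisfy { byte in
        (UInt8(ascii: "0")...UInt8(ascii: "9")).contains(byte)
            || (UInt8(ascii: "a")...UInt8(ascii: "f")).contains(byte)
    }
}

/// Returns the expected files that are not present in the snapshot directory.
/// `fileExists` follows symlinks, so a link to a missing blob counts as missing.
func missingFiles(expected: [String], in snapshot: URL) -> [String] {
    expected.filter { name in
        !FileManager.default.fileExists(
            atPath: snapshot.appendingPathComponent(name).path)
    }
}
```

If `isCommitHash` is true and `missingFiles` returns an empty array, the snapshot path could be returned without any network request; otherwise the normal download path would run.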

3. Parallel file downloads

Files are now downloaded concurrently using a task group with configurable concurrency (default: 8).

Python equivalent: _snapshot_download.py:449-455
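
One common way to cap concurrency in a task group is to seed it with a fixed number of tasks and start a new one each time a task finishes. The sketch below shows that pattern in isolation; `mapConcurrently` is a hypothetical name and the PR's actual implementation may be structured differently.

```swift
import Foundation

/// Hypothetical helper: applies `operation` to every item with at most `limit`
/// tasks in flight at once, returning results in input order.
func mapConcurrently<T: Sendable, R: Sendable>(
    _ items: [T],
    limit: Int = 8,
    operation: @escaping @Sendable (T) async throws -> R
) async throws -> [R] {
    precondition(limit > 0, "limit must be positive")
    var results = [R?](repeating: nil, count: items.count)

    try await withThrowingTaskGroup(of: (Int, R).self) { group in
        var next = 0

        // Seed the group with the first `limit` items.
        while next < min(limit, items.count) {
            let index = next
            let item = items[index]
            group.addTask { (index, try await operation(item)) }
            next += 1
        }

        // Each time a task finishes, record its result and start the next item.
        while let (index, value) = try await group.next() {
            results[index] = value
            if next < items.count {
                let nextIndex = next
                let item = items[nextIndex]
                group.addTask { (nextIndex, try await operation(item)) }
                next += 1
            }
        }
    }
    return results.compactMap { $0 }
}
```

If any operation throws, the error propagates out of the group and the remaining tasks are cancelled.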

4. Size-weighted download progress

Progress is weighted by file size instead of file count, providing accurate progress bars for downloads containing a mix of small config files and large model weights.
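
A minimal sketch of size weighting with Foundation's Progress parent-child hierarchy follows. The helper name `makeWeightedProgress` and the byte counts in the usage note are illustrative assumptions.

```swift
import Foundation

/// Hypothetical helper: builds a parent Progress whose total is the sum of all
/// file sizes, with one child per file weighted by that file's size in bytes.
func makeWeightedProgress(sizes: [Int64]) -> (overall: Progress, children: [Progress]) {
    let overall = Progress(totalUnitCount: sizes.reduce(0, +))
    let children = sizes.map { size -> Progress in
        let child = Progress(totalUnitCount: size)
        // The child's pending unit count is its byte size, so a large file
        // moves the overall fraction more than a small one.
        overall.addChild(child, withPendingUnitCount: size)
        return child
    }
    return (overall, children)
}
```

With sizes of 2,000 and 11,000,000 bytes, finishing the small file leaves the overall fraction close to zero, whereas counting files would report the download as half done.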

5. Automatic resume for interrupted downloads

Downloads automatically resume from where they left off using HTTP Range headers with a cache-first approach matching huggingface_hub:

  1. Check for cache/blobs/{etag}.incomplete and its size
  2. Send Range header to download only the remaining bytes
  3. Append the remainder to the incomplete file, then rename to cache/blobs/{etag}
  4. Create symlink in snapshots/{commit}/{filename}

Uses URLSession.download(for:delegate:) for efficient OS-level streaming to disk. For resume, the downloaded remainder is appended to the incomplete file via chunked FileHandle copy. DownloadProgressDelegate accounts for the resume offset so that progress bars report accurate totals.

This enables cross-client resume -- if a download starts in Python and gets interrupted, Swift can resume it (and vice versa), since incomplete files are stored in the same cache location with the same naming convention.

Python equivalent: file_download.py:1850-1855 (incomplete file handling), file_download.py:403-404 (Range header)
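
The resume steps above can be sketched with two small helpers: one that builds the Range request from the size of the incomplete file, and one that appends the downloaded remainder in chunks and renames the result. The function names, parameter names, and the 1 MiB chunk size are assumptions for illustration. The sketch leaves out the network call and error recovery such as handling a 416 response.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical helper: builds a request for only the bytes that are not yet
/// in the incomplete file. The offset is 0 when that file does not exist.
func resumeRequest(for url: URL, incomplete: URL) -> (request: URLRequest, offset: UInt64) {
    var request = URLRequest(url: url)
    let attributes = try? FileManager.default.attributesOfItem(atPath: incomplete.path)
    let offset = (attributes?[.size] as? NSNumber)?.uint64Value ?? 0
    if offset > 0 {
        request.setValue("bytes=\(offset)-", forHTTPHeaderField: "Range")
    }
    return (request, offset)
}

/// Hypothetical helper: appends `remainder` to `incomplete` in fixed-size
/// chunks, then renames the incomplete file to its final blob path.
func appendAndFinalize(
    remainder: URL, incomplete: URL, blob: URL, chunkSize: Int = 1 << 20
) throws {
    let fm = FileManager.default
    if !fm.fileExists(atPath: incomplete.path) {
        _ = fm.createFile(atPath: incomplete.path, contents: nil)
    }
    do {
        let input = try FileHandle(forReadingFrom: remainder)
        defer { try? input.close() }
        let output = try FileHandle(forWritingTo: incomplete)
        defer { try? output.close() }
        try output.seekToEnd()
        // Copy chunk by chunk so the remainder is never fully held in memory.
        while let chunk = try input.read(upToCount: chunkSize), !chunk.isEmpty {
            try output.write(contentsOf: chunk)
        }
    }
    try fm.moveItem(at: incomplete, to: blob)
}
```

The returned offset is also the value a progress delegate would add to the bytes received so that a resumed download reports accurate totals.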

6. File locking

Concurrent downloads to the same blob are serialized using swift-filelock, a port of Python's filelock, which correctly handles contention across multiple lock instances for the same path (this was not the case with the file lock implementation that was previously included).

Python equivalent: file_download.py:1239-1251
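
To illustrate the general idea of serializing work on a blob, here is a sketch that uses a POSIX advisory lock directly. It is not the swift-filelock API, which is not shown in this PR description. The helper `withExclusiveLock` and the `LockError` type are hypothetical, and the choice of flock(2) is an assumption.

```swift
import Foundation

/// Hypothetical error type carrying the errno value of a failed call.
struct LockError: Error {
    let code: Int32
}

/// Hypothetical helper: runs `body` while holding an exclusive advisory lock
/// on `lockPath`. A second caller blocks in `flock` until the first releases.
func withExclusiveLock<T>(at lockPath: String, _ body: () throws -> T) throws -> T {
    let fd = open(lockPath, O_CREAT | O_RDWR, 0o644)
    guard fd >= 0 else { throw LockError(code: errno) }
    defer { _ = close(fd) }

    guard flock(fd, LOCK_EX) == 0 else { throw LockError(code: errno) }
    defer { _ = flock(fd, LOCK_UN) }

    return try body()
}
```

A download routine would take the lock for a blob's lock file, check again whether the blob already exists, and download only if it does not.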

7. Offline mode and cache fallback

Added a localFilesOnly parameter on both downloadSnapshot and downloadFile (matching Python's local_files_only), a useOfflineMode parameter, and automatic network detection via NetworkMonitor. When any of these are active, cached files are returned without making network requests.

Additionally, if the API call to fetch repo info fails (network error, server outage, etc.), downloadSnapshot falls back to the local cache before re-throwing the error. This matches Python's try/except pattern that catches errors during repo_info() and retries with local_files_only=True.

Python equivalent: _snapshot_download.py:234-330
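
The control flow described above can be sketched as follows. The function `resolveSnapshot`, its closure parameters, and the `SnapshotUnavailable` error are hypothetical stand-ins for the real repo-info call and cache lookup.

```swift
import Foundation

/// Hypothetical error thrown when offline mode is requested and nothing is cached.
struct SnapshotUnavailable: Error {}

/// Hypothetical helper: returns the cached snapshot when `localFilesOnly` is
/// set, and otherwise falls back to the cache only if the remote call fails.
func resolveSnapshot(
    localFilesOnly: Bool,
    fetchRemote: () async throws -> URL,
    cachedSnapshot: () -> URL?
) async throws -> URL {
    if localFilesOnly {
        // Offline path: no network request is attempted.
        guard let cached = cachedSnapshot() else { throw SnapshotUnavailable() }
        return cached
    }
    do {
        return try await fetchRemote()
    } catch {
        // Network error, server outage, and so on: try the cache first,
        // then re-throw the original error if nothing is cached.
        if let cached = cachedSnapshot() { return cached }
        throw error
    }
}
```

Re-throwing the original error rather than a generic cache miss keeps the real cause visible to the caller.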

8. Xet storage compatibility

Added fetchFileMetadata to capture X-Linked-Etag and X-Repo-Commit headers before CDN redirect. Uses same-host redirect handling matching huggingface_hub's _httpx_follow_relative_redirects.

downloadSnapshot uses the Git.TreeEntry overload for size-based Xet transport selection, so small files use LFS transport directly and skip the unnecessary HEAD request to check for Xet support.

9. Linux support

Linux has full feature parity for caching (blob checks, file locking, cache structure) but lacks resume support due to API limitations. Fallback paths are included throughout.

10. Make HubCache required

Changed cache: HubCache? to cache: HubCache across all HubClient initializers. The cache-first download architecture (blob storage, symlinks, resume) inherently requires a cache, and this aligns with huggingface_hub which also always requires a cache directory.

11. Dead code cleanup

Removed FileProgressReporter and ProgressObservation from the Xet PR, which became unused after downloadSnapshot was replaced with a parallel implementation using Foundation's Progress parent-child hierarchy. Also removed copyCachedFiles, filesSameSize, the destination fast path, and the deprecated resumeDownloadFile (which used opaque URLSession resume data, now replaced by automatic Range header resume). Fixed the snapshot progress tests to mock the correct API endpoint (getRepoInfo instead of listFiles).

Tests

Added a comprehensive test suite in SnapshotDownloadTests.swift covering caching, incomplete snapshot detection, offline mode, cache fallback on network errors, resume, 416 recovery, file locking, and concurrent downloads. Path traversal protection is tested with 12 cases.

Future work and alignment with Python

There are still many aspects in which the Swift client does not match the behavior or design of the Python client, which can result in critical issues. I won't address these in this PR, but in general I recommend closely following the Python implementation rather than trying out different designs, as the Python client has already solved many issues that different designs can introduce.

@DePasqualeOrg DePasqualeOrg changed the title Add download optimizations and offline mode support Download/cache optimizations and offline mode support Dec 26, 2025
@DePasqualeOrg DePasqualeOrg changed the title Download/cache optimizations and offline mode support Optimize download and cache performance, add offline mode support Dec 27, 2025
@DePasqualeOrg DePasqualeOrg changed the title Optimize download and cache performance, add offline mode support Optimize download and cache; add resumable, parallel downloads and offline mode Dec 27, 2025
@DePasqualeOrg DePasqualeOrg changed the title Optimize download and cache; add resumable, parallel downloads and offline mode Improve download and cache performance; add resumable, parallel downloads and offline mode Dec 27, 2025
@DePasqualeOrg DePasqualeOrg marked this pull request as draft January 5, 2026 12:50
@DePasqualeOrg DePasqualeOrg marked this pull request as ready for review January 5, 2026 19:22
@DePasqualeOrg DePasqualeOrg force-pushed the optimizations branch 2 times, most recently from dc4029b to 7f3a9e9 Compare February 21, 2026 17:22
The downloadSnapshot implementation calls getRepoInfo (model info API)
rather than listFiles (tree API). Update the mock responses to return
model info JSON with sha and siblings instead of a tree entry array.

The Xet PR added a downloadFile(Git.TreeEntry) overload that skips
unnecessary HEAD requests for small files. downloadSnapshot was
bypassing this by using its own FileEntry type and calling the
path-based overload. Replace FileEntry with Git.TreeEntry so
downloadSnapshot uses the same size-based transport selection.

Align with Python's huggingface_hub: downloadSnapshot and downloadFile
now return the snapshot cache path (containing symlinks to blobs)
instead of copying files to a separate destination. This eliminates
redundant file copies and disk duplication.

Remove copyCachedFiles, downloadToDestinationWithoutCache, filesSameSize,
destination fast path, and the redundant downloadContentsOfFile -> URL
overload. Rename copyBlobToDestination to createCacheEntries (symlink
only). Require a cache to be configured (throw cacheNotConfigured
otherwise). Propagate the main session configuration to metadataSession
so mock tests can intercept metadata HEAD requests.

Match huggingface_hub's try/except pattern: when getRepoInfo fails
(network error, server 500, timeout), try serving from the local
cache before re-throwing the error. This makes downloadSnapshot
resilient to transient network issues when files are already cached.

Matches Python's local_files_only parameter. When true, returns cached
files without making any network requests, resolving branch names via
the local refs file.

@DePasqualeOrg DePasqualeOrg changed the title Improve download and cache performance; add resumable, parallel downloads and offline mode Align download and caching with huggingface_hub for improved performance Feb 21, 2026

Adds optional `to destination:` parameter to downloadSnapshot and
downloadFile, matching Python's local_dir. When set, files are copied
from the cache to the specified directory. Also adds localFilesOnly to
downloadFile for consistency with downloadSnapshot.

Python's snapshot_download acknowledges it can't check if all files
are present when returning a cached snapshot. We improve on this by
caching the repo info response after the first download and verifying
each file's presence on subsequent calls. This detects incomplete
snapshots caused by interrupted downloads or different glob patterns.

The cache-first download architecture (blob storage, symlinks, resume
support) inherently requires a cache. This aligns with huggingface_hub,
which also always requires a cache directory.

Replace the byte-by-byte AsyncBytes iteration with URLSession.download(),
which handles streaming to disk at the OS level. Resume support is
preserved: when an .incomplete file exists, a Range header is sent and the
downloaded remainder is appended via chunked FileHandle copy.

Also fixes progress reporting during resume by accounting for the resume
offset in DownloadProgressDelegate.

Remove deprecated resumeDownloadFile, which used opaque URLSession resume
data. The new downloadFile handles resume automatically via Range headers.


@DePasqualeOrg DePasqualeOrg commented Feb 24, 2026

@mattt, it looks like some of your commits in this repo from the past two days are derived from my work in this PR as well as my swift-filelock package, but you did not engage with this PR or credit me.

Cc @pcuenca @LysandreJik @julien-c
