Torchao int4 serialization #11591
Conversation
Thanks! Some questions.
```python
    and hf_quantizer.quantization_config.quant_method == QuantizationMethod.TORCHAO
    and hf_quantizer.quantization_config.quant_type in ["int4_weight_only", "autoquant"]
):
    map_location = torch.device([d for d in device_map.values() if d not in ["cpu", "disk"]][0])
```
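The device selection in that snippet can be illustrated with a plain-Python sketch. `resolve_map_location` is a hypothetical helper name (echoing the reviewer's suggestion in this thread), and the `"cpu"` fallback for an all-offloaded `device_map` is an assumption for illustration; the actual PR guards this by checking the `device_map` beforehand:

```python
def resolve_map_location(device_map):
    """Pick the first accelerator device in a device_map, skipping CPU/disk offload.

    Hypothetical helper for illustration; the real code wraps the result in
    torch.device(...) and assumes at least one accelerator entry exists.
    """
    accelerators = [d for d in device_map.values() if d not in ["cpu", "disk"]]
    # Fallback to "cpu" here is an added assumption, not the PR's behavior.
    return accelerators[0] if accelerators else "cpu"


# Example device_map in the style produced by accelerate-based dispatch:
device_map = {"transformer": "cuda:0", "text_encoder": "cpu", "vae": "disk"}
print(resolve_map_location(device_map))  # "cuda:0"
```

Note that with multiple CUDA devices in the map, only the first one is returned, which is the indexing behavior the reviewer asks about below.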
Sufficiently safe to say that we would always have a non-None `device_map`?

Also, what happens if the `device_map` has multiple CUDA devices specified? Would the indexing make sense there? Okay for this PR, but we could potentially have a `resolve_map_location()` per quantizer class, maybe.
> Sufficiently safe to say that we would always have a non-None `device_map`?

I check that the `device_map` is not `None`, so this should be safe enough; I took it from transformers. There shouldn't be an issue with the indexing either: if multiple devices are specified, the tensors will be moved again afterwards.

Yeah, I can switch to `update_map_location`.
Once the PR is close to merging, let's also add a test.

Will add a test! Note that in general, I wouldn't recommend saving int4 models with torchao, since the serialized format is hardware-dependent.

Indeed. Then let's also add a note in the docs.
What does this PR do?
This PR fixes torchao int4 checkpoint loading: the checkpoint must be loaded directly onto the device it was quantized on. We assume the model is being loaded on the correct device from the start.
Needed for this model: https://huggingface.co/diffusers/FLUX.1-dev-torchao-int4
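For context, the quantize/save/reload flow this PR targets can be sketched as follows. This is a hedged example: it assumes the diffusers `TorchAoConfig` API, a CUDA device, and that torchao tensor subclasses require pickle (non-safetensors) serialization; the local path is illustrative.

```python
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

# Quantize the transformer to int4 on load. The int4 packed layout is
# hardware-dependent, so the resulting weights are tied to this device class.
quantization_config = TorchAoConfig("int4_weight_only")
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)

# torchao tensor subclasses are not safetensors-compatible, so the quantized
# checkpoint is saved with pickle serialization (assumption for illustration).
transformer.save_pretrained("flux-int4-transformer", safe_serialization=False)

# With this PR, reloading maps the checkpoint straight onto the accelerator
# it was quantized on, instead of round-tripping through CPU.
transformer = FluxTransformer2DModel.from_pretrained(
    "flux-int4-transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
```

Running this end to end requires a GPU and the torchao package installed alongside diffusers.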