Closed
Description
Describe the bug
I am trying to use this FLUX model
https://huggingface.co/lodestones/Chroma
Chroma is an 8.9 billion parameter rectified flow transformer capable of generating images from text descriptions. It is based on FLUX.1 [schnell] with heavy architectural modifications.
A GGUF version is posted here:
https://huggingface.co/silveroxides/Chroma-GGUF/tree/main/chroma-unlocked-v11
Reproduction
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

bfl_repo = "black-forest-labs/FLUX.1-schnell"
dtype = torch.bfloat16

# Load the Chroma GGUF checkpoint as a Flux transformer
URL = "https://huggingface.co/silveroxides/Chroma-GGUF/blob/main/"
gguf_file = "chroma-unlocked-v11/chroma-unlocked-v11-Q5_0.gguf"
transformer_path = f"{URL}{gguf_file}"
transformer = FluxTransformer2DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=dtype),
    torch_dtype=dtype,
)

# Build the pipeline around the quantized transformer
pipe = FluxPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    torch_dtype=dtype,
)
pipe.enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

inference_params = {
    "prompt": "A cat",
    "height": 512,
    "width": 512,
    "guidance_scale": 7.5,
    "num_inference_steps": 20,
    "generator": torch.Generator(device="cpu").manual_seed(0),
}
image = pipe(**inference_params).images[0]
image.save("chroma.png")
Logs
(venv) C:\aiOWN\diffuser_webui>python FLUX_SCHNELL_CHROMA.py
Traceback (most recent call last):
File "C:\aiOWN\diffuser_webui\FLUX_SCHNELL_CHROMA.py", line 10, in <module>
transformer = FluxTransformer2DModel.from_single_file(
File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\loaders\single_file_model.py", line 343, in from_single_file
diffusers_format_checkpoint = checkpoint_mapping_fn(
File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 2114, in convert_flux_transformer_checkpoint_to_diffusers
converted_state_dict["time_text_embed.timestep_embedder.linear_1.weight"] = checkpoint.pop(
KeyError: 'time_in.in_layer.weight'
System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Windows-10-10.0.26100-SP0
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.50.0.dev0
- Accelerate version: 1.4.0.dev0
- PEFT version: 0.14.1.dev0
- Bitsandbytes version: 0.45.3
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
No response
Activity
nitinmukesh commented on Mar 8, 2025
Tried the original weights instead of GGUF.
hlky commented on Mar 8, 2025
This won't work out-of-the-box as it's a modified Flux architecture.
References:
https://github.com/lodestone-rock/ComfyUI_FluxMod
https://github.com/croquelois/forgeChroma
cc @asomoza WDYT, is this something you've tried/know to be popular?
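For anyone digging in: a quick way to see the modification is to diff the checkpoint's top-level tensor prefixes against the stock BFL-format Flux prefixes that the single-file converter expects. A minimal sketch, assuming the gguf Python package and a local copy of the GGUF file from the reproduction above:

from gguf import GGUFReader

# Top-level key prefixes of a stock BFL-format Flux checkpoint; these are
# what diffusers' single-file conversion looks for.
FLUX_PREFIXES = {
    "img_in", "txt_in", "time_in", "vector_in", "guidance_in",
    "double_blocks", "single_blocks", "final_layer",
}

reader = GGUFReader("chroma-unlocked-v11-Q5_0.gguf")  # downloaded locally first
chroma_prefixes = {t.name.split(".")[0] for t in reader.tensors}

print("expected but missing:", sorted(FLUX_PREFIXES - chroma_prefixes))
print("unexpected extras:", sorted(chroma_prefixes - FLUX_PREFIXES))

On a Chroma checkpoint this should show the conditioning-related prefixes (e.g. time_in) as missing, which matches the KeyError in the traceback above.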
nitinmukesh commented on Mar 8, 2025
@hlky
It's not an important one. It was released a few days back and is based on Schnell, so it's open source. I thought it would be simple, but considering there are architectural changes, we can ignore this for now.
hlky commented on Mar 8, 2025
@nitinmukesh Let's not close the issue.
Issue retitled to "Support Chroma - Flux based model with architecture changes"
asomoza commented on Mar 11, 2025
I saw this model when it was posted but only had the time to test it now. The announcement post got quite a big reaction, and the model itself seems good, but it's still training, so it's more of a WIP.
There are some points that make it worth supporting:
To make it work we need to take into account the pruned layers and the text encoding padding. I haven't really looked at the Flux arch, but this doesn't seem to be a big change, right?
The results are still mixed for me; sometimes I get good results and sometimes I don't. The big difference is that I can use a normal prompt written by me instead of having to enhance it with an LLM.
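On the text encoding padding point: a minimal sketch of the general shape of the change, assuming Chroma wants padded T5 tokens masked out rather than attended to (the exact scheme is Chroma's own; a small encoder stands in for the T5-XXL that Flux actually uses so this runs quickly):

import torch
from transformers import T5EncoderModel, T5TokenizerFast

# Small stand-in encoder for illustration; Flux pipelines use T5-XXL.
model_id = "google/flan-t5-small"
tokenizer = T5TokenizerFast.from_pretrained(model_id)
text_encoder = T5EncoderModel.from_pretrained(model_id)

inputs = tokenizer(
    "A cat",
    padding="max_length",
    max_length=512,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    embeds = text_encoder(
        inputs.input_ids, attention_mask=inputs.attention_mask
    ).last_hidden_state

# Zero out embeddings at padded positions so the transformer never sees
# padding junk; stock Flux feeds the full padded sequence through instead.
masked_embeds = embeds * inputs.attention_mask.unsqueeze(-1)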
sayakpaul commented on Mar 20, 2025
+1 on supporting smol models that are Apache 2.0
josephrocca commented on Mar 20, 2025
As mentioned in this pull request, this code hackily gets Chroma working in diffusers:
But note that the model is bloated back to Schnell size with the above hack, because it effectively adds back all of the layers which @lodestone-rock distilled and removed:
Lodestone distilled them into these layers:
And they're hooked back up to their original locations like this (diagrams courtesy of lodestone himself):
Compare to the original Schnell, which has guidance layers within every individual transformer block (also see this image, though I guess everyone here is familiar with the Flux arch [I was very much not before attempting this]):
So, you can see that these per-transformer-block layers were pruned/distilled (diagram from Chroma HF repo):
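To make that concrete, here is a minimal, hypothetical sketch of the idea the diagrams describe: one shared MLP maps the timestep/guidance conditioning to modulation vectors for every block, replacing the per-block projections. All module names and sizes below are guesses for illustration, not Chroma's actual definitions.

import torch
import torch.nn as nn

class DistilledModulationApproximator(nn.Module):
    """One shared MLP emits the modulation vectors for every transformer
    block, replacing the per-block projections that stock Flux carries.
    All sizes here are placeholders, not Chroma's real hyperparameters."""

    def __init__(self, cond_dim=256, hidden_dim=1024, inner_dim=3072, num_mod_vectors=100):
        super().__init__()
        self.inner_dim = inner_dim
        self.num_mod_vectors = num_mod_vectors
        self.net = nn.Sequential(
            nn.Linear(cond_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, num_mod_vectors * inner_dim),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (batch, cond_dim) timestep/guidance conditioning embedding
        out = self.net(cond)
        # one (inner_dim,) modulation vector per consumer site across all blocks
        return out.view(-1, self.num_mod_vectors, self.inner_dim)

# Each block would then slice its scale/shift/gate vectors out of this shared
# output instead of computing them from its own per-block projection.
cond = torch.randn(2, 256)
mods = DistilledModulationApproximator()(cond)
print(mods.shape)  # torch.Size([2, 100, 3072])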
I don't understand this stuff particularly well; I'm just documenting what I know in case it's helpful for implementing support in diffusers.
Aside: one advantage of the "Schnell-compat" method/hack used in the gist I linked is that we get "free" SVDQuant/deepcompressor support, which is handy until Chroma gets officially supported.