Support Chroma - Flux based model with architecture changes #11010

Closed
@nitinmukesh

Description

Describe the bug

I am trying to use this FLUX-based model:
https://huggingface.co/lodestones/Chroma
Chroma is an 8.9 billion parameter rectified flow transformer capable of generating images from text descriptions. Based on FLUX.1 [schnell] with heavy architectural modifications.

A GGUF version is posted here:
https://huggingface.co/silveroxides/Chroma-GGUF/tree/main/chroma-unlocked-v11

Reproduction

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from diffusers import GGUFQuantizationConfig

bfl_repo = "black-forest-labs/FLUX.1-schnell"
dtype = torch.bfloat16
URL = "https://huggingface.co/silveroxides/Chroma-GGUF/blob/main/"
gguf_file = "chroma-unlocked-v11/chroma-unlocked-v11-Q5_0.gguf"
transformer_path = f"{URL}{gguf_file}"
transformer = FluxTransformer2DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=dtype),
    torch_dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    torch_dtype=dtype,
)
pipe.enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
inference_params = {
    "prompt": "A cat",
    "height": 512,
    "width": 512,
    "guidance_scale": 7.5,
    "num_inference_steps": 20,
    "generator": torch.Generator(device="cpu").manual_seed(0),
}
image = pipe(**inference_params).images[0]
image.save("chroma.png")

Logs

(venv) C:\aiOWN\diffuser_webui>python FLUX_SCHNELL_CHROMA.py
Traceback (most recent call last):
  File "C:\aiOWN\diffuser_webui\FLUX_SCHNELL_CHROMA.py", line 10, in <module>
    transformer = FluxTransformer2DModel.from_single_file(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\loaders\single_file_model.py", line 343, in from_single_file
    diffusers_format_checkpoint = checkpoint_mapping_fn(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 2114, in convert_flux_transformer_checkpoint_to_diffusers
    converted_state_dict["time_text_embed.timestep_embedder.linear_1.weight"] = checkpoint.pop(
KeyError: 'time_in.in_layer.weight'

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Windows-10-10.0.26100-SP0
  • Running on Google Colab?: No
  • Python version: 3.10.11
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.27.1
  • Transformers version: 4.50.0.dev0
  • Accelerate version: 1.4.0.dev0
  • PEFT version: 0.14.1.dev0
  • Bitsandbytes version: 0.45.3
  • Safetensors version: 0.5.2
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Activity

nitinmukesh commented on Mar 8, 2025

@nitinmukesh
Author

I tried the original weights instead of the GGUF file and got the same error:

transformer_path = "https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v11.safetensors"
transformer = FluxTransformer2DModel.from_single_file(
    transformer_path,
    torch_dtype=dtype,
)
pipe = FluxPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    torch_dtype=dtype,
)

(venv) C:\aiOWN\diffuser_webui>python FLUX_SCHNELL_CHROMA.py
Traceback (most recent call last):
  File "C:\aiOWN\diffuser_webui\FLUX_SCHNELL_CHROMA.py", line 18, in <module>
    transformer = FluxTransformer2DModel.from_single_file(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\loaders\single_file_model.py", line 343, in from_single_file
    diffusers_format_checkpoint = checkpoint_mapping_fn(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 2114, in convert_flux_transformer_checkpoint_to_diffusers
    converted_state_dict["time_text_embed.timestep_embedder.linear_1.weight"] = checkpoint.pop(
KeyError: 'time_in.in_layer.weight'

hlky commented on Mar 8, 2025

@hlky
Contributor

Based on FLUX.1 [schnell] with heavy architectural modifications.

This won't work out-of-the-box as it's a modified Flux architecture.
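
For anyone hitting the same KeyError, here is a minimal sketch of a pre-check that could run before the Flux conversion. It assumes only the state-dict key names that appear in this thread (`time_in.in_layer.weight` from the traceback, `distilled_guidance_layer.*` from the Chroma checkpoint). The helper names are hypothetical and not part of diffusers, and key prefixes that some checkpoints add are not handled.

```python
# Key names copied from the traceback and the Chroma tensor listing in this thread.
FLUX_TIMESTEP_KEY = "time_in.in_layer.weight"
CHROMA_PREFIX = "distilled_guidance_layer."


def looks_like_chroma(keys):
    """Heuristic: the checkpoint carries the distilled guidance layer and
    lacks the timestep embedder that the Flux converter tries to pop."""
    keys = set(keys)
    has_distilled = any(k.startswith(CHROMA_PREFIX) for k in keys)
    return has_distilled and FLUX_TIMESTEP_KEY not in keys


def check_before_flux_conversion(keys):
    """Raise a readable error instead of the bare KeyError (hypothetical helper)."""
    if looks_like_chroma(keys):
        raise ValueError(
            "Checkpoint looks like Chroma (modified Flux); "
            "the stock Flux single-file converter will not map its keys."
        )


chroma_keys = ["distilled_guidance_layer.in_proj.weight", "img_in.weight"]
flux_keys = ["time_in.in_layer.weight", "img_in.weight"]
print(looks_like_chroma(chroma_keys), looks_like_chroma(flux_keys))
```

In practice the key list could come from whatever loader is used to open the checkpoint; the check itself only needs the names.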

References: https://github.com/lodestone-rock/ComfyUI_FluxMod https://github.com/croquelois/forgeChroma

cc @asomoza WDYT, is this something you've tried/know to be popular?

nitinmukesh commented on Mar 8, 2025

@nitinmukesh
Author

@hlky

It's not an important one. It was released a few days ago and is based on Schnell, so it is open source. I thought it would be simple to support, but given the architectural changes, we can ignore this for now.

hlky commented on Mar 8, 2025

@hlky
Contributor

@nitinmukesh Let's not close the issue.

reopened this on Mar 8, 2025
changed the title on Mar 8, 2025
  from: convert_flux_transformer_checkpoint_to_diffusers converted_state_dict["time_text_embed.timestep_embedder.linear_1.weight"] = checkpoint.pop( KeyError: 'time_in.in_layer.weight'
  to: Support Chroma - Flux based model with architecture changes
added and removed the bug label (Something isn't working) on Mar 8, 2025
asomoza commented on Mar 11, 2025

@asomoza
Member

I saw this model when it was posted but only had time to test it now. The announcement post got quite a big reaction, and the model itself seems good, but it's still training, so it's more of a WIP.

There are some points that make it worth supporting:

  • It has the Apache license of Schnell, so you can do whatever you want with it.
  • It has fewer parameters, which makes it more consumer-GPU friendly (supposedly at a minimal quality loss).
  • It only uses one text encoder (T5).
  • It has the T5 prompt masking fix, which in theory lets it understand shorter prompts better, so you don't need a bible-length prompt like with the normal models.

To make it work we need to take into account the pruned layers and the text encoding padding. I haven't really looked at the Flux arch, but this doesn't seem to be a big change, right?
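
As a rough illustration of why padding handling matters for short prompts, here is a small sketch of attention masking over pad tokens. This is generic masking, not necessarily the exact scheme Chroma uses; `pad_id=0`, the token ids, and the function names are assumptions for the example.

```python
import math


def padding_mask(token_ids, pad_id=0):
    """Return 1 for real tokens and 0 for padding positions."""
    return [0 if t == pad_id else 1 for t in token_ids]


def masked_softmax(scores, mask):
    """Softmax over scores where masked-out (0) positions get zero weight."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        raise ValueError("mask removes every position")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# A 2-token prompt padded to length 6 (token ids are made up).
token_ids = [71, 1712, 0, 0, 0, 0]
scores = [1.0] * len(token_ids)  # pretend every position scores the same

mask = padding_mask(token_ids)
unmasked = masked_softmax(scores, [1] * len(scores))
masked = masked_softmax(scores, mask)
print(unmasked[0], masked[0])
```

Without the mask, the two real tokens share attention weight equally with four pad tokens; with it, all of the weight goes to the real tokens.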

The results are still mixed for me: sometimes I get good results and sometimes I don't. The big difference is that I can use a normal prompt that I wrote myself instead of having to enhance it with an LLM.

prompt:

a beagle dog plushie sitting on a bed while looking at the camera with sad eyes, the background its a cozy bedroom with dim lighting coming from the windows which have curtains on them.

Image

sayakpaul commented on Mar 20, 2025

@sayakpaul
Member

+1 on supporting smol models that are Apache 2.0

josephrocca commented on Mar 20, 2025

@josephrocca
Contributor

As mentioned in this pull request, this code hackily gets Chroma working in diffusers:

But note that the above hack bloats the model back to Schnell size, because it effectively adds back all of these layers, which @lodestone-rock distilled and removed:

  double_blocks.0.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.0.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.0.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.0.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.1.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.1.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.1.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.1.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.10.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.10.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.10.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.10.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.11.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.11.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.11.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.11.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.12.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.12.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.12.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.12.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.13.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.13.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.13.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.13.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.14.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.14.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.14.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.14.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.15.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.15.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.15.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.15.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.16.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.16.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.16.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.16.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.17.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.17.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.17.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.17.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.18.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.18.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.18.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.18.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.2.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.2.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.2.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.2.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.3.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.3.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.3.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.3.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.4.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.4.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.4.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.4.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.5.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.5.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.5.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.5.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.6.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.6.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.6.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.6.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.7.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.7.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.7.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.7.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.8.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.8.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.8.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.8.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.9.img_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.9.img_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  double_blocks.9.txt_mod.lin.bias: [18432] bfloat16 36.00 KB
  double_blocks.9.txt_mod.lin.weight: [18432, 3072] bfloat16 108.00 MB
  final_layer.adaLN_modulation.1.bias: [6144] bfloat16 12.00 KB
  final_layer.adaLN_modulation.1.weight: [6144, 3072] bfloat16 36.00 MB
  single_blocks.0.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.0.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.1.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.1.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.10.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.10.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.11.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.11.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.12.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.12.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.13.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.13.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.14.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.14.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.15.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.15.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.16.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.16.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.17.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.17.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.18.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.18.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.19.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.19.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.2.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.2.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.20.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.20.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.21.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.21.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.22.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.22.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.23.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.23.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.24.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.24.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.25.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.25.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.26.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.26.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.27.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.27.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.28.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.28.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.29.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.29.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.3.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.3.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.30.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.30.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.31.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.31.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.32.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.32.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.33.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.33.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.34.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.34.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.35.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.35.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.36.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.36.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.37.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.37.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.4.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.4.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.5.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.5.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.6.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.6.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.7.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.7.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.8.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.8.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB
  single_blocks.9.modulation.lin.bias: [9216] bfloat16 18.00 KB
  single_blocks.9.modulation.lin.weight: [9216, 3072] bfloat16 54.00 MB

Lodestone distilled them into these layers:

  distilled_guidance_layer.in_proj.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.in_proj.weight: [5120, 64] bfloat16 640.00 KB
  distilled_guidance_layer.layers.0.in_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.0.in_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.0.out_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.0.out_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.1.in_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.1.in_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.1.out_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.1.out_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.2.in_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.2.in_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.2.out_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.2.out_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.3.in_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.3.in_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.3.out_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.3.out_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.4.in_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.4.in_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.layers.4.out_layer.bias: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.layers.4.out_layer.weight: [5120, 5120] bfloat16 50.00 MB
  distilled_guidance_layer.norms.0.scale: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.norms.1.scale: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.norms.2.scale: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.norms.3.scale: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.norms.4.scale: [5120] bfloat16 10.00 KB
  distilled_guidance_layer.out_proj.bias: [3072] bfloat16 6.00 KB
  distilled_guidance_layer.out_proj.weight: [3072, 5120] bfloat16 30.00 MB
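
To put numbers on the two listings above, here is a small arithmetic sketch that sums the parameter counts from the shapes exactly as listed. The block counts (19 double blocks, 38 single blocks, 5 distilled MLP layers) are read off the listings; the variable names are only for this example.

```python
from math import prod


def n_params(shapes):
    """Total element count for a list of tensor shapes."""
    return sum(prod(shape) for shape in shapes)


# Block counts read off the listings above.
N_DOUBLE, N_SINGLE, N_MLP_LAYERS = 19, 38, 5

# Per-block modulation tensors that the Schnell-compat hack adds back.
double_mod = [(18432, 3072), (18432,)]  # one img_mod or txt_mod: weight + bias
single_mod = [(9216, 3072), (9216,)]
final_mod = [(6144, 3072), (6144,)]

restored = (
    N_DOUBLE * 2 * n_params(double_mod)  # img_mod and txt_mod per double block
    + N_SINGLE * n_params(single_mod)
    + n_params(final_mod)
)

# The distilled_guidance_layer that replaces them.
distilled = (
    n_params([(5120, 64), (5120,)])  # in_proj
    + N_MLP_LAYERS * 2 * n_params([(5120, 5120), (5120,)])  # in_layer + out_layer
    + N_MLP_LAYERS * n_params([(5120,)])  # norms
    + n_params([(3072, 5120), (3072,)])  # out_proj
)

print(f"restored by the hack: {restored / 1e9:.2f}B parameters")
print(f"distilled layer:      {distilled / 1e9:.2f}B parameters")
print(f"net saving:           {(restored - distilled) / 1e9:.2f}B parameters")
```

So the per-block modulation layers come to roughly 3.25B parameters, against roughly 0.28B for the distilled layer that replaces them. That accounts for most of the size difference the hack reintroduces.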

And they're hooked back up to their original locations like this (diagrams courtesy of lodestone himself):

Image
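
One way to picture the hook-up: each removed modulation layer produced a multiple of 3072 outputs (18432 = 6 x 3072, 9216 = 3 x 3072, 6144 = 2 x 3072), so the distilled layer has to supply that many 3072-wide vectors in total. The sketch below lays them out as index ranges. The counts follow from the shapes listed above, but the ordering (single blocks, then img, then txt, then final) is an assumption for illustration, and the function name is hypothetical.

```python
HIDDEN = 3072  # out_proj output width from the listing above
N_DOUBLE, N_SINGLE = 19, 38  # block counts from the listing above


def modulation_layout():
    """Map each removed modulation layer to a (start, stop) range of
    HIDDEN-wide vectors. The ordering here is an assumption."""
    layout = {}
    cursor = 0

    def take(name, out_features):
        nonlocal cursor
        count = out_features // HIDDEN  # vectors this layer used to emit
        layout[name] = (cursor, cursor + count)
        cursor += count

    for i in range(N_SINGLE):
        take(f"single_blocks.{i}.modulation", 9216)
    for i in range(N_DOUBLE):
        take(f"double_blocks.{i}.img_mod", 18432)
    for i in range(N_DOUBLE):
        take(f"double_blocks.{i}.txt_mod", 18432)
    take("final_layer.adaLN_modulation", 6144)
    return layout, cursor


layout, total_vectors = modulation_layout()
print(total_vectors)
print(layout["double_blocks.0.img_mod"])
```

Each block would then read its own slice of the distilled layer's output instead of running its own modulation linear layer.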

Compare this to the original Schnell, which has guidance layers within every individual transformer block (also see this image, though I guess everyone here is familiar with the Flux arch [I very much was not before attempting this]):

Image

So, you can see that these per-transformer-block layers were pruned/distilled (diagram from Chroma HF repo):

Image

I don't understand this stuff particularly well; I'm just documenting everything I know in case it's helpful for implementing support in diffusers.

Aside: One advantage of the "Schnell-compat" method/hack used in the gist I linked is that we get "free" SVDQuant/deepcompressor support, which is handy until Chroma gets officially supported.

      Support Chroma - Flux based model with architecture changes · Issue #11010 · huggingface/diffusers