
Please support Chroma: an 8.9 billion parameter rectified flow transformer capable of generating images from text descriptions, based on FLUX.1 [schnell] with heavy architectural modifications #167


Open
nitinmukesh opened this issue Mar 8, 2025 · 15 comments

@nitinmukesh

https://huggingface.co/lodestones/Chroma

@pcprinciple

Yeah, it's a fantastic model.
I also hope the community can support this model. However, the model is still in training, so it may be a good idea to wait until the model is officially released before quantizing it.

@josephrocca
Contributor

josephrocca commented Mar 22, 2025

it may be a good idea to wait until the model is officially released before quantizing it

Note that this issue is about supporting the model within the Nunchaku engine (and DeepCompressor), rather than about the quantization itself. There's no need to wait for training to finish - Chroma is already usable. Once Nunchaku/DeepCompressor supports it, every version of Chroma will be quantized by someone in the community. SVDQuant is more expensive than the average quantization method, but it's still "pennies", basically, and it's relatively easy to do once the engine supports it.

Also, RE the title of this issue, "with heavy architectural modifications": the modifications aren't too heavy. They amount to replacing the per-transformer-block guidance layers with a much smaller distilled version, masking the encoder_hidden_states (leaving one unmasked padding token at the end of the prompt), and removing CLIP - see this comment for more info.
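To make the masking part concrete, here's a minimal illustrative sketch of how such a prompt mask could be built, assuming the standard transformers T5 tokenizer and a 512-token context for illustration; none of the names here are from Chroma's actual code:

import torch
from transformers import T5TokenizerFast

# Illustrative only: mask all T5 padding tokens except one padding token kept
# unmasked right after the prompt, as described above.
tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")
tokens = tokenizer(
    "a photo of a red fox in the snow",
    padding="max_length", max_length=512, truncation=True, return_tensors="pt",
)

txt_mask = tokens.attention_mask.clone()                  # 1 = real token, 0 = padding
last_real = txt_mask.sum(dim=1) - 1                       # index of last real (EOS) token
first_pad = (last_real + 1).clamp(max=txt_mask.shape[1] - 1)
txt_mask[torch.arange(txt_mask.shape[0]), first_pad] = 1  # keep one padding token unmasked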

@josephrocca
Contributor

josephrocca commented Mar 22, 2025

@lmxyy It's possible that Nunchaku and DeepCompressor already "support" Chroma, if it's possible to pass an attention_mask to NunchakuFluxTransformerBlocks. Is that possible? If it is, then Chroma can be converted into a Schnell-compatible form like this:

# docker run --rm -it --gpus all -v $(pwd):/workspace -w /workspace pytorch/pytorch:2.6.0-cuda12.6-cudnn9-devel bash

# Create Schnell-compatible variant of Chroma by downloading both Chroma and Schnell safetensor files, and copying Chroma's matching weights over to Schnell. This works because lodestone *distilled* the guidance layers instead of completely pruning them, so we can actually just use Schnell's guidance stuff. This comes at the cost of bloating the model back to Schnell's original size (which is fine for now).
CHROMA_VERSION="15"
apt-get update && apt-get install aria2 -y && pip3 install safetensors
cd /workspace
aria2c -x 16 -s 16 -o chroma.safetensors "https://huggingface.co/lodestones/Chroma/resolve/main/chroma-unlocked-v${CHROMA_VERSION}.safetensors?download=true"
aria2c -x 16 -s 16 -o flux1-schnell.safetensors "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors?download=true"
python3 -c '
from safetensors import safe_open
from safetensors.torch import save_file
with safe_open("/workspace/chroma.safetensors", framework="pt", device="cpu") as chroma, safe_open("/workspace/flux1-schnell.safetensors", framework="pt", device="cpu") as schnell:
    chroma_tensors = {key: chroma.get_tensor(key) for key in chroma.keys()}
    schnell_tensors = {key: schnell.get_tensor(key) for key in schnell.keys()}
matching_keys = set(chroma_tensors).intersection(schnell_tensors)
for key in matching_keys:
    schnell_tensors[key] = chroma_tensors[key]
save_file(schnell_tensors, "/workspace/chroma-schnell-compat.safetensors")
'

So then we only need to do the following (both steps are roughly sketched after the list):

  1. Mask the padding tokens of the T5 prompt (except the last padding token), construct the attention_mask (for the joint txt+img transformer blocks only) like this
  2. Feed in torch.zeros for the CLIP embeddings (Chroma doesn't need/use the CLIP embeddings)
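For what it's worth, here's a rough sketch of those two steps with made-up shapes and variable names. This is not the Nunchaku/diffusers API, just the idea: txt_mask stands in for the prompt mask (built as in the earlier sketch), and 768 is assumed here to be the pooled CLIP-L dimension FLUX expects.

import torch

# Rough sketch only; shapes and names are illustrative, not real API arguments.
txt_mask = torch.ones(1, 512, dtype=torch.long)    # placeholder for the prompt mask from above
batch, txt_len = txt_mask.shape

# 1. Joint mask for the double-stream (txt+img) blocks: image tokens always attend,
#    masked-out text padding tokens (except the one kept unmasked) are ignored.
num_img_tokens = (1024 // 8 // 2) ** 2             # 64*64 = 4096 patches for a 1024px image
img_mask = torch.ones(batch, num_img_tokens, dtype=txt_mask.dtype)
joint_mask = torch.cat([txt_mask, img_mask], dim=1)        # (batch, txt_len + img_len)
attention_mask = joint_mask[:, None, None, :].bool()       # broadcastable over heads/queries

# 2. Chroma drops CLIP, so the pooled projection input can simply be zeros.
pooled_projections = torch.zeros(batch, 768)               # 768 = CLIP-L pooled dim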

@lmxyy
Collaborator

lmxyy commented Apr 27, 2025

@lmxyy It's possible that Nunchaku and DeepCompressor already "support" Chroma, if it's possible to pass an attention_mask to NunchakuFluxTransformerBlocks. Is that possible? If it is, then Chroma can be converted into a Schnell-compatible form like this:


Should be possible

@lmxyy lmxyy reopened this Apr 27, 2025
@mit-han-lab mit-han-lab deleted a comment from github-actions bot Apr 27, 2025
@lmxyy lmxyy added enhancement New feature or request and removed inactive labels Apr 27, 2025
@josephrocca
Contributor

josephrocca commented Apr 27, 2025

I kinda sorta got it working here in a very hacky manner:

The only weird thing is that when using the quants, I need to set CFG to 1 for the last couple of steps or it burns the image. So I'm probably doing something wrong there. (Edit: See below)

@josephrocca
Contributor

josephrocca commented May 2, 2025

Actually, the "set CFG to 1 for the last couple of steps" hack isn't needed anymore. I think it was due to unconditional generation being unstable in earlier versions of Chroma (due to the change in masking, which the model hadn't adapted to yet). SVDQuant of Chroma v27 seems to work well in my initial testing.

Note though that the above script creates a "schnell-compatible" version of Chroma, so it isn't taking advantage of Chroma's smaller 8.9B size, i.e. the checkpoint carries a bunch of useless params that waste flops. But it might be a starting point for official Chroma support.
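If anyone wants to put a number on that bloat, a quick parameter count over the two checkpoints should show the gap. Just a sanity-check sketch using the local file paths from the script above (safetensors' get_slice reads only shapes, not the tensor data):

from math import prod
from safetensors import safe_open

# Sanity-check sketch: total parameter count of the original Chroma checkpoint vs.
# the "schnell-compat" one produced above.
def count_params(path):
    total = 0
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            total += prod(f.get_slice(key).get_shape())
    return total

print("chroma.safetensors:               ", count_params("/workspace/chroma.safetensors"))
print("chroma-schnell-compat.safetensors:", count_params("/workspace/chroma-schnell-compat.safetensors"))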

@lmxyy
Collaborator

lmxyy commented May 2, 2025

@Aprilhuu Could you take a look? This model seems quite similar to FLUX and should be straightforward to support in both DeepCompressor and Nunchaku.

@Aprilhuu
Collaborator

Aprilhuu commented May 2, 2025

@Aprilhuu Could you take a look? This model seems quite similar to FLUX and should be straightforward to support in both DeepCompressor and Nunchaku.

Ok I will take a look

@josephrocca
Contributor

josephrocca commented May 3, 2025

@Aprilhuu In case it's helpful, here are references to the Chroma code for the key changes:

And safetensors file:

https://huggingface.co/lodestones/Chroma/resolve/main/chroma-unlocked-v29.5.safetensors?download=true

And an architecture diagram showing the distribute_modulations setup:

[image: architecture diagram]
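In case it helps with mapping layers, a simple key diff between the two checkpoints (the files downloaded by the earlier script) shows which Schnell layers Chroma dropped or replaced and which new layers it adds. Again, just an exploratory sketch:

from safetensors import safe_open

# Exploratory sketch: compare state-dict keys to see which layers differ between
# Chroma and Schnell (e.g. the per-block modulation layers vs. Chroma's replacement).
with safe_open("/workspace/chroma.safetensors", framework="pt", device="cpu") as f:
    chroma_keys = set(f.keys())
with safe_open("/workspace/flux1-schnell.safetensors", framework="pt", device="cpu") as f:
    schnell_keys = set(f.keys())

print("Only in Schnell (dropped/replaced in Chroma):")
print("\n".join(sorted(schnell_keys - chroma_keys)[:20]) + "\n...")
print("\nOnly in Chroma (Chroma's additions):")
print("\n".join(sorted(chroma_keys - schnell_keys)[:20]) + "\n...")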

@sorasoras

I am looking forward to support for Chroma as well.

@josephrocca
Contributor

josephrocca commented May 14, 2025

@Aprilhuu @lmxyy just wondering if this is being actively worked on and/or will make it for the v0.3 release in late May? This issue now has the highest thumbs up count of all feature requests on the Nunchaku repo 🚀 If diffusers support is a prerequisite and is blocking this, then please let me know since I might be able to help speed that up, assuming you're waiting for an external contributor to solve that.

For those who are impatient and want to experiment, I've uploaded some unofficial, possibly low-quality/buggy weights here, based on the script I mentioned earlier. Be sure to read the usage steps since the weights are Chroma "pretending to be" Schnell:

@nitinmukesh
Author

@josephrocca

Thank you for sharing the weights. Are these specific to Comfy, or can they also be used with the example code (just replacing the transformer)?
https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-schnell.py

@Aprilhuu
Collaborator

Aprilhuu commented May 14, 2025

@Aprilhuu @lmxyy just wondering if this is being actively worked on and/or will make it for the v0.3 release in late May? This issue now has the highest thumbs up count of all feature requests on the Nunchaku repo 🚀 If diffusers support is a prerequisite and is blocking this, then please let me know since I might be able to help speed that up, assuming you're waiting for an external contributor to solve that.

For those who are impatient and want to experiment, I've uploaded some unofficial, possibly low-quality/buggy weights here, based on the script I mentioned earlier. Be sure to read the usage steps since the weights are Chroma "pretending to be" Schnell:

Hi Joseph. Yeah, I started looking at Chroma this week. One issue is that diffusers does not officially support Chroma, so I tried to work around that by using the inference code in the Chroma repo, but that means some extra work. So if you could help speed up support in the Diffusers library, that would be awesome!!

@josephrocca
Contributor

josephrocca commented May 17, 2025

@Aprilhuu There's a draft PR by @hameerabbasi taking shape here:

@Aprilhuu
Collaborator

@Aprilhuu There's a draft PR by @hameerabbasi taking shape here:

@josephrocca I tried to hack through this draft PR but ran into a series of errors, even after switching to the new convert function. It also looks like the plan is to separate Chroma into its own class rather than keeping it as a Flux variant, so it seems like more structural changes are on the way. To avoid duplicating effort, I’ll probably wait until there’s a working PR. Lmk if you’ve had any luck getting it to work on your end.
