New Adapter/Pipeline Request: IT-Blender for Creative Conceptual Blending #11961

## Model/Pipeline/Scheduler description

### Name of the model/pipeline/scheduler
"Image-and-Text Concept Blender" (IT-Blender), a diffusion adapter that blends visual concepts from a real reference image with textual concepts from a prompt in a disentangled manner. The goal is to enhance human creativity in design tasks.

### Project page & ArXiv link
Paper: https://arxiv.org/pdf/2506.24085
Project website: https://imagineforme.github.io/
**(Many more interesting examples are available on the project page.)**
</br>

<img width="2880" height="3159" alt="Image" src="https://github.com/user-attachments/assets/87607797-32a1-41a5-b5aa-69cd8406352c" />

### What is the proposed method?

IT-Blender is an adapter for existing text-to-image diffusion models such as Stable Diffusion (SD) and FLUX. Its core component is the **Blended Attention (BA)** module, which replaces the standard self-attention layers. BA uses a two-stream approach (a noisy stream for generation and a clean reference stream for the image) and introduces trainable parameters in an Image Cross-Attention (imCA) term to bridge the distributional shift between clean and noisy latents.
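To make the idea concrete, here is a simplified, single-head NumPy sketch of how a BA layer could combine the two streams. It is a hypothetical illustration, not the released implementation: the names `blended_attention`, `w_sa`, `w_ca` and `scale`, the additive combination of the two terms, and the choice of separate trainable q/k/v projections for imCA are all assumptions made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention: q (N, d), k (M, d), v (M, d) -> (N, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def blended_attention(h_noisy, h_ref, w_sa, w_ca, scale=1.0):
    """Hypothetical Blended Attention (BA) sketch.

    h_noisy: (N, d) hidden states of the noisy (generation) stream.
    h_ref:   (M, d) hidden states of the clean reference stream (t=0).
    w_sa:    frozen self-attention projections {"q", "k", "v"}, each (d, d).
    w_ca:    trainable imCA projections {"q", "k", "v"}, each (d, d) (assumption).
    scale:   hypothetical weight on the imCA term.
    """
    # Standard self-attention on the noisy stream with the pretrained weights.
    sa = attention(h_noisy @ w_sa["q"], h_noisy @ w_sa["k"], h_noisy @ w_sa["v"])
    # Image cross-attention: noisy-stream queries attend to clean reference keys/values.
    # The separate trainable projections are where a clean/noisy shift could be learned.
    ca = attention(h_noisy @ w_ca["q"], h_ref @ w_ca["k"], h_ref @ w_ca["v"])
    return sa + scale * ca

# Toy usage with random weights and hidden states.
rng = np.random.default_rng(0)
d, n, m = 8, 16, 16
w_sa = {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in "qkv"}
# Initialize the imCA projections from the self-attention weights (an assumption).
w_ca = {name: w.copy() for name, w in w_sa.items()}
h_noisy = rng.standard_normal((n, d))
h_ref = rng.standard_normal((m, d))
out = blended_attention(h_noisy, h_ref, w_sa, w_ca, scale=0.5)
print(out.shape)  # (16, 8)
```

Setting `scale=0.0` in this sketch recovers plain self-attention, which is one simple way an adapter can start out as a no-op on top of the frozen base model.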

### Is the pipeline different from an existing pipeline?
Yes. The IT-Blender pipeline is distinct for a few reasons:
1.  **Native Image Encoding**: It encodes the reference image with the diffusion model's own denoising network, forwarding the clean (noise-free) latent at timestep t=0. Avoiding an external image encoder helps preserve fine details.
2.  **Two-Stream Processing**: During both training and inference, it simultaneously processes a "noisy stream" for text-guided generation and a "reference stream" for the clean visual-concept image.
3.  **Blended Attention Integration**: The pipeline replaces the standard self-attention modules with the new Blended Attention (BA) module, which is designed to keep the processing of textual and visual concepts disentangled.
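The toy NumPy sketch below illustrates how the three points above could fit together in a sampling loop. Everything in it is hypothetical: `ToyDenoiser` stands in for the UNet/DiT, `two_stream_sample` and its fixed-step update are not a real scheduler, and the imCA term reuses the self-attention weights for brevity. Because the reference stream here always sees the same clean input at t=0 and does not attend to the noisy stream, its per-layer features are identical at every step, so the sketch computes them once and reuses them; whether the released code caches them this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (hypothetical)

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attn(q, k, v):
    # Single-head scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

class ToyDenoiser:
    """Hypothetical stand-in for a diffusion backbone: a stack of attention layers."""

    def __init__(self, num_layers=2):
        self.w = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(num_layers)]

    def time_embed(self, t):
        # Toy timestep embedding broadcast-added to every token.
        return np.full(D, np.sin(t / 100.0))

    def forward(self, x, t, ref_feats=None, cache=None):
        h = x + self.time_embed(t)
        for i, w in enumerate(self.w):
            if cache is not None:
                cache.append(h.copy())  # reference stream: record layer-i features
            out = attn(h @ w, h @ w, h @ w)  # self-attention within the stream
            if ref_feats is not None:
                # Noisy stream: also attend to the clean reference features of layer i.
                out = out + attn(h @ w, ref_feats[i] @ w, ref_feats[i] @ w)
            h = h + out
        return h

def two_stream_sample(model, x_ref, steps=4):
    # 1) Reference stream: forward the clean reference latent at t=0 through
    #    the same denoiser and cache its per-layer hidden states.
    ref_feats = []
    model.forward(x_ref, t=0, cache=ref_feats)
    # 2) Noisy stream: start from Gaussian noise and use the cached features each step.
    x = rng.standard_normal(x_ref.shape)
    for t in np.linspace(999, 1, steps):
        pred = model.forward(x, t, ref_feats=ref_feats)
        x = x - 0.1 * pred  # toy update rule, not a real diffusion scheduler
    return x, ref_feats

model = ToyDenoiser(num_layers=2)
x_ref = rng.standard_normal((16, D))  # toy "clean reference latent" tokens
sample, ref_feats = two_stream_sample(model, x_ref, steps=4)
print(sample.shape, len(ref_feats))  # (16, 8) 2
```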

### Why is this method useful?
The method is particularly effective for creative tasks like product design, character design, and graphic design, as shown by the extensive examples in the paper and project page. We believe it would be a valuable and unique addition to the `diffusers` library.

### Open source status

- [x] The model implementation is available.
- [x] The model weights are available (Only relevant if addition is not a scheduler).

### Provide useful links for the implementation

**Demo page**: https://huggingface.co/spaces/WonwoongCho/IT-Blender
**GitHub page for inference**: https://github.com/WonwoongCho/IT-Blender
Note that the inference code depends on our own slightly modified fork of `diffusers` (see `requirements.txt` in the GitHub repo).

**Changed Diffusers Pipeline for FLUX**: https://github.com/WonwoongCho/diffusers/blob/main/src/diffusers/pipelines/flux/pipeline_flux.py
**Changed Diffusers Pipeline for SD1.5**: https://github.com/WonwoongCho/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

New Adapter/Pipeline Request: IT-Blender for Creative Conceptual Blending #11961

Model/Pipeline/Scheduler description

Name of the model/pipeline/scheduler

Project page & ArXiv link

What is the proposed method?

Is the pipeline different from an existing pipeline?

Why is this method useful?

Open source status

Provide useful links for the implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

New Adapter/Pipeline Request: IT-Blender for Creative Conceptual Blending #11961

Description

Model/Pipeline/Scheduler description

Name of the model/pipeline/scheduler

Project page & ArXiv link

What is the proposed method?

Is the pipeline different from an existing pipeline?

Why is this method useful?

Open source status

Provide useful links for the implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions