@Bili-Sakura Bili-Sakura commented Oct 13, 2025

Add Lumina-T2I support with DiT-Llama architecture and LLaMA-2 text encoder

What does this PR do?

This PR adds support for Lumina-T2I, a 5B parameter text-to-image diffusion transformer model that uses LLaMA-2-7B as its text encoder. Lumina-T2I implements a rectified flow approach (velocity prediction) for efficient, high-quality image generation with support for variable resolutions.

Paper: Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

Original Repository: Alpha-VLLM/Lumina-T2X

Model Weights: Alpha-VLLM/Lumina-T2I

Key Features

  • DiT-Llama Architecture: 5B parameter diffusion transformer with LLaMA-style design
  • LLaMA-2-7B Text Encoder: Uses LLaMA-2 for text encoding (different from Lumina-Next which uses Gemma)
  • Rectified Flow: Implements velocity-based flow matching for efficient sampling
  • Variable Resolution: Supports flexible resolutions from 512x512 to 2048x2048 and beyond
  • Adaptive Layer Normalization: Time and text conditioning via adaLN
  • Cross-Attention: Attention to LLaMA text embeddings
  • Rotary Position Embeddings: RoPE with NTK-aware scaling
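To make the adaLN conditioning concrete, here is a minimal pure-Python sketch of the idea (illustrative only; the PR's actual implementation in `transformer_lumina_dit.py` operates on tensors and derives the shift/scale from the timestep and text embeddings):

```python
# Sketch of adaLN-style conditioning: a conditioning embedding is projected
# to per-channel shift/scale values that modulate a normalized hidden state.

def layer_norm(x, eps=1e-6):
    """Normalize a 1-D list of floats to zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def adaln_modulate(x, shift, scale):
    """adaLN modulation: x_norm * (1 + scale) + shift, elementwise."""
    x_norm = layer_norm(x)
    return [xn * (1.0 + sc) + sh for xn, sc, sh in zip(x_norm, scale, shift)]

hidden = [0.5, -1.0, 2.0, 0.25]
shift = [0.1] * 4   # in the model, these come from the time/text embedding
scale = [0.0] * 4
out = adaln_modulate(hidden, shift, scale)
```

With `scale = 0`, the output is just the normalized hidden state plus the learned shift, which is why adaLN blocks are often initialized to act as an identity-like modulation.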

What's Included

1. New Model: LuminaDiT2DModel

File: src/diffusers/models/transformers/transformer_lumina_dit.py

  • DiT-Llama architecture with 5B parameters
  • Adaptive layer normalization (adaLN-single) for conditioning
  • Cross-attention to text embeddings
  • Grouped Query Attention (GQA) support
  • Rotary position embeddings with NTK scaling
  • Variable resolution support via EOL tokens
  • Full gradient checkpointing support
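For context on the NTK-scaled RoPE bullet, a hedged sketch of the frequency computation (the function name and exact convention here are illustrative; the PR's code may differ in details such as the scaling exponent or dtype handling):

```python
# Sketch of RoPE frequency computation with NTK-aware scaling.
# Standard RoPE uses per-pair frequencies theta_i = base^(-2i/d);
# NTK-aware scaling enlarges the base, base' = base * s^(d/(d-2)),
# so positions beyond the training range still rotate smoothly.

def rope_frequencies(head_dim, base=10000.0, ntk_scale=1.0):
    """Return the head_dim // 2 rotary frequencies for one attention head."""
    if ntk_scale != 1.0:
        base = base * ntk_scale ** (head_dim / (head_dim - 2))
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

freqs = rope_frequencies(head_dim=64)
scaled = rope_frequencies(head_dim=64, ntk_scale=4.0)
```

The scaled variant lowers every non-constant frequency, stretching the effective position range, which is what lets the model extrapolate to larger resolutions than it was trained on.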

2. New Scheduler: LuminaFlowMatchScheduler

File: src/diffusers/schedulers/scheduling_lumina_flow_match.py

  • Rectified flow formulation: x_t = (1-t) * noise + t * x_0
  • Velocity prediction: v = x_0 - noise
  • Time shifting support for better sampling quality
  • Dynamic resolution-based shifting
  • Efficient Euler-based integration
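The rectified-flow bullets above can be sketched with scalars (a toy illustration, not the scheduler's API; the sign/direction convention of the time shift may differ from the PR's implementation):

```python
# Toy rectified flow: x_t = (1 - t) * noise + t * x0, so dx/dt = x0 - noise.
# t = 0 is pure noise, t = 1 is data.

def time_shift(t, shift=1.0):
    """A common flow-matching time shift: t' = shift*t / (1 + (shift-1)*t).
    shift = 1 is the identity; shift != 1 warps a uniform grid toward one
    end of the trajectory to spend more steps where they matter."""
    return shift * t / (1.0 + (shift - 1.0) * t)

def euler_sample(noise, velocity, num_steps=30, shift=1.0):
    """Integrate x from t=0 (noise) to t=1 (data) with Euler steps."""
    ts = [time_shift(i / num_steps, shift) for i in range(num_steps + 1)]
    x = noise
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = velocity(x, t0)        # the model would predict this
        x = x + v * (t1 - t0)      # Euler update
    return x

# With the exact (constant) velocity v = x0 - noise, Euler integration
# recovers x0 exactly regardless of step count:
x0, noise = 3.0, -1.0
sample = euler_sample(noise, lambda x, t: x0 - noise, num_steps=30)
```

Because the true velocity of a rectified flow is constant along each trajectory, Euler integration is exact in the ideal case; in practice the learned velocity varies with `x` and `t`, which is why multiple steps are still needed.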

3. New Pipeline: LuminaT2IPipeline

File: src/diffusers/pipelines/lumina/pipeline_lumina_t2i.py

  • End-to-end text-to-image generation
  • Classifier-free guidance support
  • Negative prompt support
  • Variable resolution and aspect ratios
  • Memory-efficient CPU offloading
  • Batch generation support
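The classifier-free guidance step the pipeline performs per denoising iteration reduces to one line; a scalar sketch (in the pipeline these are velocity tensors, and the function name here is illustrative):

```python
# Classifier-free guidance: extrapolate from the unconditional prediction
# toward the text-conditional one by the guidance scale g.

def apply_cfg(uncond_pred, cond_pred, guidance_scale):
    """pred = uncond + g * (cond - uncond); g = 1 disables guidance."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

guided = apply_cfg(uncond_pred=0.2, cond_pred=0.8, guidance_scale=4.0)
```

Note that `g > 1` amplifies the direction the text conditioning pulls in, which is why each step runs the transformer on both a conditional and an unconditional (negative-prompt) batch.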

4. Tests

File: tests/pipelines/lumina/test_lumina_t2i.py

  • Model instantiation tests
  • Scheduler functionality tests
  • Forward pass validation
  • Configuration tests

5. Documentation

Files:

  • docs/source/en/api/models/lumina_dit2d.md - Model API reference
  • docs/source/en/api/schedulers/lumina_flow_match.md - Scheduler API reference
  • docs/source/en/api/pipelines/lumina.md - Pipeline API reference (updated)
  • docs/source/en/using-diffusers/lumina_t2i.md - Comprehensive usage guide
  • docs/source/en/_toctree.yml - Documentation structure (updated)

Usage Example

```python
import torch
from diffusers import LuminaT2IPipeline

# Load pipeline
pipeline = LuminaT2IPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-T2I",
    torch_dtype=torch.bfloat16,
)
pipeline = pipeline.to("cuda")

# Generate image
image = pipeline(
    prompt="A majestic lion standing on a cliff at sunset",
    num_inference_steps=30,
    guidance_scale=4.0,
    height=1024,
    width=1024,
).images[0]

image.save("lion_sunset.png")
```

Comparison with Lumina-Next

This implementation (Lumina-T2I) is the original model from the paper, while the existing LuminaPipeline is for Lumina-Next (an improved version):

| Feature | Lumina-T2I (this PR) | Lumina-Next (existing) |
| --- | --- | --- |
| Text Encoder | LLaMA-2-7B | Gemma |
| Architecture | DiT-Llama | NextDiT |
| Training | From scratch | Improved/continued |
| Paper | Original Lumina-T2X | Lumina-Next |

Both implementations are valuable:

  • Lumina-T2I: Original implementation, research reproducibility, LLaMA-2 encoder
  • Lumina-Next: Enhanced version, better speed/quality, Gemma encoder

Files Modified

Core exports:

  • src/diffusers/__init__.py - Added LuminaDiT2DModel, LuminaFlowMatchScheduler, LuminaT2IPipeline
  • src/diffusers/models/transformers/__init__.py - Added model export
  • src/diffusers/schedulers/__init__.py - Added scheduler export
  • src/diffusers/pipelines/lumina/__init__.py - Added pipeline export

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc?
  • Was this discussed/approved via a GitHub issue or the forum? (N/A - This is a new model addition following existing patterns)
  • Did you make sure to update the documentation with your changes?
    - Model API reference: docs/source/en/api/models/lumina_dit2d.md
    - Scheduler API reference: docs/source/en/api/schedulers/lumina_flow_match.md
    - Pipeline API reference: Updated docs/source/en/api/pipelines/lumina.md
    - Usage guide: docs/source/en/using-diffusers/lumina_t2i.md
    - TOC: Updated docs/source/en/_toctree.yml
  • Did you write any new necessary tests?
    - tests/pipelines/lumina/test_lumina_t2i.py

Additional Notes

  1. No Breaking Changes: This PR is purely additive and doesn't modify any existing functionality
  2. Code Quality: Follows diffusers conventions (ModelMixin, ConfigMixin, SchedulerMixin)
  3. No Linting Errors: All files pass linting checks
  4. Documentation: Comprehensive docs following diffusers style
  5. Testing: Unit tests for model, scheduler, and pipeline components

Dependencies

  • Requires access to LLaMA-2-7B model (gated on Hugging Face)
  • PyTorch >= 2.0
  • Transformers >= 4.36
  • Standard diffusers dependencies

Who can review?

@yiyixuxu @sayakpaul - This adds a new pipeline with a DiT-based transformer and rectified flow scheduler. Would appreciate your review of the overall implementation.

@asomoza - For pipeline implementation review.

The implementation follows the existing patterns from Lumina-Next, PixArt, and other DiT-based models in diffusers.


Note: This implementation provides the original Lumina-T2I model as described in the paper, complementing the existing Lumina-Next implementation. Both models serve different use cases and having both available increases the library's coverage of state-of-the-art text-to-image models.

@sayakpaul (Member) commented:

Thanks for your PR! Could you also show us some samples of this model? Cc'ing @zhuole1025 as well.

@sayakpaul sayakpaul requested a review from DN6 October 13, 2025 11:20
@Bili-Sakura (Author) commented Oct 13, 2025:

> Thanks for your PR! Could you also show us some samples of this model? Cc'ing @zhuole1025 as well.

Yes, though generating samples will take a few days. I am also preparing a diffusers-style checkpoint.

This PR is a quick start; I've found there are still some issues.
