[Feature] Add support for Lumina-T2I in diffusers #12476
Add Lumina-T2I support with DiT-Llama architecture and LLaMA-2 text encoder
What does this PR do?
This PR adds support for Lumina-T2I, a 5B-parameter text-to-image diffusion transformer that uses LLaMA-2-7B as its text encoder. Lumina-T2I uses a rectified flow formulation (velocity prediction) for efficient, high-quality image generation and supports variable resolutions.
Paper: Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Original Repository: Alpha-VLLM/Lumina-T2X
Model Weights: Alpha-VLLM/Lumina-T2I
Key Features
What's Included
1. New Model: LuminaDiT2DModel
File: src/diffusers/models/transformers/transformer_lumina_dit.py
2. New Scheduler: LuminaFlowMatchScheduler
File: src/diffusers/schedulers/scheduling_lumina_flow_match.py
x_t = (1-t) * noise + t * x_0
v = x_0 - noise
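The two formulas above can be illustrated with a minimal, framework-free sketch. It is not the code of LuminaFlowMatchScheduler: the function names (interpolate, velocity_target, euler_sample) and the uniform Euler step are assumptions made for illustration only, and the actual scheduler may space or shift its timesteps differently. An oracle that returns the exact velocity stands in for the trained model.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, noise: Vector, t: float) -> Vector:
    """x_t = (1 - t) * noise + t * x_0: t=0 is pure noise, t=1 is the data."""
    return [(1.0 - t) * n + t * x for x, n in zip(x0, noise)]


def velocity_target(x0: Vector, noise: Vector) -> Vector:
    """v = x_0 - noise: the constant velocity along the straight path."""
    return [x - n for x, n in zip(x0, noise)]


def euler_sample(
    noise: Vector,
    velocity_fn: Callable[[Vector, float], Vector],
    num_steps: int = 10,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy "data" and "noise" vectors (hypothetical values).
x0 = [1.0, -2.0, 0.5]
noise = [0.3, 0.1, -0.7]


def oracle_velocity(x: Vector, t: float) -> Vector:
    # Stand-in for the transformer: returns the exact training target.
    return velocity_target(x0, noise)


recovered = euler_sample(noise, oracle_velocity, num_steps=10)
```

Because the path from noise to data is a straight line, a perfect velocity prediction lets the Euler loop land on x_0 up to floating-point error. A trained model only approximates v, so the number of steps affects output quality in practice.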
3. New Pipeline: LuminaT2IPipeline
File: src/diffusers/pipelines/lumina/pipeline_lumina_t2i.py
4. Tests
File: tests/pipelines/lumina/test_lumina_t2i.py
5. Documentation
Files:
- docs/source/en/api/models/lumina_dit2d.md - Model API reference
- docs/source/en/api/schedulers/lumina_flow_match.md - Scheduler API reference
- docs/source/en/api/pipelines/lumina.md - Pipeline API reference (updated)
- docs/source/en/using-diffusers/lumina_t2i.md - Comprehensive usage guide
- docs/source/en/_toctree.yml - Documentation structure (updated)
Usage Example
Comparison with Lumina-Next
This implementation (Lumina-T2I) is the original model from the paper, while the existing LuminaPipeline is for Lumina-Next (an improved version):
Both implementations are valuable:
Files Modified
Core exports:
- src/diffusers/__init__.py - Added LuminaDiT2DModel, LuminaFlowMatchScheduler, LuminaT2IPipeline
- src/diffusers/models/transformers/__init__.py - Added model export
- src/diffusers/schedulers/__init__.py - Added scheduler export
- src/diffusers/pipelines/lumina/__init__.py - Added pipeline export
Before submitting
- Model API reference: docs/source/en/api/models/lumina_dit2d.md
- Scheduler API reference: docs/source/en/api/schedulers/lumina_flow_match.md
- Pipeline API reference: Updated docs/source/en/api/pipelines/lumina.md
- Usage guide: docs/source/en/using-diffusers/lumina_t2i.md
- TOC: Updated docs/source/en/_toctree.yml
- tests/pipelines/lumina/test_lumina_t2i.py
Additional Notes
Dependencies
Who can review?
@yiyixuxu @sayakpaul - This adds a new pipeline with a DiT-based transformer and rectified flow scheduler. Would appreciate your review of the overall implementation.
@asomoza - For pipeline implementation review.
The implementation follows the existing patterns from Lumina-Next, PixArt, and other DiT-based models in diffusers.
Note: This implementation provides the original Lumina-T2I model as described in the paper, complementing the existing Lumina-Next implementation. Both models serve different use cases and having both available increases the library's coverage of state-of-the-art text-to-image models.