@Bili-Sakura Bili-Sakura commented Oct 13, 2025

Add Lumina-T2I support with DiT-Llama architecture and LLaMA-2 text encoder

What does this PR do?

This PR adds support for Lumina-T2I, a 5B parameter text-to-image diffusion transformer model that uses LLaMA-2-7B as its text encoder. Lumina-T2I implements a rectified flow approach (velocity prediction) for efficient, high-quality image generation with support for variable resolutions.

Paper: Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

Original Repository: Alpha-VLLM/Lumina-T2X

Model Weights: Alpha-VLLM/Lumina-T2I

Key Features

  • DiT-Llama Architecture: 5B parameter diffusion transformer with LLaMA-style design
  • LLaMA-2-7B Text Encoder: Uses LLaMA-2 for text encoding (different from Lumina-Next which uses Gemma)
  • Rectified Flow: Implements velocity-based flow matching for efficient sampling
  • Variable Resolution: Supports flexible resolutions from 512x512 to 2048x2048 and beyond
  • Adaptive Layer Normalization: Time and text conditioning via adaLN
  • Cross-Attention: Attention to LLaMA text embeddings
  • Rotary Position Embeddings: RoPE with NTK-aware scaling
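To make the adaLN conditioning concrete, here is a minimal pure-Python sketch of the idea (illustrative only; the PR's actual implementation in `transformer_lumina_dit.py` operates on tensors and derives the shift/scale from the timestep and text embeddings):

```python
# Sketch of adaLN-style conditioning: a conditioning embedding is projected
# to per-channel shift/scale values that modulate a normalized hidden state.

def layer_norm(x, eps=1e-6):
    """Normalize a 1-D list of floats to zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def adaln_modulate(x, shift, scale):
    """adaLN modulation: x_norm * (1 + scale) + shift, elementwise."""
    x_norm = layer_norm(x)
    return [xn * (1.0 + sc) + sh for xn, sc, sh in zip(x_norm, scale, shift)]

hidden = [0.5, -1.0, 2.0, 0.25]
shift = [0.1] * 4   # in the model, these come from the time/text embedding
scale = [0.0] * 4
out = adaln_modulate(hidden, shift, scale)
```

With `scale = 0`, the output is just the normalized hidden state plus the learned shift, which is why adaLN blocks are often initialized to act as an identity-like modulation.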

What's Included

1. New Model: LuminaDiT2DModel

File: src/diffusers/models/transformers/transformer_lumina_dit.py

  • DiT-Llama architecture with 5B parameters
  • Adaptive layer normalization (adaLN-single) for conditioning
  • Cross-attention to text embeddings
  • Grouped Query Attention (GQA) support
  • Rotary position embeddings with NTK scaling
  • Variable resolution support via EOL tokens
  • Full gradient checkpointing support
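For context on the NTK-scaled RoPE bullet, a hedged sketch of the frequency computation (the function name and exact convention here are illustrative; the PR's code may differ in details such as the scaling exponent or dtype handling):

```python
# Sketch of RoPE frequency computation with NTK-aware scaling.
# Standard RoPE uses per-pair frequencies theta_i = base^(-2i/d);
# NTK-aware scaling enlarges the base, base' = base * s^(d/(d-2)),
# so positions beyond the training range still rotate smoothly.

def rope_frequencies(head_dim, base=10000.0, ntk_scale=1.0):
    """Return the head_dim // 2 rotary frequencies for one attention head."""
    if ntk_scale != 1.0:
        base = base * ntk_scale ** (head_dim / (head_dim - 2))
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

freqs = rope_frequencies(head_dim=64)
scaled = rope_frequencies(head_dim=64, ntk_scale=4.0)
```

The scaled variant lowers every non-constant frequency, stretching the effective position range, which is what lets the model extrapolate to larger resolutions than it was trained on.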

2. New Scheduler: LuminaFlowMatchScheduler

File: src/diffusers/schedulers/scheduling_lumina_flow_match.py

  • Rectified flow formulation: x_t = (1-t) * noise + t * x_0
  • Velocity prediction: v = x_0 - noise
  • Time shifting support for better sampling quality
  • Dynamic resolution-based shifting
  • Efficient Euler-based integration
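The rectified-flow bullets above can be sketched with scalars (a toy illustration, not the scheduler's API; the sign/direction convention of the time shift may differ from the PR's implementation):

```python
# Toy rectified flow: x_t = (1 - t) * noise + t * x0, so dx/dt = x0 - noise.
# t = 0 is pure noise, t = 1 is data.

def time_shift(t, shift=1.0):
    """A common flow-matching time shift: t' = shift*t / (1 + (shift-1)*t).
    shift = 1 is the identity; shift != 1 warps a uniform grid toward one
    end of the trajectory to spend more steps where they matter."""
    return shift * t / (1.0 + (shift - 1.0) * t)

def euler_sample(noise, velocity, num_steps=30, shift=1.0):
    """Integrate x from t=0 (noise) to t=1 (data) with Euler steps."""
    ts = [time_shift(i / num_steps, shift) for i in range(num_steps + 1)]
    x = noise
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = velocity(x, t0)        # the model would predict this
        x = x + v * (t1 - t0)      # Euler update
    return x

# With the exact (constant) velocity v = x0 - noise, Euler integration
# recovers x0 exactly regardless of step count:
x0, noise = 3.0, -1.0
sample = euler_sample(noise, lambda x, t: x0 - noise, num_steps=30)
```

Because the true velocity of a rectified flow is constant along each trajectory, Euler integration is exact in the ideal case; in practice the learned velocity varies with `x` and `t`, which is why multiple steps are still needed.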

3. New Pipeline: LuminaT2IPipeline

File: src/diffusers/pipelines/lumina/pipeline_lumina_t2i.py

  • End-to-end text-to-image generation
  • Classifier-free guidance support
  • Negative prompt support
  • Variable resolution and aspect ratios
  • Memory-efficient CPU offloading
  • Batch generation support
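The classifier-free guidance step the pipeline performs per denoising iteration reduces to one line; a scalar sketch (in the pipeline these are velocity tensors, and the function name here is illustrative):

```python
# Classifier-free guidance: extrapolate from the unconditional prediction
# toward the text-conditional one by the guidance scale g.

def apply_cfg(uncond_pred, cond_pred, guidance_scale):
    """pred = uncond + g * (cond - uncond); g = 1 disables guidance."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

guided = apply_cfg(uncond_pred=0.2, cond_pred=0.8, guidance_scale=4.0)
```

Note that `g > 1` amplifies the direction the text conditioning pulls in, which is why each step runs the transformer on both a conditional and an unconditional (negative-prompt) batch.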

4. Tests

File: tests/pipelines/lumina/test_lumina_t2i.py

  • Model instantiation tests
  • Scheduler functionality tests
  • Forward pass validation
  • Configuration tests

5. Documentation

Files:

  • docs/source/en/api/models/lumina_dit2d.md - Model API reference
  • docs/source/en/api/schedulers/lumina_flow_match.md - Scheduler API reference
  • docs/source/en/api/pipelines/lumina.md - Pipeline API reference (updated)
  • docs/source/en/using-diffusers/lumina_t2i.md - Comprehensive usage guide
  • docs/source/en/_toctree.yml - Documentation structure (updated)

Usage Example

```python
import torch
from diffusers import LuminaT2IPipeline

# Load pipeline
pipeline = LuminaT2IPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-T2I",
    torch_dtype=torch.bfloat16,
)
pipeline = pipeline.to("cuda")

# Generate image
image = pipeline(
    prompt="A majestic lion standing on a cliff at sunset",
    num_inference_steps=30,
    guidance_scale=4.0,
    height=1024,
    width=1024,
).images[0]

image.save("lion_sunset.png")
```

Comparison with Lumina-Next

This implementation (Lumina-T2I) is the original model from the paper, while the existing LuminaPipeline is for Lumina-Next (an improved version):

| Feature | Lumina-T2I (this PR) | Lumina-Next (existing) |
| --- | --- | --- |
| Text Encoder | LLaMA-2-7B | Gemma |
| Architecture | DiT-Llama | NextDiT |
| Training | From scratch | Improved/continued |
| Paper | Original Lumina-T2X | Lumina-Next |

Both implementations are valuable:

  • Lumina-T2I: Original implementation, research reproducibility, LLaMA-2 encoder
  • Lumina-Next: Enhanced version, better speed/quality, Gemma encoder

Files Modified

Core exports:

  • src/diffusers/__init__.py - Added LuminaDiT2DModel, LuminaFlowMatchScheduler, LuminaT2IPipeline
  • src/diffusers/models/transformers/__init__.py - Added model export
  • src/diffusers/schedulers/__init__.py - Added scheduler export
  • src/diffusers/pipelines/lumina/__init__.py - Added pipeline export

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc?
  • Was this discussed/approved via a GitHub issue or the forum? (N/A - This is a new model addition following existing patterns)
  • Did you make sure to update the documentation with your changes?
    - Model API reference: docs/source/en/api/models/lumina_dit2d.md
    - Scheduler API reference: docs/source/en/api/schedulers/lumina_flow_match.md
    - Pipeline API reference: Updated docs/source/en/api/pipelines/lumina.md
    - Usage guide: docs/source/en/using-diffusers/lumina_t2i.md
    - TOC: Updated docs/source/en/_toctree.yml
  • Did you write any new necessary tests?
    - tests/pipelines/lumina/test_lumina_t2i.py

Additional Notes

  1. No Breaking Changes: This PR is purely additive and doesn't modify any existing functionality
  2. Code Quality: Follows diffusers conventions (ModelMixin, ConfigMixin, SchedulerMixin)
  3. No Linting Errors: All files pass linting checks
  4. Documentation: Comprehensive docs following diffusers style
  5. Testing: Unit tests for model, scheduler, and pipeline components

Dependencies

  • Requires access to LLaMA-2-7B model (gated on Hugging Face)
  • PyTorch >= 2.0
  • Transformers >= 4.36
  • Standard diffusers dependencies

Who can review?

@yiyixuxu @sayakpaul - This adds a new pipeline with a DiT-based transformer and rectified flow scheduler. Would appreciate your review of the overall implementation.

@asomoza - For pipeline implementation review.

The implementation follows the existing patterns from Lumina-Next, PixArt, and other DiT-based models in diffusers.


Note: This implementation provides the original Lumina-T2I model as described in the paper, complementing the existing Lumina-Next implementation. Both models serve different use cases and having both available increases the library's coverage of state-of-the-art text-to-image models.

@sayakpaul (Member) commented:

Thanks for your PR! Could you also show us some samples of this model? Cc'ing @zhuole1025 as well.

@sayakpaul sayakpaul requested a review from DN6 October 13, 2025 11:20
@Bili-Sakura (Author) commented Oct 13, 2025:

> Thanks for your PR! Could you also show us some samples of this model? Cc'ing @zhuole1025 as well.

Yes, though generating samples will take a few days. I am also preparing a diffusers-style checkpoint.

This PR is a quick start; I've found there are still some issues.
