
v0.17.0

@qgallouedec qgallouedec released this 25 Apr 02:20
· 62 commits to main since this release
cd6b3de

Major and breaking

The TRL v0.17 release introduces three major changes that, together, enable significantly faster generation performance in GRPO—up to 10x faster in some configurations.


These three changes are:

  • Data parallelism (DP) for the vLLM server
  • A new GRPO training strategy that generates once per effective batch
  • Support for the V1 engine in vLLM

Below, we provide a summary of these changes and how to use them.

⚡ Up to 4x faster: Data Parallel for vLLM server

The TRL vLLM server now supports data parallelism (DP), enabling significantly faster generation speeds—especially for smaller models. This new feature can be used by adding the --data_parallel_size N argument when launching the vLLM server.

trl vllm-serve --model Qwen/Qwen2.5-14B-Instruct --tensor_parallel_size 2 --data_parallel_size 2
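On the training side, the trainer talks to this server as before. A minimal sketch of the corresponding config, assuming the server runs locally on the default host and port (the values below are assumptions; adjust them to your setup):

from trl import GRPOConfig

# Point the GRPO trainer at the vLLM server launched above.
# Host and port are assumptions for a local setup; change them if the
# server runs elsewhere.
training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_server_host="0.0.0.0",
    vllm_server_port=8000,
)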

by @qgallouedec in #3310

☝️ [GRPO] Generate once per effective batch

Previously, GRPO made one generation request per global batch. The global batch is the combination of all local (per-device) batches and does not account for gradient accumulation. In other words, with 8 gradient accumulation steps, GRPO made 8 generation requests per training step.

Now, GRPO groups these global batches into a single "effective batch" and makes only one generation request per effective batch. Since vLLM applies optimizations that are especially effective for large batches, this new approach leads to significantly faster training overall.

No changes are required in the training script, as this is handled internally by the GRPO trainer.
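To make the speed-up concrete, here is a back-of-the-envelope illustration; all numbers are hypothetical:

# Hypothetical configuration, for illustration only
num_processes = 8                 # trainer GPUs
per_device_batch_size = 4
gradient_accumulation_steps = 8

global_batch = num_processes * per_device_batch_size              # 32 prompts
effective_batch = global_batch * gradient_accumulation_steps      # 256 prompts

# Before: 8 generation requests of 32 prompts each per optimizer step
# Now:    1 generation request of 256 prompts per optimizer step,
#         letting vLLM exploit its large-batch optimizations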


by @qgallouedec in #3283

⏱️ Fix vLLM server to support V1 Engine

vLLM provides two versions of its engine (V0 and V1), and V1 is significantly faster. This version is now supported by TRL and requires vLLM version 0.8.3 or higher.
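Depending on your vLLM version, the V1 engine may already be selected by default; otherwise it can usually be forced through vLLM's VLLM_USE_V1 environment variable when launching the server (this is a vLLM setting, not a TRL flag, and is mentioned here as an assumption about your vLLM install):

VLLM_USE_V1=1 trl vllm-serve --model Qwen/Qwen2.5-14B-Instruct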

by @I-l-l-I in #3276

👎 [GRPO] Adds option to disable dropout

Disabling dropout has been shown to stabilize training. You can now disable dropout in GRPO by setting the disable_dropout argument to True in the GRPO config.

from trl import GRPOConfig

training_args = GRPOConfig(..., disable_dropout=True)

by @edbeeching in #3234

🩺 Dr. GRPO loss

GRPO now supports the various losses proposed in the recent literature, including the Dr. GRPO loss. The loss type can be set in the GRPO config:

from trl import GRPOConfig

training_args = GRPOConfig(..., loss_type="dr_grpo")
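For intuition, the main practical difference between the standard GRPO loss and the Dr. GRPO loss is how the per-token loss is normalized. A rough sketch, illustrative only and not TRL's implementation (the function and argument names are made up):

import torch

def aggregate_loss(per_token_loss, completion_mask, loss_type, max_completion_length=256):
    # per_token_loss and completion_mask have shape (batch, seq_len);
    # completion_mask is 1 for real completion tokens and 0 for padding
    if loss_type == "grpo":
        # normalize each sequence by its own length, then average over the batch
        per_seq = (per_token_loss * completion_mask).sum(-1) / completion_mask.sum(-1).clamp(min=1)
        return per_seq.mean()
    # "dr_grpo": normalize by a constant (the maximum completion length) instead of
    # each sequence's own length, removing the length bias discussed in the Dr. GRPO paper
    return (per_token_loss * completion_mask).sum() / (per_token_loss.size(0) * max_completion_length)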

by @qgallouedec in #3256

🎲 [GRPO] Make training dataset shuffle optional

The GRPO trainer now has an option to disable shuffling of the training dataset. This is useful for curriculum learning, where the order of the training data is important.

from trl import GRPOConfig

training_args = GRPOConfig(..., shuffle_dataset=False)

by @LeonEricsson in #3334

☕ Overlong-filtering for GRPO

Overlong filtering has been shown to significantly stabilize learning and improve performance. You can now use it in TRL!

It simply consists of masking out the loss of truncated completions:

from trl import GRPOConfig

training_args = GRPOConfig(..., mask_truncated_completions=True)
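For intuition, a minimal sketch of what this masking amounts to, illustrative only and not TRL's implementation (the names are made up):

import torch

def mask_truncated_loss(per_token_loss, completion_mask, is_truncated):
    # per_token_loss, completion_mask: (batch, seq_len); is_truncated: (batch,) bool,
    # True when a completion hit the length limit without emitting an EOS token
    keep = (~is_truncated).float().unsqueeze(-1)   # zero weight for truncated completions
    mask = completion_mask * keep                  # drop their tokens from the loss
    return (per_token_loss * mask).sum() / mask.sum().clamp(min=1)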


by @shirinyamani in #3248

🐯 Integrate Liger GRPO Loss to GRPO Trainer

Liger significantly reduces the peak memory usage of the loss computation. You can now use it in TRL with the use_liger_loss argument in the GRPO config:

from trl import GRPOConfig

training_args = GRPOConfig(..., use_liger_loss=True)
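The Liger kernels live in the separate liger-kernel package, so you will likely need to install it alongside TRL first:

pip install liger-kernel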

by @shivam15s in #3184

Bug fixes

What's Changed

New Contributors

Full Changelog: v0.16.0...v0.17.0