TRL

TRL is a post-training framework for foundation models. It covers methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Each method has a dedicated trainer that builds on the Transformers Trainer class and scales from a single GPU to multi-node clusters. For example, the following trains a model with GRPO on a math dataset, using an accuracy-based reward:

from datasets import load_dataset
from trl import GRPOTrainer
from trl.rewards import accuracy_reward

# Load a math reasoning dataset with verifiable answers
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

# The trainer handles generation, reward scoring, and optimization
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

Transformers integration

TRL extends the Transformers APIs and adds method-specific settings:

  • TRL trainers build on Trainer. Method-specific trainers such as GRPOTrainer add generation, reward scoring, and loss computation on top of the base training loop, and their config classes extend TrainingArguments with method-specific fields (see the first sketch after this list).

  • Model loading resolves the architecture with AutoConfig.from_pretrained(), then instantiates the model class from the config with that class's from_pretrained() (see the second sketch below).
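
As a minimal sketch of the config pattern (not TRL's exact internals): GRPOConfig subclasses TrainingArguments, so standard training arguments and GRPO-specific fields are set on one object. num_generations is a GRPO-specific field here; the other names are standard TrainingArguments fields.

from trl import GRPOConfig, GRPOTrainer
from trl.rewards import accuracy_reward

# GRPOConfig extends transformers.TrainingArguments: standard fields and
# GRPO-specific ones live on the same object.
training_args = GRPOConfig(
    output_dir="qwen2-grpo",   # standard TrainingArguments field
    learning_rate=1e-6,        # standard TrainingArguments field
    num_generations=8,         # GRPO-specific: completions sampled per prompt
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    args=training_args,
    reward_funcs=accuracy_reward,
    train_dataset=dataset,  # the dataset loaded in the example above
)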
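
And a sketch of the loading pattern, assuming a causal LM checkpoint; the Auto classes stand in for the class resolution TRL performs internally:

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2-0.5B-Instruct"

# Step 1: resolve the architecture from the checkpoint's config.
config = AutoConfig.from_pretrained(model_id)

# Step 2: instantiate that class and load weights with its from_pretrained().
# AutoModelForCausalLM dispatches to the concrete class (Qwen2ForCausalLM
# for this checkpoint) based on the config.
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)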
