OpenCatalog · curated by FLOSSK
AI & Machine Learning

TRL

Transformer Reinforcement Learning: train LLMs with RLHF, DPO, ORPO, and related preference optimization recipes.
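
A minimal sketch of what a DPO run looks like with TRL. The model and dataset are placeholders taken from TRL's documentation examples, and argument names can shift between TRL releases (older versions take `tokenizer=` instead of `processing_class=`):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder: any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects preference data with "prompt", "chosen", and "rejected" fields.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-out",
    per_device_train_batch_size=2,
    beta=0.1,  # strength of the KL-style pull toward the reference policy
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```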

Why it is included

Primary open-source toolkit for alignment-style post-training, built on top of the Hugging Face Trainer.

Best for

Teams running DPO/RLHF experiments with open models and open datasets.

Strengths

  • Modern alignment APIs
  • Trainer integration (see the sketch after this list)
  • Ready-to-run example scripts
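
Because TRL trainers subclass the Hugging Face `Trainer`, standard `Trainer` machinery (callbacks, checkpointing, logging backends) applies unchanged. A small sketch continuing the DPO example above, with a hypothetical `LossPrinter` callback:

```python
from transformers import TrainerCallback

class LossPrinter(TrainerCallback):
    """Hypothetical callback: print the loss whenever the trainer logs."""
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "loss" in logs:
            print(f"step {state.global_step}: loss {logs['loss']:.4f}")

# Standard Trainer APIs work on TRL trainers (here, the DPOTrainer above).
trainer.add_callback(LossPrinter())
trainer.train()
```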

Limitations

  • Compute-heavy; careful evaluation still required

Good alternatives

Axolotl preference modes · OpenRLHF · custom JAX training loops
