OpenCatalog · curated by FLOSSK
AI & Machine Learning

TRL

Transformer Reinforcement Learning: train LLMs with RLHF, DPO, ORPO, and related preference optimization recipes.
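
A minimal sketch of what a DPO run looks like with TRL. The model and dataset are placeholders taken from TRL's documentation examples, and argument names can shift between TRL releases (older versions take `tokenizer=` instead of `processing_class=`):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder: any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects preference data with "prompt", "chosen", and "rejected" fields.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-out",
    per_device_train_batch_size=2,
    beta=0.1,  # strength of the KL-style pull toward the reference policy
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```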

Why it is included

Primary open-source toolkit for alignment-style post-training, built on top of the Hugging Face Trainer.

Best for

Teams running DPO/RLHF experiments with open models and open datasets.

Strengths

  • Modern alignment APIs
  • Trainer integration (see the sketch after this list)
  • Ready-to-run example scripts
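
Because TRL trainers subclass the Hugging Face `Trainer`, standard `Trainer` machinery (callbacks, checkpointing, logging backends) applies unchanged. A small sketch continuing the DPO example above, with a hypothetical `LossPrinter` callback:

```python
from transformers import TrainerCallback

class LossPrinter(TrainerCallback):
    """Hypothetical callback: print the loss whenever the trainer logs."""
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "loss" in logs:
            print(f"step {state.global_step}: loss {logs['loss']:.4f}")

# Standard Trainer APIs work on TRL trainers (here, the DPOTrainer above).
trainer.add_callback(LossPrinter())
trainer.train()
```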

Limitations

  • Compute-heavy; careful evaluation still required

Good alternatives

Axolotl preference modes · OpenRLHF · custom JAX training loops
