TRL
Transformer Reinforcement Learning: train LLMs with RLHF, DPO, ORPO, and related preference optimization recipes.
Why it is included
The primary open-source toolkit for alignment-style post-training, built on top of the Hugging Face Trainer.
Best for
Teams running DPO/RLHF experiments with open models and open datasets.
Strengths
- Modern alignment APIs covering SFT, reward modeling, DPO, and ORPO
- Tight integration with the Hugging Face Trainer, PEFT, and the Hub
- Maintained example scripts and recipes for common post-training setups
Limitations
- Compute-heavy, and preference-tuned models still require careful downstream evaluation
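To make the DPO recipe concrete: it optimizes a pairwise objective over (chosen, rejected) response pairs, pushing the policy to prefer the chosen answer more strongly than a frozen reference model does. A minimal sketch of that loss in plain Python, with made-up log-probabilities and no TRL dependency (TRL's trainers compute this per batch internally):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed log-probabilities of each response under the
    trainable policy (pi_*) and the frozen reference model (ref_*).
    beta scales how strongly the policy may diverge from the reference.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)): small when the policy favors the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Policy already prefers the chosen response relative to the reference:
low = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
               ref_chosen=-12.0, ref_rejected=-12.0)

# Policy prefers the rejected response: the loss is higher.
high = dpo_loss(pi_chosen=-14.0, pi_rejected=-10.0,
                ref_chosen=-12.0, ref_rejected=-12.0)
```

With no preference signal (all log-probs equal) the loss sits at log 2, and it decreases monotonically as the policy's margin for the chosen response grows.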
Good alternatives
Axolotl preference modes · OpenRLHF · Custom JAX
Related tools
AI & Machine Learning
Hugging Face Transformers
State-of-the-art pretrained models for PyTorch, TensorFlow, and JAX.
PEFT
Parameter-efficient fine-tuning methods (LoRA, adapters, prompt tuning) integrated with Transformers models.
Axolotl
YAML-configured fine-tuning for LLMs: LoRA, QLoRA, FSDP, and many architectures on top of Hugging Face trainers.
Hugging Face Alignment Handbook
Curated recipes and code for aligning language models (preference optimization, DPO-style flows) on open stacks.
Ollama
Local LLM runner and model library with simple CLI and API for workstation inference.
llama.cpp
Plain C/C++ inference for LLaMA-class models with broad community backends.
