Also strong
Transformer Reinforcement Learning: train LLMs with RLHF, DPO, ORPO, and related preference optimization recipes.
rlhfdpoalignmentllm
Filter by platform, license text, maturity, maintenance cadence, and editorial tags like privacy-focused or self-hosted. Search matches names, summaries, tags, and use cases.
2 tools match your filters
Transformer Reinforcement Learning: train LLMs with RLHF, DPO, ORPO, and related preference optimization recipes.
Curated recipes and code for aligning language models (preference optimization, DPO-style flows) on open stacks.