sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
This framework helps AI researchers and practitioners rapidly experiment with and develop new algorithms for aligning large language models (LLMs) to human preferences or specific behaviors online. Given LLM responses and feedback signals (such as preference labels or verifiable rewards), it trains a refined, better-performing model. It is designed for those working on how LLMs learn from feedback in real time.
Use this if you are an AI researcher or machine learning engineer focused on developing or evaluating online alignment algorithms for LLMs.
Not ideal if you are a developer looking for a simple, out-of-the-box solution to fine-tune an LLM without deep involvement in algorithm research.
Stars: 638
Forks: 60
Language: Python
License: Apache-2.0
Last pushed: Jan 29, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/sail-sg/oat"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Related tools
hud-evals/hud-python
OSS RL environment + evals toolkit
hiyouga/EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
OpenRL-Lab/openrl
Unified Reinforcement Learning Framework
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
NVlabs/GDPO
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for...