sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
This framework helps AI researchers and practitioners rapidly experiment with and develop new algorithms for aligning large language models (LLMs) to human preferences or specific behaviors online. Given LLM responses and feedback signals (such as preference labels or verifiable rewards), it trains a refined, better-performing model. It is designed for those working on how LLMs learn from feedback in real time.
Use this if you are an AI researcher or machine learning engineer focused on developing or evaluating online alignment algorithms for LLMs.
Not ideal if you are a developer looking for a simple, out-of-the-box solution to fine-tune an LLM without deep involvement in algorithm research.
Stars: 638
Forks: 60
Language: Python
License: Apache-2.0
Last pushed: Jan 29, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/sail-sg/oat"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Related tools
hud-evals/hud-python
OSS RL environment + evals toolkit
hiyouga/EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
OpenRL-Lab/openrl
Unified Reinforcement Learning Framework
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
NVlabs/GDPO
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for...