WooooDyy/BAPO

Codes for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.

Emerging

This project provides the code for BAPO (Balanced Policy Optimization with Adaptive Clipping), an off-policy reinforcement learning method for fine-tuning large language models (LLMs). It takes an existing LLM and training data, applies BAPO to stabilize training, and outputs a fine-tuned LLM intended to perform better on the target tasks. It is aimed at developers who train LLMs for advanced AI applications.

Use this if you are an AI developer who wants to stabilize off-policy reinforcement learning for LLMs and improve model performance, especially on complex generation or reasoning tasks.

Not ideal if you are an end user who only wants to use an LLM without training it, or if you need a solution for models other than LLMs.

LLM fine-tuning, Reinforcement learning for AI, AI model optimization, Natural language generation, Advanced AI development

No License, No Package, No Dependents

Maintenance 10 / 25

Adoption 9 / 25

Maturity 5 / 25

Community 9 / 25

Language Python

License None

Higher-rated alternatives

agentscope-ai/Trinity-RFT

Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...

OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...

zjunlp/EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.

huggingface/alignment-handbook

Robust recipes to align language models with human and AI preferences

hyunwoongko/nanoRLHF

nanoRLHF: from-scratch journey into how LLMs and RLHF really work.
