AIFrameResearch/SPO
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Segment Policy Optimization (SPO) improves how Large Language Models (LLMs) learn to solve complex, multi-step reasoning problems. Given an LLM and training data, it produces a model that is more accurate and efficient on tasks such as math problems. Data scientists and machine learning engineers who train LLMs for reasoning tasks would find this useful.
No commits in the last 6 months.
Use this if you are training Large Language Models (LLMs) for complex, multi-step reasoning tasks and need more precise feedback during the learning process than traditional methods offer.
Not ideal if you are working with simpler LLM tasks that don't require detailed, step-by-step reasoning, or if you prefer to stick with purely token-level or trajectory-level reinforcement learning methods.
Stars: 45
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Sep 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AIFrameResearch/SPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
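If you'd rather query the endpoint from Python than from curl, a minimal sketch follows. The URL pattern is taken from the curl example above; the response schema is not documented on this page, so the sketch just parses and returns the JSON as-is. The helper names (`quality_url`, `fetch_quality`) are illustrative, not part of any official client.

```python
# Minimal sketch of calling the quality API shown above.
# Assumptions: the URL pattern generalizes to any owner/repo pair,
# and the endpoint returns JSON. No API key needed (100 requests/day).
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record for one repository."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the curl example for this repository.
    print(quality_url("AIFrameResearch", "SPO"))
```

For the higher 1,000 requests/day tier you would attach the free key to the request; how the key is passed (header vs. query parameter) is not specified on this page.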
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.