AIFrameResearch/SPO
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Segment Policy Optimization (SPO) improves how Large Language Models (LLMs) learn to solve complex, multi-step reasoning problems. Given an LLM and training data, it produces a model that is more accurate and efficient on tasks such as math problems. Data scientists and machine learning engineers who train LLMs for reasoning tasks would find this useful.
No commits in the last 6 months.
Use this if you are training Large Language Models (LLMs) for complex, multi-step reasoning tasks and need more precise feedback during the learning process than traditional methods offer.
Not ideal if you are working with simpler LLM tasks that don't require detailed, step-by-step reasoning, or if you prefer to stick with purely token-level or trajectory-level reinforcement learning methods.
Stars: 45
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Sep 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AIFrameResearch/SPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
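If you'd rather query the endpoint from Python than from curl, a minimal sketch follows. The URL pattern is taken from the curl example above; the response schema is not documented on this page, so the sketch just parses and returns the JSON as-is. The helper names (`quality_url`, `fetch_quality`) are illustrative, not part of any official client.

```python
# Minimal sketch of calling the quality API shown above.
# Assumptions: the URL pattern generalizes to any owner/repo pair,
# and the endpoint returns JSON. No API key needed (100 requests/day).
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record for one repository."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the curl example for this repository.
    print(quality_url("AIFrameResearch", "SPO"))
```

For the higher 1,000 requests/day tier you would attach the free key to the request; how the key is passed (header vs. query parameter) is not specified on this page.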
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.