AIFrameResearch/SPO

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

Score: 36 / 100 (Emerging)

This project improves how Large Language Models (LLMs) learn to solve complex, multi-step reasoning problems. Given an LLM and training data, it produces a more accurate and efficient model for tasks such as math problems. It is aimed at data scientists and machine learning engineers who train LLMs for reasoning tasks.

No commits in the last 6 months.

Use this if you are training Large Language Models (LLMs) for complex, multi-step reasoning tasks and need more precise feedback during the learning process than traditional methods offer.

Not ideal if you are working with simpler LLM tasks that don't require detailed, step-by-step reasoning, or if you prefer to stick with purely token-level or trajectory-level reinforcement learning methods.
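To illustrate the general idea of segment-level credit assignment, here is a minimal sketch: a trajectory's tokens are grouped into segments, each segment receives one advantage estimate, and that advantage is broadcast to every token in the segment. The function names, the segmentation, and the simple value-difference advantage estimator below are illustrative assumptions, not the project's actual implementation.

```python
def segment_advantages(segment_values, final_reward):
    """One advantage per segment: the change in estimated value across
    the segment, with the final reward closing out the last segment.
    (A simplified stand-in for whatever estimator the project uses.)"""
    values = segment_values + [final_reward]
    return [values[i + 1] - values[i] for i in range(len(segment_values))]

def broadcast_to_tokens(segments, advantages):
    """Assign each token the advantage of the segment containing it,
    giving finer-grained feedback than one trajectory-level reward."""
    per_token = []
    for seg, adv in zip(segments, advantages):
        per_token.extend([adv] * len(seg))
    return per_token

# Example: a short reasoning trace split into three segments,
# with hypothetical value estimates before each segment.
segments = [["Let", "x", "=", "5"], ["then", "2x", "=", "10"], ["answer:", "10"]]
values = [0.2, 0.6, 0.9]
advs = segment_advantages(values, final_reward=1.0)
token_advs = broadcast_to_tokens(segments, advs)
```

The point of the sketch is the middle ground: token-level methods would need a value estimate per token, while trajectory-level methods would give all ten tokens the same advantage; segment-level assignment sits between the two.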

Tags: LLM training · Reinforcement Learning · reasoning tasks · AI model optimization · natural language processing
Badges: Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 15 / 25
Community 11 / 25


Stars: 45
Forks: 5
Language: Python
License: MIT
Last pushed: Sep 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AIFrameResearch/SPO"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
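The same data can be fetched from a script. A small sketch below builds the request URL from an owner/repo pair, following the path pattern shown in the curl example above; the helper name is mine, and the response fields depend on the API.

```python
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner, repo):
    """Build the quality-report URL for a GitHub owner/repo pair,
    percent-encoding each path segment."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"

url = quality_url("AIFrameResearch", "SPO")
# Fetch with any HTTP client, e.g. urllib.request.urlopen(url)
# or: curl "$url"
```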