lasgroup/SDPO

Reinforcement Learning via Self-Distillation (SDPO)

Score: 48 / 100 (Emerging)

This project helps large language models (LLMs) learn faster on tasks like writing code or solving math problems. It takes rich textual feedback (such as error messages), or even just past successful attempts, and uses that information to train the model to make better predictions. It is aimed at AI researchers and machine learning engineers who are developing or fine-tuning LLMs for verifiable problem-solving applications.
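As a purely illustrative sketch of that loop (the names run_with_feedback and sdpo_style_round below are hypothetical stand-ins, not SDPO's actual API): successful attempts are kept as training targets, and failed attempts carry their textual feedback forward as a learning signal.

def run_with_feedback(attempt: str) -> tuple[bool, str]:
    """Toy verifier: 'runs' an attempt and returns (success, textual feedback)."""
    ok = "return" in attempt
    feedback = "ok" if ok else "SyntaxError: missing return statement"
    return ok, feedback

def sdpo_style_round(candidates: list[str]) -> list[str]:
    """Build training targets: reuse successful attempts directly,
    and annotate failed attempts with the feedback they produced."""
    targets = []
    for attempt in candidates:
        ok, feedback = run_with_feedback(attempt)
        if ok:
            targets.append(attempt)  # reuse a high-quality past output
        else:
            # keep the error message alongside the attempt as a learning signal
            targets.append(f"# feedback: {feedback}\n{attempt}")
    return targets

print(sdpo_style_round(["def f(x): return x + 1", "def f(x): x + 1"]))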


Use this if you are training large language models in environments that provide detailed feedback (like runtime errors) or if you want to reuse high-quality past outputs to accelerate learning, even with sparse feedback.

Not ideal if your application does not involve large language models or if you lack access to NVIDIA GPUs and a compatible Linux environment.

large-language-models reinforcement-learning model-fine-tuning code-generation mathematical-reasoning
No Package · No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 11 / 25
Community 17 / 25


Stars: 627
Forks: 57
Language: Python
License: Apache-2.0
Last pushed: Feb 18, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lasgroup/SDPO"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
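
If you prefer Python to curl, here is a minimal sketch using only the standard library. It assumes the endpoint returns JSON; the field names are not documented on this page, so the payload is printed as-is.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lasgroup/SDPO"

# Fetch the quality record and pretty-print whatever JSON comes back.
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))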