lasgroup/SDPO
Reinforcement Learning via Self-Distillation (SDPO)
SDPO helps large language models (LLMs) learn faster on verifiable tasks such as writing code or solving math problems. It takes rich textual feedback (like error messages), or even just past successful attempts, and uses that information to train the model to make better future predictions. It is aimed at AI researchers and machine learning engineers who are developing or fine-tuning LLMs for verifiable, problem-solving applications.
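To make "reusing past successful attempts" concrete, here is a generic Python sketch of one such scheme: sample completions, keep only the ones a verifier accepts, and fine-tune on those. This is rejection-sampling fine-tuning, offered purely as an illustration; it is not necessarily SDPO's actual update rule, and every function and variable name below is an illustrative assumption (the real method lives in the repository).

# Generic sketch, NOT SDPO's actual algorithm: sample attempts, keep
# verified successes, and fine-tune the model on its own good outputs.
# Assumes a Hugging Face-style causal LM and a tokenizer with a pad token.
import torch
import torch.nn.functional as F

def self_distill_step(model, tokenizer, prompts, verifier, optimizer,
                      n_samples=4):
    """One illustrative update from past successful attempts."""
    successes = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, do_sample=True,
                                 num_return_sequences=n_samples,
                                 max_new_tokens=256)
        for seq in outputs:
            text = tokenizer.decode(seq, skip_special_tokens=True)
            if verifier(prompt, text):  # e.g., unit tests pass, answer correct
                successes.append(seq)
    if not successes:
        return None  # nothing verified this round; skip the update
    # Standard next-token cross-entropy on the successful sequences
    # (for simplicity, the loss here also covers the prompt tokens).
    batch = torch.nn.utils.rnn.pad_sequence(
        successes, batch_first=True, padding_value=tokenizer.pad_token_id)
    logits = model(batch).logits
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           batch[:, 1:].reshape(-1),
                           ignore_index=tokenizer.pad_token_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()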
Use this if you are training large language models in environments that provide detailed feedback (like runtime errors) or if you want to reuse high-quality past outputs to accelerate learning, even with sparse feedback.
Not ideal if your application does not involve large language models or if you lack access to NVIDIA GPUs and a compatible Linux environment.
Stars: 627
Forks: 57
Language: Python
License: Apache-2.0
Category: ml-frameworks
Last pushed: Feb 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lasgroup/SDPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
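The same request from Python, as a small sketch. It assumes the endpoint returns JSON; the X-API-Key header shown for the keyed 1,000/day tier is a hypothetical placeholder, since the real header name is not documented here.

# Minimal sketch of calling the quality API from Python.
import requests  # third-party; pip install requests

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def fetch_repo_quality(category, owner, repo, api_key=None):
    """Fetch the quality record for one repository."""
    headers = {}
    if api_key:
        # Hypothetical header name; check the API docs for the real one.
        headers["X-API-Key"] = api_key
    resp = requests.get(f"{BASE}/{category}/{owner}/{repo}",
                        headers=headers, timeout=10)
    resp.raise_for_status()  # surfaces rate-limit (HTTP 429) and other errors
    return resp.json()

data = fetch_repo_quality("ml-frameworks", "lasgroup", "SDPO")
print(data)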
Related frameworks
machinelearningnuremberg/DPL
[NeurIPS 2023] Multi-fidelity hyperparameter optimization with deep power laws that achieves...
HUST-AI-HYZ/FARMS
Open source code for ICML 2025 Paper: Eigenspectrum Analysis of Neural Networks without Aspect...
gabrielSantosLima/vlm_garbage_classification
Comparing VLMs with CNNs for garbage classification