lasgroup/SDPO
Reinforcement Learning via Self-Distillation (SDPO)
SDPO helps large language models (LLMs) learn faster on verifiable tasks such as writing code or solving math problems. It takes rich textual feedback (like error messages), or even just past successful attempts, and uses that information to train the model to make better future predictions. It is aimed at AI researchers and machine learning engineers who are developing or fine-tuning LLMs for verifiable, problem-solving applications.
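To make "reusing past successful attempts" concrete, here is a generic Python sketch of one such scheme: sample completions, keep only the ones a verifier accepts, and fine-tune on those. This is rejection-sampling fine-tuning, offered purely as an illustration; it is not necessarily SDPO's actual update rule, and every function and variable name below is an illustrative assumption (the real method lives in the repository).

# Generic sketch, NOT SDPO's actual algorithm: sample attempts, keep
# verified successes, and fine-tune the model on its own good outputs.
# Assumes a Hugging Face-style causal LM and a tokenizer with a pad token.
import torch
import torch.nn.functional as F

def self_distill_step(model, tokenizer, prompts, verifier, optimizer,
                      n_samples=4):
    """One illustrative update from past successful attempts."""
    successes = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, do_sample=True,
                                 num_return_sequences=n_samples,
                                 max_new_tokens=256)
        for seq in outputs:
            text = tokenizer.decode(seq, skip_special_tokens=True)
            if verifier(prompt, text):  # e.g., unit tests pass, answer correct
                successes.append(seq)
    if not successes:
        return None  # nothing verified this round; skip the update
    # Standard next-token cross-entropy on the successful sequences
    # (for simplicity, the loss here also covers the prompt tokens).
    batch = torch.nn.utils.rnn.pad_sequence(
        successes, batch_first=True, padding_value=tokenizer.pad_token_id)
    logits = model(batch).logits
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           batch[:, 1:].reshape(-1),
                           ignore_index=tokenizer.pad_token_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()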
Use this if you are training large language models in environments that provide detailed feedback (like runtime errors) or if you want to reuse high-quality past outputs to accelerate learning, even with sparse feedback.
Not ideal if your application does not involve large language models or if you lack access to NVIDIA GPUs and a compatible Linux environment.
Stars: 627
Forks: 57
Language: Python
License: Apache-2.0
Category: ml-frameworks
Last pushed: Feb 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lasgroup/SDPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
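The same request from Python, as a small sketch. It assumes the endpoint returns JSON; the X-API-Key header shown for the keyed 1,000/day tier is a hypothetical placeholder, since the real header name is not documented here.

# Minimal sketch of calling the quality API from Python.
import requests  # third-party; pip install requests

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def fetch_repo_quality(category, owner, repo, api_key=None):
    """Fetch the quality record for one repository."""
    headers = {}
    if api_key:
        # Hypothetical header name; check the API docs for the real one.
        headers["X-API-Key"] = api_key
    resp = requests.get(f"{BASE}/{category}/{owner}/{repo}",
                        headers=headers, timeout=10)
    resp.raise_for_status()  # surfaces rate-limit (HTTP 429) and other errors
    return resp.json()

data = fetch_repo_quality("ml-frameworks", "lasgroup", "SDPO")
print(data)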
Related frameworks
machinelearningnuremberg/DPL
[NeurIPS 2023] Multi-fidelity hyperparameter optimization with deep power laws that achieves...
HUST-AI-HYZ/FARMS
Open source code for ICML 2025 Paper: Eigenspectrum Analysis of Neural Networks without Aspect...
gabrielSantosLima/vlm_garbage_classification
Comparing VLMs with CNNs for garbage classification