JIA-Lab-research/Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Score: 28 / 100 — Experimental

This project helps improve how large language models (LLMs) solve complex, multi-step math problems. It takes an existing LLM and specialized math preference data, then fine-tunes the model to better break down and solve problems step-by-step. Researchers or AI developers working with LLMs in educational technology or scientific computing can use this to create more accurate and reliable reasoning models.
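Step-DPO extends Direct Preference Optimization (DPO) so that the preference signal targets an individual reasoning step rather than the whole response. As background, here is a minimal sketch of the standard DPO objective for a single preference pair; this is an illustrative formula, not the repository's actual training code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or rejected
    response under the policy or the frozen reference model. In Step-DPO
    the pair differs only at one reasoning step, but the loss has the
    same form.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log sigmoid(beta * margin)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen step more than the reference does,
# the margin is positive and the loss falls below log(2) ≈ 0.693.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```

With a zero margin the loss is exactly log(2); fine-tuning pushes it lower by widening the policy's preference for the chosen step relative to the reference model.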

392 stars. No commits in the last 6 months.

Use this if you need an LLM to excel at multi-step reasoning tasks, especially in mathematics, by leveraging preference-based fine-tuning.

Not ideal if your primary goal is general conversational ability rather than detailed, step-wise problem-solving.

mathematical-reasoning large-language-models AI-model-training computational-mathematics educational-AI
No License · Stale (6 months) · No Package · No Dependents

- Maintenance: 0 / 25
- Adoption: 10 / 25
- Maturity: 8 / 25
- Community: 10 / 25


- Stars: 392
- Forks: 16
- Language: Python
- License: None
- Last pushed: Jan 19, 2025
- Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/Step-DPO"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
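If you would rather consume the endpoint from Python than curl, the sketch below parses a response payload. The field names (`repo`, `score`, `breakdown`) are illustrative assumptions; the actual schema returned by the API may differ:

```python
import json

# Hypothetical example payload; field names are assumptions, not the
# documented schema of the quality API.
sample = """
{
  "repo": "JIA-Lab-research/Step-DPO",
  "score": 28,
  "breakdown": {"maintenance": 0, "adoption": 10, "maturity": 8, "community": 10}
}
"""

data = json.loads(sample)
# Sanity-check that the per-category scores sum to the overall score.
total = sum(data["breakdown"].values())
print(f'{data["repo"]}: {data["score"]}/100 (breakdown sums to {total})')
```

In a real client you would replace `sample` with the body of an HTTP GET to the URL above (e.g. via `urllib.request` or `requests`) and handle rate-limit errors from the 100-requests/day tier.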