sastpg/CoVo

Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning

Score: 15 / 100 (Experimental)

This project helps AI researchers and developers improve how large language models (LLMs) reason and solve complex problems in areas such as mathematics, commonsense reasoning, and science. It takes an existing LLM and training data as input, then applies a 'self-rewarding' reinforcement learning method to strengthen the model's ability to consistently arrive at correct answers. The output is a more accurate and robust LLM with stronger reasoning.
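The repository does not spell out its exact reward formulation on this page, but the core idea of rewarding answer consistency across sampled reasoning paths can be sketched as follows. This is a minimal, hypothetical illustration assuming a majority-vote consistency reward; `sampled_answers` stands in for final answers parsed from several sampled chains of thought, and no model call is made here.

```python
# Hypothetical sketch: reward each sampled reasoning path by how often
# its final answer agrees with the other paths (agreement fraction in
# [0, 1]). This assumes majority-vote consistency as the intrinsic
# reward; the project's actual reward may differ.
from collections import Counter

def consistency_rewards(sampled_answers):
    """Return one reward per path: the fraction of all sampled paths
    whose final answer matches that path's answer."""
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return [counts[answer] / n for answer in sampled_answers]

# Example: 3 of 4 paths agree on "42", so those paths get reward 0.75
# and the dissenting path gets 0.25.
rewards = consistency_rewards(["42", "42", "17", "42"])
```

Rewards like these could then be fed to any standard policy-gradient update; the point is that no external human feedback is required to produce the reward signal.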

No commits in the last 6 months.

Use this if you are an AI researcher or developer looking to train or fine-tune LLMs to achieve higher accuracy and more consistent reasoning without needing external human feedback for rewards.

Not ideal if you are looking for a plug-and-play solution for end-user applications or if you do not have the technical expertise and infrastructure for advanced LLM training and reinforcement learning.

LLM training · AI research · Reinforcement Learning · Natural Language Processing · Machine Learning Engineering
No License · Stale (6 months) · No Package · No Dependents
Maintenance 2 / 25
Adoption 6 / 25
Maturity 7 / 25
Community 0 / 25


Stars: 22
Forks:
Language: Python
License:
Last pushed: Jun 25, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sastpg/CoVo"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
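For programmatic access beyond the curl one-liner above, a small helper can build the endpoint URL and decode the response. This is a hypothetical sketch: the path segments mirror the curl example exactly, but the JSON response schema is not documented on this page and is therefore an assumption.

```python
# Hypothetical client for the quality API shown above. The URL pattern
# is taken verbatim from the curl example; the shape of the returned
# JSON is not documented here, so it is decoded generically.
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner, repo):
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner, repo):
    """Fetch and decode the quality report (requires network access)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# URL for the repository described on this page:
url = quality_url("sastpg", "CoVo")
```

Without an API key this shares the 100-requests/day anonymous quota, so cache responses rather than polling.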