sastpg/CoVo
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
This project helps AI researchers and developers improve how large language models (LLMs) reason and solve complex problems in areas such as mathematics, commonsense, and science. It takes an existing LLM and training data as input, then applies a self-rewarding reinforcement learning method that rewards the model for reasoning paths that converge on consistent answers, without relying on external reward labels. The output is a more accurate and robust LLM capable of better reasoning.
No commits in the last 6 months.
Use this if you are an AI researcher or developer looking to train or fine-tune LLMs to achieve higher accuracy and more consistent reasoning without needing external human feedback for rewards.
Not ideal if you are looking for a plug-and-play solution for end-user applications or if you do not have the technical expertise and infrastructure for advanced LLM training and reinforcement learning.
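To illustrate the core idea suggested by the title, the sketch below shows one way a consistency-based self-reward could be computed: sample several answers for a prompt and reward each sample by how many of the model's own reasoning paths agree with it. This is a minimal illustration of the general technique, not the repository's actual implementation; the `sample_answers` callable is a hypothetical placeholder for however answers are drawn from the model.

from collections import Counter
from typing import Callable, List

def consistency_rewards(
    sample_answers: Callable[[str, int], List[str]],
    prompt: str,
    k: int = 8,
) -> List[float]:
    """Score each of k sampled answers by how often the model agrees with it."""
    answers = sample_answers(prompt, k)   # e.g. ["42", "42", "41", "42", ...]
    counts = Counter(answers)
    # Reward for each sample = fraction of all samples sharing its final answer,
    # so answers that many reasoning paths converge on receive higher reward.
    return [counts[a] / k for a in answers]

In a training setup along these lines, such self-generated rewards would stand in for human- or verifier-derived scores inside a standard RL fine-tuning loop.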
Stars: 22
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jun 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sastpg/CoVo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models