sastpg/CoVo
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
This project helps AI researchers and developers improve how large language models (LLMs) reason and solve complex problems in areas such as mathematics, commonsense, and science. It takes an existing LLM and training data as input, then applies a self-rewarding reinforcement learning method that rewards the model for reasoning paths that converge on consistent answers, without relying on external reward labels. The output is a more accurate and robust LLM capable of better reasoning.
No commits in the last 6 months.
Use this if you are an AI researcher or developer looking to train or fine-tune LLMs to achieve higher accuracy and more consistent reasoning without needing external human feedback for rewards.
Not ideal if you are looking for a plug-and-play solution for end-user applications or if you do not have the technical expertise and infrastructure for advanced LLM training and reinforcement learning.
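To illustrate the core idea suggested by the title, the sketch below shows one way a consistency-based self-reward could be computed: sample several answers for a prompt and reward each sample by how many of the model's own reasoning paths agree with it. This is a minimal illustration of the general technique, not the repository's actual implementation; the `sample_answers` callable is a hypothetical placeholder for however answers are drawn from the model.

from collections import Counter
from typing import Callable, List

def consistency_rewards(
    sample_answers: Callable[[str, int], List[str]],
    prompt: str,
    k: int = 8,
) -> List[float]:
    """Score each of k sampled answers by how often the model agrees with it."""
    answers = sample_answers(prompt, k)   # e.g. ["42", "42", "41", "42", ...]
    counts = Counter(answers)
    # Reward for each sample = fraction of all samples sharing its final answer,
    # so answers that many reasoning paths converge on receive higher reward.
    return [counts[a] / k for a in answers]

In a training setup along these lines, such self-generated rewards would stand in for human- or verifier-derived scores inside a standard RL fine-tuning loop.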
Stars: 22
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jun 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sastpg/CoVo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models