PRIME-RL/Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
This project helps large language models (LLMs) keep exploring diverse solutions instead of collapsing onto a single, overconfident answer during complex reasoning tasks. Starting from a pre-trained LLM, it applies a specialized reinforcement-learning training process that produces more varied and accurate responses, especially on challenging problems such as advanced math. Its primary users are researchers and practitioners working to improve the reasoning capabilities of LLMs for specialized applications.
421 stars. No commits in the last 6 months.
Use this if you are an AI researcher or LLM developer facing 'entropy collapse' in your reinforcement learning training pipelines, where the LLM's reasoning becomes too narrow and its performance plateaus.
Not ideal if you are a casual user looking for a pre-built, ready-to-deploy LLM for general tasks, or if you lack a solid grounding in reinforcement learning and LLM fine-tuning.
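To make 'entropy collapse' concrete: a policy's token distribution has high entropy when probability mass is spread across many options and low entropy when it concentrates on one. The sketch below is a generic illustration (not code from this repo) comparing the Shannon entropy of an exploratory distribution with a collapsed, overconfident one.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Exploratory policy: probability mass spread evenly over 4 candidate tokens.
diverse = [0.25, 0.25, 0.25, 0.25]
# Collapsed policy: almost all mass on one token.
collapsed = [0.97, 0.01, 0.01, 0.01]

print(entropy(diverse))    # log(4) ≈ 1.386, the maximum for 4 outcomes
print(entropy(collapsed))  # much smaller: little exploration remains
```

During RL fine-tuning, a steady drop of this quantity toward zero is the symptom the project targets: the model stops exploring alternative reasoning paths and its accuracy plateaus.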
Stars
421
Forks
15
Language
Python
License
—
Category
—
Last pushed
Jul 11, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PRIME-RL/Entropy-Mechanism-of-RL"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
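The same request can be made from Python. This is a minimal sketch using only the standard library; the `quality_url` helper is hypothetical (it just mirrors the URL pattern shown in the curl command above), and the response schema is not documented here, so the fetch is left commented out.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("PRIME-RL", "Entropy-Mechanism-of-RL")
print(url)

# Uncomment to fetch (no key needed up to 100 requests/day;
# the JSON fields returned are not documented in this listing):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```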
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models