PRIME-RL/Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
This project helps large language models (LLMs) keep exploring diverse solutions instead of collapsing onto a single, overconfident answer during complex reasoning tasks. Starting from a pre-trained LLM, it applies a specialized reinforcement-learning training process that produces more varied and accurate responses, especially on challenging problems such as advanced math. Its primary users are researchers and practitioners working to improve the reasoning capabilities of LLMs for specialized applications.
421 stars. No commits in the last 6 months.
Use this if you are an AI researcher or LLM developer facing 'entropy collapse' in your reinforcement learning training pipelines, where the LLM's reasoning becomes too narrow and its performance plateaus.
Not ideal if you are a casual user looking for a pre-built, ready-to-deploy LLM for general tasks, or if you lack a solid grounding in reinforcement learning and LLM fine-tuning.
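To make 'entropy collapse' concrete: a policy's token distribution has high entropy when probability mass is spread across many options and low entropy when it concentrates on one. The sketch below is a generic illustration (not code from this repo) comparing the Shannon entropy of an exploratory distribution with a collapsed, overconfident one.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Exploratory policy: probability mass spread evenly over 4 candidate tokens.
diverse = [0.25, 0.25, 0.25, 0.25]
# Collapsed policy: almost all mass on one token.
collapsed = [0.97, 0.01, 0.01, 0.01]

print(entropy(diverse))    # log(4) ≈ 1.386, the maximum for 4 outcomes
print(entropy(collapsed))  # much smaller: little exploration remains
```

During RL fine-tuning, a steady drop of this quantity toward zero is the symptom the project targets: the model stops exploring alternative reasoning paths and its accuracy plateaus.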
Stars
421
Forks
15
Language
Python
License
—
Category
—
Last pushed
Jul 11, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PRIME-RL/Entropy-Mechanism-of-RL"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
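The same request can be made from Python. This is a minimal sketch using only the standard library; the `quality_url` helper is hypothetical (it just mirrors the URL pattern shown in the curl command above), and the response schema is not documented here, so the fetch is left commented out.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("PRIME-RL", "Entropy-Mechanism-of-RL")
print(url)

# Uncomment to fetch (no key needed up to 100 requests/day;
# the JSON fields returned are not documented in this listing):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```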
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models