PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
This project helps large language models (LLMs) improve their reasoning abilities even when no ground-truth labels are available for the test data. It takes an existing LLM and your unlabeled test questions, then estimates rewards by majority voting over the model's own sampled answers and refines the model with reinforcement learning. The output is a more accurate LLM on challenging reasoning tasks, useful for AI researchers and practitioners building advanced language applications.
Use this if you need to boost the performance of your large language models on complex reasoning problems using only unlabeled test data.
Not ideal if you have ground-truth labels readily available for your test data, as traditional supervised methods might be more straightforward.
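The core idea described above can be sketched in a few lines: sample several answers to the same question, take the majority answer as a pseudo-label, and reward each sample by agreement with it. This is an illustrative sketch of the majority-voting reward, not the repository's actual implementation; the function name and reward values are assumptions.

```python
from collections import Counter

def majority_vote_reward(sampled_answers):
    """Illustrative sketch (not the repo's code): use the most common
    sampled answer as a pseudo-label, then reward each sample 1.0 if it
    matches that pseudo-label and 0.0 otherwise."""
    pseudo_label, _count = Counter(sampled_answers).most_common(1)[0]
    rewards = [1.0 if ans == pseudo_label else 0.0 for ans in sampled_answers]
    return pseudo_label, rewards

# Four sampled answers to one question; "42" wins the vote.
label, rewards = majority_vote_reward(["42", "42", "41", "42"])
```

These per-sample rewards would then drive a standard RL update (e.g. a policy-gradient step) on the model, which is where the actual training loop in the repository comes in.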
Stars: 1,014
Forks: 77
Language: Python
License: MIT
Category:
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PRIME-RL/TTRL"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
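For calling the endpoint from code rather than curl, a tiny helper can build the URL for any owner/repo pair. The helper name is hypothetical, and the response format is not documented here, so this only constructs the URL shown in the curl example.

```python
def quality_api_url(owner, repo,
                    base="https://pt-edge.onrender.com/api/v1/quality/transformers"):
    # Hypothetical helper: joins the documented base path with an
    # owner/repo pair to reproduce the URL used in the curl example.
    return f"{base}/{owner}/{repo}"

url = quality_api_url("PRIME-RL", "TTRL")
```

Fetch `url` with any HTTP client; without a key you are limited to 100 requests/day.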
Related models
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
HKUDS/LightReasoner
"LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"