PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
This project helps large language models (LLMs) improve their reasoning abilities even when no ground-truth labels are available for the test data. It takes an existing LLM and your unlabeled test questions, then estimates rewards by majority voting over the model's own sampled answers and refines the model with reinforcement learning. The output is a more accurate LLM on challenging reasoning tasks, useful for AI researchers and practitioners building advanced language applications.
Use this if you need to boost the performance of your large language models on complex reasoning problems using only unlabeled test data.
Not ideal if you have ground-truth labels readily available for your test data, as traditional supervised methods might be more straightforward.
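The core idea described above can be sketched in a few lines: sample several answers to the same question, take the majority answer as a pseudo-label, and reward each sample by agreement with it. This is an illustrative sketch of the majority-voting reward, not the repository's actual implementation; the function name and reward values are assumptions.

```python
from collections import Counter

def majority_vote_reward(sampled_answers):
    """Illustrative sketch (not the repo's code): use the most common
    sampled answer as a pseudo-label, then reward each sample 1.0 if it
    matches that pseudo-label and 0.0 otherwise."""
    pseudo_label, _count = Counter(sampled_answers).most_common(1)[0]
    rewards = [1.0 if ans == pseudo_label else 0.0 for ans in sampled_answers]
    return pseudo_label, rewards

# Four sampled answers to one question; "42" wins the vote.
label, rewards = majority_vote_reward(["42", "42", "41", "42"])
```

These per-sample rewards would then drive a standard RL update (e.g. a policy-gradient step) on the model, which is where the actual training loop in the repository comes in.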
Stars: 1,014
Forks: 77
Language: Python
License: MIT
Category:
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PRIME-RL/TTRL"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
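For calling the endpoint from code rather than curl, a tiny helper can build the URL for any owner/repo pair. The helper name is hypothetical, and the response format is not documented here, so this only constructs the URL shown in the curl example.

```python
def quality_api_url(owner, repo,
                    base="https://pt-edge.onrender.com/api/v1/quality/transformers"):
    # Hypothetical helper: joins the documented base path with an
    # owner/repo pair to reproduce the URL used in the curl example.
    return f"{base}/{owner}/{repo}"

url = quality_api_url("PRIME-RL", "TTRL")
```

Fetch `url` with any HTTP client; without a key you are limited to 100 requests/day.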
Related models
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
HKUDS/LightReasoner
"LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"