PRIME-RL/TTRL

[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning

Quality score: 52 / 100 (Established)

This project helps large language models (LLMs) improve their reasoning abilities even when you don't have the correct answers (ground-truth labels) for your test data. It takes an existing LLM and your unlabeled test questions, estimates rewards from the model's own sampled responses via majority voting, and uses reinforcement learning to refine the model. The output is a more accurate LLM on challenging reasoning tasks, useful for AI researchers and practitioners building advanced language applications.
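The reward-estimation idea can be sketched as majority voting over sampled answers. This is a minimal illustration of the principle, not the project's actual implementation: sample several responses per unlabeled question, treat the most common final answer as a pseudo-label, and reward each response that agrees with it.

```python
from collections import Counter

def majority_vote_rewards(answers: list[str]) -> list[int]:
    """Assign pseudo-rewards without ground truth: the most frequent
    answer among sampled rollouts becomes the pseudo-label, and each
    rollout gets reward 1 if it matches that label, else 0."""
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return [1 if a == pseudo_label else 0 for a in answers]

# Example: five sampled final answers to one unlabeled test question.
rewards = majority_vote_rewards(["42", "42", "41", "42", "40"])
print(rewards)  # -> [1, 1, 0, 1, 0]
```

These pseudo-rewards would then drive a standard RL update on the model; see the repository for the actual training loop.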


Use this if you need to boost the performance of your large language models on complex reasoning problems using only unlabeled test data.

Not ideal if you have ground-truth labels readily available for your test data, as traditional supervised methods might be more straightforward.

Tags: Large Language Models · Reinforcement Learning · AI Reasoning · Model Fine-tuning · Unsupervised Learning
No Package · No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 17 / 25


Stars: 1,014
Forks: 77
Language: Python
License: MIT
Last pushed: Mar 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PRIME-RL/TTRL"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.