bobxwu/learning-from-rewards-llm-papers
A comprehensive collection of work on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-inference stages.
This is a curated collection of research papers on how Large Language Models (LLMs) learn from rewards during post-training and test-time scaling. It organizes reward-model designs and learning strategies for improving LLM performance across training, inference, and post-inference stages. Researchers and engineers fine-tuning, evaluating, or deploying LLMs will find it useful for understanding state-of-the-art techniques.
No commits in the last 6 months.
Use this if you are developing or researching large language models and need to explore methods for improving their alignment, reasoning, or code generation through reward-based learning.
Not ideal if you are a general user looking for pre-trained LLMs or a basic introduction to how LLMs work, as this resource is highly technical and specific to advanced LLM development.
Stars: 64
Forks: 2
Language: —
License: MIT
Category:
Last pushed: Jun 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bobxwu/learning-from-rewards-llm-papers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
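For scripted access, the same endpoint can be called from Python. This is a minimal sketch: the URL path is taken from the curl command above, but the response schema and the `quality_url` helper are assumptions for illustration, and the actual fetch is left as a comment since it requires network access (and counts against the daily rate limit).

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the quality-data URL for a repository (hypothetical helper)."""
    return f"{BASE}/{category}/{repo}"

url = quality_url("transformers", "bobxwu/learning-from-rewards-llm-papers")
print(url)

# To actually fetch the JSON payload (requires network; schema is an assumption):
# data = json.load(urlopen(url))
```

Requests without a key are limited to 100/day; pass a free API key for 1,000/day as described above.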
Higher-rated alternatives
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen
A framework for verifiable reasoning with language models.
zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language...