bobxwu/learning-from-rewards-llm-papers

A comprehensive collection of papers on learning from rewards in the post-training and test-time scaling of LLMs, covering both reward models and learning strategies across the training, inference, and post-inference stages.

Score: 30 / 100 (Emerging)

This is a curated collection of research papers focused on how Large Language Models (LLMs) learn from rewards during and after their initial training. It organizes various methods for using 'reward models' and different learning strategies to improve LLM performance across different stages of development and use. Researchers and engineers working on fine-tuning, evaluating, or deploying LLMs will find this useful for understanding state-of-the-art techniques.

No commits in the last 6 months.

Use this if you are developing or researching large language models and need to explore methods for improving their alignment, reasoning, or code generation through reward-based learning.

Not ideal if you are a general user looking for pre-trained LLMs or a basic introduction to how LLMs work, as this resource is highly technical and specific to advanced LLM development.

Tags: LLM development · AI alignment · reinforcement learning · natural language processing · machine learning research
Badges: Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 8 / 25
Maturity: 15 / 25
Community: 5 / 25


Stars: 64
Forks: 2
Language: (not specified)
License: MIT
Last pushed: Jun 13, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bobxwu/learning-from-rewards-llm-papers"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
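As a sketch, the same endpoint can be queried from Python with only the standard library. Note that the shape of the JSON response is not documented here, so the code below simply returns whatever the API sends back rather than assuming specific fields.

```python
import json
import urllib.request

# Endpoint taken verbatim from the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/bobxwu/learning-from-rewards-llm-papers")

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality-score JSON for this repo.

    No API key is needed for up to 100 requests/day (per the note above).
    The returned dict's schema is whatever the service provides; it is
    not assumed here.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Usage (requires network access):
# data = fetch_quality()
# print(json.dumps(data, indent=2))
```

The request is left commented out so the snippet can be read or imported without triggering a network call.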