rkinas/reasoning_models_how_to
This repository is a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It compiles academic papers, video lectures, and practical implementations covering the latest fine-tuning and alignment techniques, helping AI researchers and engineers learn and apply methods such as PPO, DPO, and KTO to refine their language models. Its primary audience is professionals working on model alignment and LLM fine-tuning.
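Of the methods named above, DPO is the simplest to state concretely. A minimal sketch of the standard DPO loss for a single preference pair, in plain Python (the variable names and beta value are illustrative, not taken from the repository):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the trainable policy (pi_*) and the frozen reference model (ref_*).
    The loss is -log sigmoid(beta * margin), where the margin measures how
    much more the policy prefers the chosen response than the reference does.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree, the margin is zero and the loss is log 2; as the policy shifts probability mass toward the chosen response relative to the reference, the loss decreases toward zero.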
132 stars. No commits in the last 6 months.
Use this if you are an AI researcher or engineer focused on improving large language model performance through advanced training methods, especially those involving human feedback.
Not ideal if you are looking for a plug-and-play solution to use an existing LLM without delving into its core training and alignment methodologies.
Stars: 132
Forks: 13
Language: Python
License: —
Category: —
Last pushed: Jul 28, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/rkinas/reasoning_models_how_to"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
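The same lookup can be scripted. A minimal Python sketch, assuming the endpoint returns JSON; the path structure `<ecosystem>/<owner>/<repo>` is inferred from the example URL, and the `Authorization: Bearer` header for keyed access is an assumption (check the service docs):

```python
import json
from urllib import request

# Base endpoint from the example curl command above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    """Build the quality-API URL; path layout inferred from the example."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem, owner, repo, api_key=None):
    """Fetch repo quality data; response assumed to be JSON."""
    req = request.Request(quality_url(ecosystem, owner, repo))
    if api_key:
        # Header name is an assumption, not documented in the card.
        req.add_header("Authorization", f"Bearer {api_key}")
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Without a key this stays within the 100 requests/day anonymous quota; pass `api_key` for the 1,000/day tier.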
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models