hyunwoongko/nanoRLHF
nanoRLHF: a from-scratch journey into how LLMs and RLHF really work.
This project is for AI researchers and students who want to understand the core mechanics of training large language models (LLMs) from the ground up, with a focus on Reinforcement Learning from Human Feedback (RLHF). It provides simplified, educational implementations of each component in the pipeline, taking raw data through to a fine-tuned LLM. It targets individuals and small teams who want a deep, practical understanding of LLM training and optimization techniques without the complexity of large-scale production systems.
Available on PyPI.
Use this if you are an AI researcher, student, or enthusiast keen on learning how LLMs and RLHF truly work by building and experimenting with simplified, functional components.
Not ideal if you need a production-ready, highly efficient framework for training large-scale LLMs or if you are only interested in applying existing models without understanding their internal workings.
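The policy-optimization stage of an RLHF pipeline like this one is typically driven by PPO's clipped surrogate objective. The sketch below is a generic, stdlib-only illustration of that formula on scalar log-probabilities; the function name and example values are ours, not nanoRLHF's actual API.

```python
import math

def ppo_clip_objective(logp_new: float, logp_old: float,
                       advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for a single token/action:
    L = min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old.
    Clipping keeps a single update from moving the policy too far from
    the one that collected the data."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

# The new policy doubled this token's probability, but with eps = 0.2 the
# positive-advantage gain is capped at ratio 1.2:
print(ppo_clip_objective(math.log(2.0), 0.0, advantage=1.0))  # 1.2
```

The `min` over the raw and clipped terms makes the bound pessimistic in both directions: large positive advantages are capped, and large negative ones are not rescued by clipping.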
Stars: 168
Forks: 14
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 23, 2026
Commits (30d): 0
Dependencies: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/hyunwoongko/nanoRLHF"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
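The same data can be fetched from Python with just the standard library. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented on this page); the `quality_url` and `fetch_quality` names are ours, only the URL pattern comes from the curl example above:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the quality-API URL for a repository."""
    return f"{BASE}/{category}/{repo}"

def fetch_quality(category: str, repo: str) -> dict:
    """Fetch quality data for one repo (100 requests/day without a key).
    Assumption: the endpoint returns JSON, parsed here into a plain dict."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)

# e.g. fetch_quality("transformers", "hyunwoongko/nanoRLHF")
```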
Related models
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback