CJReinforce/PURE
Official code for the paper "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning".
This project helps AI researchers and machine learning engineers fine-tune large language models (LLMs) to improve their reasoning abilities, particularly for complex mathematical problems. It takes an existing LLM and a dataset of mathematical prompts with process rewards, then outputs a more accurate and efficient LLM for solving reasoning tasks. The end-user is typically an expert working on advanced AI model development.
Use this if you are developing highly capable LLMs for reasoning tasks and need to efficiently fine-tune them using process-supervised or verifiable rewards to achieve state-of-the-art accuracy with fewer resources.
Not ideal if you are looking for a pre-trained, off-the-shelf LLM or if your primary goal is not to advance reasoning capabilities through novel fine-tuning techniques.
Stars: 160
Forks: 7
Language: Python
License: —
Category:
Last pushed: Oct 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/CJReinforce/PURE"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
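For programmatic use, the endpoint URL follows an owner/repo pattern. A minimal sketch, assuming only the base path shown in the curl command above; the `quality_url` helper is illustrative, not part of the API:

```python
# Base path taken from the curl example above; the helper below is
# a hypothetical convenience wrapper, not an official client.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{BASE}/{owner}/{repo}"

print(quality_url("CJReinforce", "PURE"))
# → https://pt-edge.onrender.com/api/v1/quality/transformers/CJReinforce/PURE
```

The resulting URL can then be fetched with any HTTP client (e.g. `curl` as above, or `urllib.request` in Python); the response schema is not documented on this page.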
Higher-rated alternatives
agentscope-ai/Trinity-RFT: Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF: An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit: [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook: Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF: from-scratch journey into how LLMs and RLHF really work.