LunjunZhang/ema-pg
Code for "EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL" (arxiv.org/abs/2602.04417)
This project provides reinforcement learning techniques, an EMA anchor policy and a top-k KL regularizer, for improving how Large Language Models (LLMs) learn complex reasoning and agentic behavior. It takes an existing LLM and training data and produces a more capable model on tasks such as math reasoning and information retrieval. It is aimed at researchers and engineers fine-tuning LLMs for advanced capabilities.
Use this if you are actively training Large Language Models with reinforcement learning and want to enhance their performance on reasoning or agentic tasks.
Not ideal if you want an out-of-the-box LLM and do not plan to do reinforcement learning fine-tuning.
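To make the two ideas in the paper title concrete, here is a minimal sketch, not the repository's actual implementation: (1) an exponential moving average (EMA) "anchor" copy of the policy weights that trails the trained policy, and (2) a KL penalty computed only over the policy's top-k tokens at each step. All function names, the decay value, and k are illustrative assumptions.

```python
# Illustrative sketch of an EMA anchor update and a top-k KL penalty.
# These are generic formulations of the ideas named in the paper title,
# not the repo's actual code; names and hyperparameters are assumptions.
import math

def ema_update(anchor, policy, decay=0.99):
    """Move the anchor weights toward the current policy weights via EMA."""
    return [decay * a + (1.0 - decay) * p for a, p in zip(anchor, policy)]

def topk_kl(policy_probs, anchor_probs, k=2):
    """KL(policy || anchor), restricted to the policy's k most likely tokens."""
    top = sorted(range(len(policy_probs)), key=lambda i: -policy_probs[i])[:k]
    return sum(
        policy_probs[i] * math.log(policy_probs[i] / anchor_probs[i])
        for i in top
    )
```

In this formulation, the anchor replaces a frozen reference model as the KL target, so the regularizer tracks the improving policy instead of pinning it to the initial checkpoint, while the top-k restriction concentrates the penalty on the tokens the policy actually assigns mass to.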
Stars
8
Forks
1
Language
Python
License
MIT
Last pushed
Feb 05, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/LunjunZhang/ema-pg"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.