BaohaoLiao/SAGE
Self-Hinting Language Models Enhance Reinforcement Learning
This project helps large language model (LLM) developers fine-tune their models more effectively with reinforcement learning (RL). When the LLM struggles to generate correct responses to a difficult prompt, it automatically creates a hint that guides its own sampling. This way, even challenging prompts contribute a useful training signal, improving the LLM's performance and exploration capabilities.
Use this if you are a machine learning engineer or researcher focused on improving the training and performance of large language models through reinforcement learning, especially when dealing with difficult or ambiguous prompts.
Not ideal if you are an end user who simply wants to apply an existing LLM to daily tasks, or if you are not involved in advanced LLM development and RL fine-tuning.
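To make the self-hinting idea concrete, here is a minimal sketch of the sampling loop described above: if none of the sampled rollouts for a hard prompt is correct, a hint is appended and the prompt is resampled so it still produces a positive training signal. The sampler, hint source, and success probabilities are all hypothetical stand-ins, not SAGE's actual implementation.

```python
import random

def sample_responses(prompt, n=4):
    """Toy stand-in for an LLM sampler (hypothetical behavior):
    each rollout is 'correct' with a probability that rises sharply
    when a hint is present in the prompt."""
    p_correct = 0.9 if "Hint:" in prompt else (0.05 if "hard" in prompt else 0.6)
    return [random.random() < p_correct for _ in range(n)]

def rollout_with_self_hint(prompt, hint, n=4):
    """Sample rollouts for a prompt; if none is correct, retry with a
    hint appended so the prompt still yields a usable reward signal."""
    rewards = sample_responses(prompt, n)
    if any(rewards):
        return prompt, rewards
    hinted = f"{prompt}\nHint: {hint}"
    return hinted, sample_responses(hinted, n)

random.seed(0)
used_prompt, rewards = rollout_with_self_hint(
    "hard integral problem", "substitute u = x^2"
)
# With this seed, the unhinted rollouts all fail, the hint is added,
# and the hinted rollouts succeed.
```

In an actual RL fine-tuning loop, the rewards from the (possibly hinted) rollouts would feed the policy-gradient update; the key point is that hard prompts no longer produce all-zero reward batches.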
Stars: 24
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/BaohaoLiao/SAGE"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
Higher-rated alternatives
hud-evals/hud-python
OSS RL environment + evals toolkit
hiyouga/EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
OpenRL-Lab/openrl
Unified Reinforcement Learning Framework
sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning,...
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)