BaohaoLiao/SAGE
Self-Hinting Language Models Enhance Reinforcement Learning
This project helps large language model (LLM) developers fine-tune their models more effectively with reinforcement learning (RL). When the LLM struggles to generate correct responses to a difficult prompt, it automatically creates a hint that guides its own sampling. This way, even challenging prompts contribute a useful training signal, improving the LLM's performance and exploration capabilities.
Use this if you are a machine learning engineer or researcher focused on improving the training and performance of large language models through reinforcement learning, especially when dealing with difficult or ambiguous prompts.
Not ideal if you are an end user who simply wants to apply an existing LLM to daily tasks, or if you are not involved in advanced LLM development and RL fine-tuning.
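To make the self-hinting idea concrete, here is a minimal sketch of the sampling loop described above: if none of the sampled rollouts for a hard prompt is correct, a hint is appended and the prompt is resampled so it still produces a positive training signal. The sampler, hint source, and success probabilities are all hypothetical stand-ins, not SAGE's actual implementation.

```python
import random

def sample_responses(prompt, n=4):
    """Toy stand-in for an LLM sampler (hypothetical behavior):
    each rollout is 'correct' with a probability that rises sharply
    when a hint is present in the prompt."""
    p_correct = 0.9 if "Hint:" in prompt else (0.05 if "hard" in prompt else 0.6)
    return [random.random() < p_correct for _ in range(n)]

def rollout_with_self_hint(prompt, hint, n=4):
    """Sample rollouts for a prompt; if none is correct, retry with a
    hint appended so the prompt still yields a usable reward signal."""
    rewards = sample_responses(prompt, n)
    if any(rewards):
        return prompt, rewards
    hinted = f"{prompt}\nHint: {hint}"
    return hinted, sample_responses(hinted, n)

random.seed(0)
used_prompt, rewards = rollout_with_self_hint(
    "hard integral problem", "substitute u = x^2"
)
# With this seed, the unhinted rollouts all fail, the hint is added,
# and the hinted rollouts succeed.
```

In an actual RL fine-tuning loop, the rewards from the (possibly hinted) rollouts would feed the policy-gradient update; the key point is that hard prompts no longer produce all-zero reward batches.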
Stars: 24
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/BaohaoLiao/SAGE"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
Higher-rated alternatives
hud-evals/hud-python
OSS RL environment + evals toolkit
hiyouga/EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
OpenRL-Lab/openrl
Unified Reinforcement Learning Framework
sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning,...
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)