JIA-Lab-research/TGDPO

[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

Quality score: 14 / 100 (Experimental)

This project improves the performance of large language models (LLMs) by incorporating token-level reward guidance into preference optimization during training. It takes existing preference training data and a pre-trained token-level reward model as input, and produces a fine-tuned LLM that generates higher-quality responses. It is aimed at AI researchers and machine learning engineers actively working on fine-tuning and optimizing LLMs.

No commits in the last 6 months.

Use this if you are a researcher or engineer looking to boost the response quality and win rates of your fine-tuned large language models by leveraging token-level guidance.

Not ideal if you are looking for a plug-and-play solution for basic LLM deployment or if you do not have access to significant computational resources (like multiple high-end GPUs).

Topics: large-language-models, LLM-fine-tuning, reinforcement-learning-from-human-feedback, AI-model-optimization, natural-language-generation
Badges: No License, Stale (6m), No Package, No Dependents

Score breakdown:
- Maintenance: 2 / 25
- Adoption: 5 / 25
- Maturity: 7 / 25
- Community: 0 / 25


Repository stats:
- Stars: 10
- Forks: (not listed)
- Language: Python
- License: none
- Last pushed: Jul 15, 2025
- Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/TGDPO"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
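The curl command above can also be scripted. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns JSON, and the function names (`quality_url`, `fetch_quality`) are illustrative, not part of the documented API:

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository endpoint URL.
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Free tier: 100 requests/day with no key needed,
    # so no authentication header is set here.
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


# Example usage (performs a network request):
# data = fetch_quality("JIA-Lab-research", "TGDPO")
# print(data)
```

The response schema is not documented on this page, so inspect the returned dictionary before relying on specific field names.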