sugarandgugu/Simple-Trl-Training
Fine-tune large language models with the DPO algorithm; simple and easy to get started.
This project helps businesses improve their AI chatbots or large language models by training them to respond more effectively. You provide examples of good and bad chatbot responses to specific customer prompts, and the tool fine-tunes the AI model to prefer the good responses. It is aimed at product managers, customer service managers, and AI developers who want to refine a conversational AI's behavior.
No commits in the last 6 months.
Use this if you have an existing large language model and want to improve its conversational quality by teaching it preferred responses to user queries.
Not ideal if you need to build a large language model from scratch or are looking for advanced features beyond preference-based fine-tuning.
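The preference-based fine-tuning described above rests on the DPO objective: the policy is rewarded for assigning relatively more probability to the chosen response than to the rejected one, measured against a frozen reference model. A minimal sketch of that loss for a single preference pair (the function name, the log-probability inputs, and the example values are illustrative, not taken from this repository):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trainable policy
    and under the frozen reference model; beta controls how far the
    policy may drift from the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Loss is -log(sigmoid(margin)): it shrinks as the policy favors
    # the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does,
# so the loss falls below -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -30.0, -12.0, -28.0)
```

In practice a library such as Hugging Face `trl` wraps this loss in a trainer that consumes a dataset of `prompt` / `chosen` / `rejected` triples, which matches the good-vs-bad response examples this project expects.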
Stars
51
Forks
3
Language
Python
License
—
Category
Last pushed
Jul 03, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sugarandgugu/Simple-Trl-Training"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards