wangclnlp/DeepSpeed-Chat-Extension
This repo contains extensions of DeepSpeed-Chat for fine-tuning LLMs (SFT + RLHF).
This project helps machine learning engineers and researchers improve the quality and safety of large language models (LLMs) through fine-tuning. It takes an existing LLM plus curated datasets (text for supervised fine-tuning, preference data for reward modeling, or conversational turns for multi-turn dialogue) and produces a fine-tuned, better-aligned model. It is aimed at AI practitioners building conversational AI.
No commits in the last 6 months.
Use this if you need to fine-tune large language models such as LLaMA or Baichuan with alignment techniques such as Direct Preference Optimization (DPO) or Reinforcement Learning from Human Feedback (RLHF), so that they perform better on specific tasks or match user preferences more closely.
Not ideal if you are a business user looking for a no-code solution, or if you only need basic LLM inference without advanced fine-tuning capabilities.
Stars: 21
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Jul 02, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/wangclnlp/DeepSpeed-Chat-Extension"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
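The same keyless request can be made from Python. The snippet below is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state. How an API key would be passed is also not documented here, so the sketch covers only the keyless call.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as 'owner/name' (hypothetical helper)."""
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repo.

    Assumption: the response body is JSON. If it is not, json.load raises
    a ValueError and the raw body should be inspected instead.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("transformers", "wangclnlp/DeepSpeed-Chat-Extension")
# print(data)
```

Keeping URL construction separate from the network call makes it possible to check the URL without spending any of the 100 daily requests.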
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.