l294265421/alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat

Score: 40 / 100 (Emerging)

This project fine-tunes large language models (LLMs) such as LLaMA to better align with human preferences. Given an existing LLaMA model and human feedback data, it produces a more helpful and harmless conversational model. It is aimed at AI developers, researchers, and engineers building or improving custom chat applications and intelligent assistants.

117 stars. No commits in the last 6 months.

Use this if you want to enhance the quality and safety of a LLaMA-based conversational AI by incorporating human feedback during the training process.

Not ideal if you are looking for a pre-trained, ready-to-use chatbot without needing to perform custom model training and finetuning.
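The training code lives in the repository itself; as a rough illustration of what "incorporating human feedback" means in an RLHF pipeline, here is a minimal, dependency-free sketch of two objectives central to it: the pairwise reward-model loss and the PPO clipped policy objective. The function names and scalar form are illustrative only, not taken from this repo or from DeepSpeed Chat.

```python
import math


def reward_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry style loss commonly used to train the reward model:
    # -log(sigmoid(r_chosen - r_rejected)). It is small when the model
    # scores the human-preferred response higher than the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))


def ppo_clipped_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    # PPO clipped surrogate objective for the RL stage; `ratio` is the
    # probability ratio pi_new / pi_old for the sampled response, and the
    # clip keeps policy updates from moving too far in one step.
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return -min(ratio * advantage, clipped * advantage)
```

In practice these operate on per-token log-probabilities and batched tensors rather than scalars, but the shape of the objectives is the same.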

conversational-ai LLM-finetuning AI-safety natural-language-processing AI-model-development
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 117
Forks: 15
Language: Python
License: MIT
Last pushed: Jun 05, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/l294265421/alpaca-rlhf"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
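For calling the endpoint from Python rather than curl, the URL can be assembled like this. The path pattern is copied from the curl example above and assumed to generalize to other owner/repo pairs; the helper name is hypothetical.

```python
def quality_api_url(owner: str, repo: str,
                    host: str = "https://pt-edge.onrender.com") -> str:
    # Build the quality-score endpoint shown in the curl example above;
    # fetch the result with any HTTP client (urllib.request, requests, ...).
    return f"{host}/api/v1/quality/transformers/{owner}/{repo}"
```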