jackaduma/Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. It implements RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture: basically ChatGPT, but built on Alpaca.
This project offers a complete workflow to adapt an existing language model, like Alpaca, to your specific needs using smaller datasets and affordable hardware. You provide the base model and your own data, and it outputs a fine-tuned model that behaves more like a custom chatbot. This is for AI practitioners or researchers looking to personalize large language models without extensive computational resources.
No commits in the last 6 months.
Use this if you want to create a custom, instruction-following large language model from an Alpaca base, leveraging reinforcement learning with human feedback on consumer-grade GPUs.
Not ideal if you need a solution that runs out-of-the-box on very limited memory, as loading both the base and reward models can still exceed consumer hardware limits.
Stars: 61
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Apr 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jackaduma/Alpaca-LoRA-RLHF-PyTorch"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
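The curl request above can also be issued from Python. This is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical. Reading the path as category/owner/repository is an assumption inferred from the example URL. The response is assumed to be JSON, since the page does not document its format.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the three path segments are category, owner, and repository,
    as the example URL suggests.
    """
    segments = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (hypothetical helper).

    Assumption: the endpoint returns JSON; the schema is not documented here,
    so the decoded object is returned without further interpretation.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("transformers", "jackaduma", "Alpaca-LoRA-RLHF-PyTorch")` requests the same URL as the curl command. Without a key, such calls count against the 100 requests/day limit.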
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.