jackaduma/Alpaca-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.

/ 100

Emerging

This project offers a complete workflow to adapt an existing language model, like Alpaca, to your specific needs using smaller datasets and affordable hardware. You provide the base model and your own data, and it outputs a fine-tuned model that behaves more like a custom chatbot. This is for AI practitioners or researchers looking to personalize large language models without extensive computational resources.

No commits in the last 6 months.

Use this if you want to create a custom, instruction-following large language model from an Alpaca base, leveraging reinforcement learning with human feedback on consumer-grade GPUs.

Not ideal if you need a solution that runs out-of-the-box on very limited memory, as loading both the base and reward models can still exceed consumer hardware limits.
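
To make the memory caveat concrete, here is a rough back-of-the-envelope sketch that estimates weight memory alone as parameter count times bytes per parameter. The 7B parameter counts, the 24 GiB GPU, and the helper names `weight_memory_gib` and `fits` are illustrative assumptions, not figures or code from the project. Activations, optimizer state, and LoRA adapters add to the total in practice.

```python
# Rough weight-memory estimate; all model sizes below are illustrative assumptions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def fits(models: dict, dtype: str, gpu_gib: float) -> bool:
    """True if the summed weights of all models fit within gpu_gib."""
    total = sum(weight_memory_gib(n, dtype) for n in models.values())
    return total <= gpu_gib


# Hypothetical setup: a 7B-parameter policy plus a 7B-parameter reward model.
models = {"policy": 7e9, "reward": 7e9}

for dtype in ("fp16", "int8"):
    total = sum(weight_memory_gib(n, dtype) for n in models.values())
    print(f"{dtype}: {total:.1f} GiB of weights, "
          f"fits in 24 GiB: {fits(models, dtype, 24.0)}")
```

Under these assumptions, two half-precision models together need more than 24 GiB for weights alone, while 8-bit weights would fit, which is one reason quantized loading is commonly paired with LoRA on consumer GPUs.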

large-language-models model-fine-tuning conversational-ai ai-research natural-language-processing

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 11 / 25

Language: Python

License: MIT

Higher-rated alternatives

agentscope-ai/Trinity-RFT

Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...

OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...

zjunlp/EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.

huggingface/alignment-handbook

Robust recipes to align language models with human and AI preferences

hyunwoongko/nanoRLHF

nanoRLHF: from-scratch journey into how LLMs and RLHF really work.