yaodongC/awesome-instruction-dataset
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
This project helps AI developers and researchers find high-quality datasets for training large language models (LLMs) such as ChatGPT or LLaMA to follow instructions. It collects a variety of instruction-following datasets, both text-only and multi-modal (text and image), that serve as fine-tuning input to shape how these models generate responses. The result is a more capable LLM that can understand and execute complex prompts, which is useful for anyone building or fine-tuning advanced conversational AI.
1,145 stars. No commits in the last 6 months.
Use this if you are a researcher or developer aiming to improve the instruction-following capabilities of your large language models, especially for chat-based or multi-modal applications.
Not ideal if you are an end-user looking for a ready-to-use AI model, as this project provides training datasets rather than a deployable solution.
Stars: 1,145
Forks: 56
Language: —
License: —
Category:
Last pushed: Jan 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yaodongC/awesome-instruction-dataset"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
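As a sketch, the same endpoint can also be queried from Python. The helper names below (`quality_url`, `fetch_quality`) are illustrative assumptions, and the response schema is not documented here, so the JSON is returned as-is:

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo slug."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as JSON.

    No key is needed for up to 100 requests/day; the response
    structure is undocumented, so the parsed JSON is returned raw.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("yaodongC", "awesome-instruction-dataset"))
```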
Higher-rated alternatives
shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline....
lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
GradientHQ/parallax
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
CrazyBoyM/llama3-Chinese-chat
Llama3 / Llama3.1 Chinese post-training repository - fine-tuned and modified variants with interesting weights, plus tutorial videos and docs for training, inference, evaluation, and deployment.
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained...