yaodongC/awesome-instruction-dataset

A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

Score: 32 / 100 (Emerging)

This project gives AI developers and researchers a curated list of datasets for training large language models (LLMs) like ChatGPT or LLaMA to follow instructions. It covers text-only and multi-modal (text and image) instruction-following datasets, which serve as fine-tuning input that shapes how a model generates responses. The intended result is an LLM that better understands and executes complex prompts, which is useful for anyone building or fine-tuning conversational AI.
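To make "instruction-following dataset" concrete, here is a minimal sketch of how one text-only record might be turned into training text. The field names (`instruction`, `input`, `output`) and the `### ...` section markers are assumptions loosely modeled on the widely used Alpaca-style layout; this listing does not specify the schema of any dataset, and `InstructionRecord` and `format_prompt` are hypothetical names used only for illustration.

```python
from typing import TypedDict


class InstructionRecord(TypedDict):
    # Hypothetical schema: one instruction, optional context, and the target answer.
    instruction: str
    input: str
    output: str


def format_prompt(record: InstructionRecord) -> str:
    """Render the prompt part of a record; the 'output' field is the training target."""
    if record["input"].strip():
        # Records with extra context get an additional "Input" section.
        return (
            f"### Instruction:\n{record['instruction']}\n\n"
            f"### Input:\n{record['input']}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{record['instruction']}\n\n### Response:\n"


example: InstructionRecord = {
    "instruction": "Translate the sentence into French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}

prompt = format_prompt(example)
# During fine-tuning, the model learns to continue the prompt with the output.
training_text = prompt + example["output"]
print(training_text)
```

Individual datasets may use different field names or a multi-turn chat layout, so check each dataset's own documentation before adapting this.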

1,145 stars. No commits in the last 6 months.

Use this if you are a researcher or developer aiming to improve the instruction-following capabilities of your large language models, especially for chat-based or multi-modal applications.

Not ideal if you are an end-user looking for a ready-to-use AI model, as this project provides training datasets rather than a deployable solution.

Tags: AI model training, natural language processing, machine learning research, conversational AI, multi-modal AI
Badges: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 14 / 25
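
The four sub-scores above add up to the overall score of 32 out of 100. The page does not state the formula, so treating the overall score as a plain sum of four categories worth 25 points each is an assumption that merely matches the numbers shown. A minimal sketch of that arithmetic:

```python
# Sub-scores as listed on this page; each category is scored out of 25.
sub_scores = {"Maintenance": 0, "Adoption": 10, "Maturity": 8, "Community": 14}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the sub-scores.
total = sum(sub_scores.values())
maximum = MAX_PER_CATEGORY * len(sub_scores)

print(f"{total} / {maximum}")  # prints "32 / 100"
```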


Stars: 1,145
Forks: 56
Language: not specified
License: none
Last pushed: Jan 04, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yaodongC/awesome-instruction-dataset"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
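
The same request can be made from a script. The sketch below uses only the Python standard library and rebuilds the URL from the curl example. Reading the three path segments as category, owner, and repository is an assumption, as is the response being JSON; the page documents neither. `build_quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one project. Assumes a JSON body (not confirmed by the page)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Each call counts against the keyless limit of 100 requests/day.
    data = fetch_quality("transformers", "yaodongC", "awesome-instruction-dataset")
    print(data)
```

Keeping URL construction separate from the network call makes the script easy to check offline and to reuse for other projects.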