yaodongC/awesome-instruction-dataset
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
This project helps AI developers and researchers find high-quality datasets for training large language models (LLMs) such as ChatGPT or LLaMA to follow instructions. It collects a variety of instruction-following datasets, both text-only and multi-modal (text and image), that serve as fine-tuning input to shape how these models generate responses. The result is a more capable LLM that can understand and execute complex prompts, which is useful for anyone building or fine-tuning advanced conversational AI.
1,145 stars. No commits in the last 6 months.
Use this if you are a researcher or developer aiming to improve the instruction-following capabilities of your large language models, especially for chat-based or multi-modal applications.
Not ideal if you are an end-user looking for a ready-to-use AI model, as this project provides training datasets rather than a deployable solution.
Stars: 1,145
Forks: 56
Language: —
License: —
Category:
Last pushed: Jan 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yaodongC/awesome-instruction-dataset"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
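As a sketch, the same endpoint can also be queried from Python. The helper names below (`quality_url`, `fetch_quality`) are illustrative assumptions, and the response schema is not documented here, so the JSON is returned as-is:

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo slug."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as JSON.

    No key is needed for up to 100 requests/day; the response
    structure is undocumented, so the parsed JSON is returned raw.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("yaodongC", "awesome-instruction-dataset"))
```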
Higher-rated alternatives
shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline....
lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
GradientHQ/parallax
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
CrazyBoyM/llama3-Chinese-chat
Llama3 / Llama3.1 Chinese post-training repository - fine-tuned and modified variants with interesting weights, plus tutorial videos and docs for training, inference, evaluation, and deployment.
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained...