jianzhnie/awesome-instruction-datasets
A collection of prompt and instruction datasets (awesome-prompt-datasets, awesome-instruction-dataset) for training ChatLLM models such as ChatGPT, gathering a wide variety of instruction datasets.
This project offers a comprehensive collection of datasets specifically designed for training conversational AI models like ChatGPT or LLaMA. It provides a curated list of prompt and instruction datasets, as well as those for Reinforcement Learning from Human Feedback (RLHF). Machine learning engineers and AI researchers can use these datasets to fine-tune their large language models to better understand and follow instructions, leading to more natural and effective AI conversations.
725 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher developing or improving chat-based AI models and need a variety of instruction-following datasets.
Not ideal if you are looking for ready-to-use, pre-trained AI models or tools for general data analysis.
Stars: 725
Forks: 40
Language: —
License: Apache-2.0
Category:
Last pushed: Apr 07, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jianzhnie/awesome-instruction-datasets"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
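For scripted access, the same request can be sketched in Python using only the standard library. The base URL is copied from the curl command above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the response body is JSON is not confirmed by this page, so treat this as a starting point rather than a documented client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied verbatim from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Request the record for one repository.

    Assumption: the response body is JSON. The page does not state the format.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("jianzhnie", "awesome-instruction-datasets")` requests the same URL as the curl command. With the keyless limit of 100 requests/day, it is worth caching results locally rather than re-fetching them.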
Higher-rated alternatives
shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline....
lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
GradientHQ/parallax
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
CrazyBoyM/llama3-Chinese-chat
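Repository of Chinese post-trained versions of Llama3 and Llama3.1: interesting fine-tuned and modified weights, plus tutorial videos and documentation on training, inference, evaluation, and deployment.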
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained...