voidful/awesome-chatgpt-dataset
Unlock the Power of LLM: Explore These Datasets to Train Your Own ChatGPT!
This project helps AI developers and researchers find, combine, and prepare diverse datasets for training their own custom large language models (LLMs) like ChatGPT. It provides a curated list of datasets, ranging from small to large, covering various topics, languages, and use cases. Developers can select datasets, merge them, and easily upload the processed data to platforms like HuggingFace Hub to enhance their LLM training workflows.
Use this if you are developing or fine-tuning a large language model and need a readily available, categorized collection of diverse datasets for instruction tuning, safety training, or specialized domain adaptation.
Not ideal if you are looking for a pre-trained, ready-to-use LLM or if your project doesn't involve training or fine-tuning language models.
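The select-and-merge step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's own tooling: the field aliases, the `normalize` and `merge` helpers, and the sample records are all hypothetical, and real datasets from the list may need different column mappings.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical column aliases: source datasets often name the same
# field differently, so each target field lists the names it accepts.
FIELD_ALIASES = {
    "instruction": ("instruction", "prompt", "question"),
    "output": ("output", "response", "answer"),
}


def normalize(record: Dict) -> Optional[Dict[str, str]]:
    """Map one raw record onto a shared {"instruction", "output"} schema."""
    row = {}
    for target, aliases in FIELD_ALIASES.items():
        # Take the first alias that is present and non-empty.
        value = next((record[a] for a in aliases if record.get(a)), None)
        if value is None:
            return None  # record lacks a required field; caller skips it
        row[target] = str(value).strip()
    return row


def merge(sources: Iterable[Iterable[Dict]]) -> List[Dict[str, str]]:
    """Concatenate sources, normalizing rows and dropping exact duplicates."""
    seen = set()
    merged = []
    for source in sources:
        for record in source:
            row = normalize(record)
            if row is None:
                continue
            key = (row["instruction"], row["output"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Made-up sample records standing in for two differently shaped datasets.
alpaca_style = [{"instruction": "Name a prime.", "output": "7"}]
qa_style = [
    {"question": "Name a prime.", "answer": "7"},   # duplicate after mapping
    {"prompt": "Say hi", "response": " Hello "},
]
merged = merge([alpaca_style, qa_style])
print(merged)
```

With the Hugging Face `datasets` library installed and a logged-in Hub account, the merged rows could then be wrapped with `datasets.Dataset.from_list(merged)` and published with its `push_to_hub` method. The repository's own scripts may handle this step differently.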
Stars: 763
Forks: 63
Language: Python
License: GPL-3.0
Last pushed: Oct 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/voidful/awesome-chatgpt-dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
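The same request can be made from Python with only the standard library. This is a minimal sketch: the endpoint is taken from the curl command above, while the `build_url` and `fetch` helpers are hypothetical, and generalizing the path to other owner/repo pairs is an assumption. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; treating it as a reusable
# prefix for other repositories is an assumption.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository, escaping each segment."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text (no key sent, so the
    anonymous daily request limit applies)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(fetch("voidful", "awesome-chatgpt-dataset"))
```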
Higher-rated alternatives
awesome-gptX/awesome-gpt
🏆 An awe-inspiring collection of resources, encompassing a wide range of tools, documents,...
taishi-i/awesome-ChatGPT-repositories
A curated list of resources dedicated to open source GitHub repositories related to ChatGPT,...
friuns2/BlackFriday-GPTs-Prompts
List of free GPTs that don't require a Plus subscription
eon01/awesome-chatgpt
🧠 A curated list of awesome ChatGPT resources, including libraries, SDKs, APIs, and more. 🌟...
sindresorhus/awesome-chatgpt
🤖 Awesome list for ChatGPT — an artificial intelligence chatbot developed by OpenAI