voidful/awesome-chatgpt-dataset

Unlock the Power of LLMs: Explore These Datasets to Train Your Own ChatGPT!

48 / 100 · Emerging

This project helps AI developers and researchers find, combine, and prepare diverse datasets for training their own custom large language models (LLMs) like ChatGPT. It provides a curated list of datasets, ranging in size from small to large and covering various topics, languages, and use cases. Developers can select datasets, merge them, and upload the processed data to platforms such as the Hugging Face Hub to support their LLM training workflows.
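
The select-and-merge step described above can be sketched in plain Python. This is a minimal sketch, not the repository's own tooling: the source names `source_a` and `source_b`, their field mappings, and the target schema (`instruction`, `input`, `output`, an assumed Alpaca-style layout) are all hypothetical. Each source is mapped onto the common schema, exact duplicates are dropped, and the result can be written as JSON Lines.

```python
import json
from pathlib import Path

# Hypothetical per-source field mappings: each source names its columns differently.
# A value of None means the source has no such field, so it is filled with "".
FIELD_MAPS = {
    "source_a": {"instruction": "instruction", "input": "input", "output": "output"},
    "source_b": {"instruction": "prompt", "input": None, "output": "response"},
}


def normalize(record, mapping):
    """Map one raw record onto the common instruction/input/output schema."""
    return {target: (record.get(src) or "") if src else "" for target, src in mapping.items()}


def merge_sources(sources):
    """Normalize records from several sources and drop exact duplicates.

    `sources` maps a source name (a key of FIELD_MAPS) to a list of dicts.
    """
    seen = set()
    merged = []
    for name, records in sources.items():
        mapping = FIELD_MAPS[name]
        for rec in records:
            row = normalize(rec, mapping)
            key = (row["instruction"], row["input"], row["output"])
            if key in seen:
                continue  # skip an exact duplicate already kept
            seen.add(key)
            merged.append(row)
    return merged


def write_jsonl(rows, path):
    """Write one JSON object per line (JSON Lines)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


# Usage with toy records (illustrative data, not taken from any listed dataset).
sources = {
    "source_a": [{"instruction": "Translate to French", "input": "Hello", "output": "Bonjour"}],
    "source_b": [
        {"prompt": "Name a primary color", "response": "Red"},
        {"prompt": "Name a primary color", "response": "Red"},
    ],
}
rows = merge_sources(sources)
print(len(rows))  # 2: the repeated source_b record is dropped
```

A JSON Lines file like this can be loaded with the Hugging Face `datasets` library via `load_dataset("json", data_files=...)` and uploaded with `Dataset.push_to_hub`; whether the repository itself uses that route is not stated here.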

763 stars.

Use this if you are developing or fine-tuning a large language model and need a readily available, categorized collection of diverse datasets for instruction tuning, safety training, or specialized domain adaptation.

Not ideal if you are looking for a pre-trained, ready-to-use LLM or if your project doesn't involve training or fine-tuning language models.

AI-development LLM-training natural-language-processing dataset-curation machine-learning-engineering
No package · No dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25
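
The four category scores above add up to the overall 48 / 100. A minimal sketch of that arithmetic, assuming the overall score is the plain sum of the categories (the listing does not state the formula, but these numbers are consistent with it):

```python
# Category scores as listed above, each out of 25.
category_scores = {"Maintenance": 6, "Adoption": 10, "Maturity": 16, "Community": 16}

# Assumption: the overall score is the unweighted sum of the four categories.
total = sum(category_scores.values())
maximum = 25 * len(category_scores)
print(f"{total} / {maximum}")  # 48 / 100
```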

Stars: 763
Forks: 63
Language: Python
License: GPL-3.0
Last pushed: Oct 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/voidful/awesome-chatgpt-dataset"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
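
The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: it assumes the endpoint returns a JSON body (its schema is not documented in this listing, so the decoded result is returned as-is), and it covers only anonymous access, because how a free key is supplied is not described here. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL, following the curl example's pattern."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send an anonymous GET request and decode the JSON response body.

    Assumption: the response is JSON; its fields are not documented here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("voidful", "awesome-chatgpt-dataset")` issues the same request as the curl command and counts against the same anonymous daily limit.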