voidful/awesome-chatgpt-dataset

Unlock the Power of LLM: Explore These Datasets to Train Your Own ChatGPT!

Emerging

This project helps AI developers and researchers find, combine, and prepare datasets for training custom large language models (LLMs) similar to ChatGPT. It provides a curated list of datasets of varying sizes, covering a range of topics, languages, and use cases. Developers can select datasets, merge them, and upload the processed data to platforms such as the Hugging Face Hub.
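As a rough illustration of the "select and merge" step described above, here is a minimal sketch in plain Python. It is not the repository's own code: the two record layouts (`instruction`/`input`/`output` and `prompt`/`response`), the helper names `normalize` and `merge_datasets`, and the deduplication rule are all assumptions chosen for the example.

```python
from typing import Iterable


def normalize(record: dict) -> dict:
    """Map a record onto a shared instruction/input/output schema.

    Both source layouts handled here are hypothetical examples.
    """
    if "instruction" in record:  # assumed instruction-style layout
        return {
            "instruction": record["instruction"].strip(),
            "input": record.get("input", "").strip(),
            "output": record["output"].strip(),
        }
    if "prompt" in record:  # assumed prompt/response-style layout
        return {
            "instruction": record["prompt"].strip(),
            "input": "",
            "output": record["response"].strip(),
        }
    raise ValueError(f"unrecognized record layout: {sorted(record)}")


def merge_datasets(*datasets: Iterable[dict]) -> list[dict]:
    """Concatenate datasets, keeping the first copy of each (instruction, input) pair."""
    seen: set[tuple[str, str]] = set()
    merged: list[dict] = []
    for dataset in datasets:
        for record in dataset:
            row = normalize(record)
            key = (row["instruction"], row["input"])
            if key in seen:
                continue  # skip duplicates that appear in a later dataset
            seen.add(key)
            merged.append(row)
    return merged
```

Calling `merge_datasets(dataset_a, dataset_b)` returns one list of uniformly shaped rows. From there, one plausible route to the Hub is the Hugging Face `datasets` library (`Dataset.from_list(...)` followed by `push_to_hub(...)`), though the listing does not say which tooling the project itself uses.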

763 stars.

Use this if you are developing or fine-tuning a large language model and need a readily available, categorized collection of diverse datasets for instruction tuning, safety training, or specialized domain adaptation.

Not ideal if you are looking for a pre-trained, ready-to-use LLM or if your project doesn't involve training or fine-tuning language models.

AI-development LLM-training natural-language-processing dataset-curation machine-learning-engineering

No package. No dependents.

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25

Language: Python

License: GPL-3.0

Higher-rated alternatives

awesome-gptX/awesome-gpt

🏆 An awe-inspiring collection of resources, encompassing a wide range of tools, documents,...

taishi-i/awesome-ChatGPT-repositories

A curated list of resources dedicated to open source GitHub repositories related to ChatGPT,...

friuns2/BlackFriday-GPTs-Prompts

List of free GPTs that don't require a Plus subscription

eon01/awesome-chatgpt

🧠 A curated list of awesome ChatGPT resources, including libraries, SDKs, APIs, and more. 🌟...

sindresorhus/awesome-chatgpt

🤖 Awesome list for ChatGPT — an artificial intelligence chatbot developed by OpenAI