Mxoder/Maxs-Awesome-Datasets

Max's awesome datasets

Score: 33 / 100 (Emerging)

This collection provides a wide range of specialized datasets for training and fine-tuning large language models, with a particular focus on Chinese-language applications. The datasets include coding tasks, multi-faceted summaries, and open-ended questions across a variety of domains, all structured for use in model training. It is aimed primarily at AI researchers and practitioners who build or improve AI models.

No commits in the last 6 months.

Use this if you need high-quality, specialized datasets for training or fine-tuning large language models, especially those with a focus on Chinese language, coding, reasoning, or domain-specific knowledge.

Not ideal if you are looking for ready-to-use AI applications or if your primary need is for datasets outside the scope of language model training, such as image or time-series data for non-NLP tasks.

Topics: AI model training, Natural Language Processing, Large Language Models, Machine Learning Datasets, Chinese language AI
Status: Stale (6 months), No Package, No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 15 / 25
Community 8 / 25
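The overall score of 33 / 100 matches the sum of the four category scores above (2 + 8 + 15 + 8 = 33). The short shell sketch below reproduces that arithmetic. Treating the total as a plain, unweighted sum is an assumption inferred from the listed numbers, not a documented formula, and the variable names are illustrative only.

```shell
# Category scores as listed above (each out of 25).
maintenance=2
adoption=8
maturity=15
community=8

# Assumption: the overall score is the unweighted sum of the four categories.
total=$((maintenance + adoption + maturity + community))
echo "Overall: ${total} / 100"
```

Running it prints `Overall: 33 / 100`, which agrees with the score shown at the top of the listing.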


Stars: 68
Forks: 4
Language: (not listed)
License: GPL-3.0
Last pushed: Aug 29, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Mxoder/Maxs-Awesome-Datasets"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.