Mxoder/Maxs-Awesome-Datasets
Max's Interesting Datasets / Max's awesome datasets
This collection provides a wide array of specialized datasets for training and fine-tuning large language models, particularly for Chinese-language applications. The datasets cover coding tasks, multi-faceted summaries, and open-ended questions across various domains, and they are structured for direct use in model training. The resource is aimed primarily at AI researchers and practitioners who build or improve AI models.
No commits in the last 6 months.
Use this if you need high-quality, specialized datasets for training or fine-tuning large language models, especially those with a focus on Chinese language, coding, reasoning, or domain-specific knowledge.
Not ideal if you are looking for ready-to-use AI applications or if your primary need is for datasets outside the scope of language model training, such as image or time-series data for non-NLP tasks.
Stars: 68
Forks: 4
Language: not specified
License: GPL-3.0
Category: not specified
Last pushed: Aug 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Mxoder/Maxs-Awesome-Datasets"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
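The curl call above can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes that the endpoint returns JSON (the response schema is not described on this page) and that other repositories follow the same `/{owner}/{repo}` path pattern. The helper names `quality_url` and `fetch_quality` are hypothetical. Only the keyless tier is shown, because this page does not document how an API key is passed.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example; the "llm-tools" category segment is
# assumed to stay fixed for this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (hypothetical helper).

    Each path segment is percent-encoded so that unusual characters
    cannot alter the path structure.
    """
    o = urllib.parse.quote(owner, safe="")
    r = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{o}/{r}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint without an API key and decode the body as JSON.

    Assumption: the service responds with JSON. The parsed value is
    returned unchanged because its schema is not documented here.
    """
    req = urllib.request.Request(
        quality_url(owner, repo), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL used in the curl example; no request is sent.
    print(quality_url("Mxoder", "Maxs-Awesome-Datasets"))
```

Each call to `fetch_quality("Mxoder", "Maxs-Awesome-Datasets")` issues one request, which counts against the 100-requests/day keyless quota. URL construction is kept separate from fetching so that it can be checked offline.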
Higher-rated alternatives
aalok-sathe/surprisal
A unified interface for computing surprisal (log probabilities) from language models! Supports...
EvolvingLMMs-Lab/lmms-engine
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
reasoning-machines/pal
PaL: Program-Aided Language Models (ICML 2023)
FunnySaltyFish/Better-Ruozhiba
[Entry-by-entry processing complete] A QA dataset of selected Ruozhiba questions, with every entry manually reviewed and revised
microsoft/monitors4codegen
Code and Data artifact for NeurIPS 2023 paper - "Monitor-Guided Decoding of Code LMs with Static...