Mxoder/Maxs-Awesome-Datasets

Max's awesome datasets

/ 100

Emerging

This collection gathers specialized datasets for training and fine-tuning large language models, particularly for Chinese-language applications. It covers coding tasks, multi-faceted summaries, and open-ended questions across various domains, all provided as structured datasets suitable for model training. It is aimed primarily at AI researchers and practitioners building or improving AI models.

No commits in the last 6 months.

Use this if you need high-quality, specialized datasets for training or fine-tuning large language models, especially those with a focus on Chinese language, coding, reasoning, or domain-specific knowledge.

Not ideal if you want ready-to-use AI applications, or if you mainly need data for tasks other than language model training, such as image or time-series data for non-NLP work.

AI model training, Natural Language Processing, Large Language Models, Machine Learning Datasets, Chinese language AI

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 15 / 25

Community 8 / 25

Language: not specified

License: GPL-3.0

Higher-rated alternatives

aalok-sathe/surprisal

A unified interface for computing surprisal (log probabilities) from language models! Supports...

EvolvingLMMs-Lab/lmms-engine

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

reasoning-machines/pal

PaL: Program-Aided Language Models (ICML 2023)

FunnySaltyFish/Better-Ruozhiba

[Item-by-item processing complete] A QA dataset of curated Ruozhiba questions, with every entry manually reviewed and revised

microsoft/monitors4codegen

Code and Data artifact for NeurIPS 2023 paper - "Monitor-Guided Decoding of Code LMs with Static...
