CLUEbenchmark/DataCLUE

DataCLUE: A Data-centric NLP Benchmark and Toolkit

/ 100

Emerging

This project helps AI practitioners improve the performance of natural language processing (NLP) models by systematically improving the quality of their datasets rather than changing the model. It takes a raw or labeled text dataset as input and, through iterative analysis and refinement, outputs an optimized dataset intended to yield better model accuracy. It is aimed at data scientists and NLP engineers working on real-world applications.
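As an illustration of what one refinement pass over a labeled text dataset might look like, here is a minimal sketch in Python. It is not DataCLUE's own method or API: the function name `resolve_conflicting_labels`, the sample data, and the majority-vote rule are all assumptions chosen for brevity. It merges duplicate texts, keeps the majority label when annotations disagree, and drops texts whose labels are tied.

```python
from collections import Counter, defaultdict


def resolve_conflicting_labels(examples):
    """Return a cleaned list of (text, label) pairs.

    Hypothetical helper (not part of DataCLUE): duplicate texts are merged,
    a text with conflicting labels keeps its majority label, and a text
    whose top two labels are tied is dropped as ambiguous.
    """
    votes = defaultdict(Counter)
    for text, label in examples:
        # Normalize surrounding whitespace so trivial variants are merged.
        votes[text.strip()][label] += 1

    cleaned = []
    for text, counter in votes.items():
        ranked = counter.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie between the top two labels: drop the text
        cleaned.append((text, ranked[0][0]))
    return cleaned


# Made-up sample data, for illustration only.
raw = [
    ("great phone", "positive"),
    ("great phone", "positive"),
    ("great phone", "negative"),
    ("battery died", "negative"),
    ("it is fine", "positive"),
    ("it is fine", "negative"),
]
cleaned = resolve_conflicting_labels(raw)
```

In a data-centric workflow, a pass like this would typically be followed by retraining a fixed model and comparing its accuracy before and after the cleanup.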

144 stars. No commits in the last 6 months.

Use this if you are an AI practitioner struggling with suboptimal NLP model performance and suspect that improving your dataset quality, rather than just tweaking your model, is the key to better results.

Not ideal if you are primarily focused on developing new NLP model architectures or have datasets that are already perfectly clean and well-labeled.

natural-language-processing data-quality text-classification dataset-curation ai-performance-optimization

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 15 / 25


Stars 144

Forks

Language Python

License None

Higher-rated alternatives

gentaiscool/code-switching-papers

A curated list of research papers and resources on code-switching

RichardLitt/low-resource-languages

Resources for conservation, development, and documentation of low resource (human) languages.

UCREL/pymusas-models

PyMUSAS Models

ksopyla/awesome-nlp-polish

A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models,...

datanada/Awesome-Korean-NLP

A curated list of resources for NLP (Natural Language Processing) for Korean
