CLUEbenchmark/DataCLUE

DataCLUE: A Data-Centric NLP Benchmark and Toolkit

33 / 100 (Emerging)

This project helps AI practitioners improve the performance of natural language processing (NLP) models by systematically improving the quality of their datasets. It takes a raw or labeled text dataset as input and, through iterative analysis and refinement, produces an optimized dataset intended to improve model accuracy. It is designed for AI practitioners, data scientists, and NLP engineers working on real-world AI applications.

144 stars. No commits in the last 6 months.

Use this if you are an AI practitioner struggling with suboptimal NLP model performance and suspect that improving your dataset quality, rather than just tweaking your model, is the key to better results.

Not ideal if you are primarily focused on developing new NLP model architectures or have datasets that are already perfectly clean and well-labeled.

natural-language-processing data-quality text-classification dataset-curation ai-performance-optimization
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 15 / 25
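The four category scores above add up to the overall score shown at the top of the page (0 + 10 + 8 + 15 = 33 out of a possible 100). The page does not state the formula, so the following is only a minimal Python sketch that assumes the overall score is an unweighted sum of four categories, each capped at 25; the `overall_score` helper is hypothetical.

```python
from typing import Mapping

# Each category is scored out of 25, as listed above.
CATEGORY_MAX = 25

# Category scores copied from the listing.
scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 15,
}


def overall_score(categories: Mapping[str, int]) -> int:
    """Sum category scores (assumed unweighted) after range-checking each one."""
    for name, value in categories.items():
        if not 0 <= value <= CATEGORY_MAX:
            raise ValueError(f"{name} score {value} outside 0..{CATEGORY_MAX}")
    return sum(categories.values())


total = overall_score(scores)
max_total = CATEGORY_MAX * len(scores)
print(f"{total} / {max_total}")  # prints "33 / 100"
```

If the real formula applies weights or caps, only `overall_score` would need to change.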


Stars: 144
Forks: 17
Language: Python
License: None
Last pushed: May 11, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CLUEbenchmark/DataCLUE"

Open to everyone: 100 requests/day without a key, or 1,000/day with a free key.
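The same endpoint can be called from Python using only the standard library. This is a minimal sketch that assumes the endpoint returns JSON (the response format is not documented here); the helper names `quality_url` and `fetch_quality` are hypothetical, and no API key is sent, so the anonymous 100 requests/day limit applies.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-score URL for a repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual names cannot break the path.
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("nlp", "CLUEbenchmark", "DataCLUE")` requests the same URL as the curl command. The network call is kept inside a function so that building the URL can be checked without making a request.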