CLUEbenchmark/DataCLUE
DataCLUE: A Data-Centric NLP Benchmark and Toolkit
This project helps AI practitioners improve the performance of natural language processing (NLP) models by systematically improving the quality of their datasets rather than changing the model. It takes a raw or labeled text dataset as input and, through iterative analysis and refinement, outputs a cleaned dataset intended to yield better model accuracy. It is aimed at AI practitioners, data scientists, and NLP engineers working on real-world applications.
144 stars. No commits in the last 6 months.
Use this if you are an AI practitioner struggling with suboptimal NLP model performance and suspect that improving your dataset quality, rather than just tweaking your model, is the key to better results.
Not ideal if you are primarily focused on developing new NLP model architectures or have datasets that are already perfectly clean and well-labeled.
Stars: 144
Forks: 17
Language: Python
License: —
Category:
Last pushed: May 11, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CLUEbenchmark/DataCLUE"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
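The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. Two things in it are assumptions rather than documented behavior: the path layout `category/owner/repo` is inferred from the single URL shown here, and the response is assumed to be JSON (the code falls back to raw text if it is not). The function names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL shaped like the one in the curl example."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data for one repository (hypothetical helper).

    Returns parsed JSON when the body is valid JSON (an assumption about
    the API), otherwise the raw response text.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("nlp", "CLUEbenchmark", "DataCLUE")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching results locally rather than re-fetching on every run.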
Higher-rated alternatives
gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
RichardLitt/low-resource-languages
Resources for conservation, development, and documentation of low resource (human) languages.
UCREL/pymusas-models
PyMUSAS Models
ksopyla/awesome-nlp-polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models,...
datanada/Awesome-Korean-NLP
A curated list of resources for NLP (Natural Language Processing) for Korean