Wluper/edm
Python package for understanding the difficulty of text classification datasets. (in CoNLL 2018)
This tool helps machine learning practitioners and researchers understand how challenging a text classification dataset will be for models to learn. You provide a list of sentences and a matching list of labels, and it outputs a 'difficulty report' that quantifies the inherent complexity of the dataset. This is useful for anyone working on text tasks such as sentiment analysis, topic modeling, or spam detection who wants to assess dataset quality before extensive model training.
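To make the input shape concrete, here is a minimal sketch of the kind of statistics a difficulty report could draw on, computed from parallel lists of sentences and labels. It is an illustration only: the function name `simple_difficulty_stats` and the chosen statistics are assumptions made for this example and are not the edm package's API or its actual difficulty measure.

```python
from collections import Counter
from math import log2


def simple_difficulty_stats(sentences, labels):
    """Illustrative statistics only; NOT the edm package's difficulty measure."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    if not sentences:
        raise ValueError("dataset is empty")

    counts = Counter(labels)
    total = len(labels)

    # Shannon entropy of the label distribution, normalised to [0, 1].
    # 1.0 means perfectly balanced classes; a single class gives 0.0.
    if len(counts) > 1:
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        label_balance = entropy / log2(len(counts))
    else:
        label_balance = 0.0

    return {
        "num_classes": len(counts),
        "label_balance": label_balance,
        # Size of the largest class divided by the size of the smallest.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
        # Mean sentence length in whitespace-separated tokens.
        "mean_tokens": sum(len(s.split()) for s in sentences) / total,
    }
```

More classes, a lower balance score, and a higher imbalance ratio would generally point to a harder dataset, though the package itself may weigh different factors.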
No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly assess the intrinsic difficulty of a new or existing text classification dataset to set appropriate expectations for model performance or diagnose potential issues.
Not ideal if you're looking for a tool that loads data files (like CSVs) directly or helps with the actual training or evaluation of machine learning models.
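Because the tool expects lists rather than files, a CSV has to be read into sentences and labels first. The sketch below does this with Python's standard `csv` module; the helper name `load_sentences_and_labels` and the column names `text` and `label` are hypothetical and should be adapted to the actual file.

```python
import csv


def load_sentences_and_labels(path, text_column="text", label_column="label"):
    """Read a CSV with a header row into two parallel lists.

    The column names are assumptions for this example, not a fixed format.
    """
    sentences, labels = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            sentences.append(row[text_column])
            labels.append(row[label_column])
    return sentences, labels
```

The two returned lists have the same length and order, which is the shape the description above says the tool takes as input.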
Stars: 64
Forks: 10
Language: Python
License: GPL-2.0
Category:
Last pushed: Feb 13, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Wluper/edm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
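The same request can be made from Python with the standard library. This is a sketch under assumptions: the path layout (category, then owner, then repository) is inferred from the single URL shown above, the helper names are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the segment order is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the raw response body as text (no API key is sent)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("nlp", "Wluper", "edm")` would issue the same GET request as the curl command and counts against the daily request limit.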
Higher-rated alternatives
giacbrd/ShallowLearn
An experiment about re-implementing supervised learning models based on shallow neural network...
javedsha/text-classification
Machine Learning and NLP: Text Classification using python, scikit-learn and NLTK
chicago-justice-project/article-tagging
Natural Language Processing of Chicago news articles
fendouai/Awesome-Text-Classification
Awesome-Text-Classification Projects, Papers, Tutorial.
opennlp/Large-Scale-Text-Classification
Large Scale benchmarking of state of the art text vectorizers