Wluper/edm

Python package for understanding the difficulty of text classification datasets (CoNLL 2018).

Overall score: 48 / 100 (Emerging)

This tool helps machine learning practitioners and researchers understand how challenging a text classification dataset will be for models to learn. You provide lists of sentences and their corresponding labels, and it outputs a 'difficulty report' that quantifies the inherent complexity of your dataset. This is useful for anyone working on text tasks such as sentiment analysis, topic modeling, or spam detection who wants to assess dataset quality before investing in extensive model training.
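The expected input described above is simply a pair of parallel lists: sentences and their labels. A minimal illustration of that shape (this does not show the edm package's actual function names, which are not documented here):

```python
# Illustrative input shape only: parallel lists of sentences and labels.
# How these are passed to the edm package is not shown here.
sentences = [
    "The battery life on this phone is fantastic.",
    "Terrible customer service, never buying again.",
    "Average product, does what it says.",
]
labels = ["positive", "negative", "neutral"]

# Each sentence must pair with exactly one label.
assert len(sentences) == len(labels)
```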

No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly assess the intrinsic difficulty of a new or existing text classification dataset to set appropriate expectations for model performance or diagnose potential issues.

Not ideal if you're looking for a tool that loads data files (like CSVs) directly or helps with the actual training or evaluation of machine learning models.

text-classification dataset-analysis natural-language-processing machine-learning-engineering data-science
Stale (6 months) · No dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 25 / 25
Community: 15 / 25


Stars: 64
Forks: 10
Language: Python
License: GPL-2.0
Last pushed: Feb 13, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Wluper/edm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
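The same endpoint can be queried from Python with the standard library. This sketch only mirrors the URL pattern shown in the curl example above; the shape of the JSON response is not documented here, so the fetch itself is left commented out:

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    # Mirrors the curl example: /quality/<ecosystem>/<owner>/<name>
    return f"{BASE}/{ecosystem}/{repo}"

url = quality_url("nlp", "Wluper/edm")

# Uncomment to fetch (counts against the 100 requests/day anonymous limit):
# with urlopen(url) as resp:
#     data = json.load(resp)
```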