Wluper/edm

Python package for understanding the difficulty of text classification datasets (CoNLL 2018).

Overall score: 48 / 100 (Emerging)

This tool helps machine learning practitioners and researchers understand how challenging a text classification dataset will be for models to learn. You provide lists of sentences and their corresponding labels, and it outputs a 'difficulty report' that quantifies the inherent complexity of your dataset. This is useful for anyone working on text tasks such as sentiment analysis, topic modeling, or spam detection who wants to assess dataset quality before investing in extensive model training.
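The expected input described above is simply a pair of parallel lists: sentences and their labels. A minimal illustration of that shape (this does not show the edm package's actual function names, which are not documented here):

```python
# Illustrative input shape only: parallel lists of sentences and labels.
# How these are passed to the edm package is not shown here.
sentences = [
    "The battery life on this phone is fantastic.",
    "Terrible customer service, never buying again.",
    "Average product, does what it says.",
]
labels = ["positive", "negative", "neutral"]

# Each sentence must pair with exactly one label.
assert len(sentences) == len(labels)
```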

No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly assess the intrinsic difficulty of a new or existing text classification dataset to set appropriate expectations for model performance or diagnose potential issues.

Not ideal if you're looking for a tool that loads data files (like CSVs) directly or helps with the actual training or evaluation of machine learning models.

text-classification dataset-analysis natural-language-processing machine-learning-engineering data-science
Stale (6 months) · No dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 25 / 25
Community: 15 / 25


Stars: 64
Forks: 10
Language: Python
License: GPL-2.0
Last pushed: Feb 13, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Wluper/edm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
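The same endpoint can be queried from Python with the standard library. This sketch only mirrors the URL pattern shown in the curl example above; the shape of the JSON response is not documented here, so the fetch itself is left commented out:

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    # Mirrors the curl example: /quality/<ecosystem>/<owner>/<name>
    return f"{BASE}/{ecosystem}/{repo}"

url = quality_url("nlp", "Wluper/edm")

# Uncomment to fetch (counts against the 100 requests/day anonymous limit):
# with urlopen(url) as resp:
#     data = json.load(resp)
```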