minnesotanlp/infoVerse

Jaehyung Kim et al.'s ACL 2023 paper, "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"

Score: 27 / 100 (Experimental)

This tool helps machine learning engineers and researchers characterize their natural language processing (NLP) datasets. It runs an existing text dataset through several classifiers and generates a 'meta-information' profile for it. The profile captures dataset characteristics such as complexity and diversity, which can inform decisions about data quality and model training.
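
As a rough illustration of what a per-example 'meta-information' profile could look like, the sketch below derives three measures from a classifier's predicted probabilities across training epochs. The function name `meta_information`, the choice of measures (confidence, variability, entropy), and the input layout are all assumptions made for this example; they are not taken from the infoVerse code base or its API.

```python
import math
from statistics import mean, pstdev


def meta_information(probs_per_epoch, label):
    """Summarize one example from per-epoch class probabilities.

    probs_per_epoch: list of probability lists, one per training epoch
                     (hypothetical layout, assumed for this sketch).
    label:           index of the gold class.
    """
    # Probability assigned to the gold label at each epoch.
    gold = [p[label] for p in probs_per_epoch]
    # Predictive entropy of the final epoch's distribution.
    last = probs_per_epoch[-1]
    entropy = -sum(p * math.log(p) for p in last if p > 0)
    return {
        "confidence": mean(gold),     # average gold-label probability
        "variability": pstdev(gold),  # spread of that probability over epochs
        "entropy": entropy,           # uncertainty of the last prediction
    }


# One binary-classification example tracked over three epochs.
example = [[0.5, 0.5], [0.7, 0.3], [0.9, 0.1]]
profile = meta_information(example, label=0)
```

Stacking such dictionaries for every example gives a feature table that could be used to rank or filter data, for instance when pruning or selecting examples for annotation.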

No commits in the last 6 months.

Use this if you need to comprehensively analyze the properties of your NLP datasets to make informed decisions about data pruning, active learning strategies, or data annotation efforts.

Not ideal if you are looking for a simple data cleaning tool or if your primary goal is to train a model without needing deep insights into dataset characteristics.

NLP dataset analysis, Machine learning engineering, Data quality assessment, Text data characterization, AI research
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 5 / 25

Stars: 16
Forks: 1
Language: Python
License: MIT
Last pushed: Jun 28, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/minnesotanlp/infoVerse"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
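
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repository is inferred from the URL above, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout inferred, not documented here)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("nlp", "minnesotanlp", "infoVerse")
# To perform the request (counts against the daily limit):
# data = fetch_quality("nlp", "minnesotanlp", "infoVerse")
```

No API key is sent here, so the request falls under the keyless daily limit described above.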