ghomasHudson/muld

The Multitask Long Document Benchmark

/ 100

Experimental

MuLD is a collection of extensive text datasets designed to help evaluate and compare natural language processing (NLP) models. It provides various tasks like summarization, translation, question answering, and text classification using very long documents (over 10,000 words). Researchers and developers building and testing advanced NLP models for complex, lengthy texts would use this.

No commits in the last 6 months.

Use this if you are developing or evaluating NLP models and need a comprehensive benchmark with diverse tasks based on exceptionally long documents.

Not ideal if you are looking for a dataset for short-form text tasks or if you are not involved in advanced NLP model development.

natural-language-processing long-document-analysis text-summarization machine-translation question-answering

No License Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 3 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

—

Higher-rated alternatives

luheng/deep_srl

Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next

sileod/tasksource

Datasets collection and preprocessings framework for NLP extreme multitask learning

loomchild/maligna

Bilingual sengence aligner

CK-Explorer/DuoSubs

Semantic subtitle aligner and merger for bilingual subtitle syncing.

coastalcph/lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Explore NLP Tools

All categories Trending NLP directory Insights