sileod/tasksource

Datasets collection and preprocessing framework for NLP extreme multitask learning

48 / 100 (Emerging)

This project helps machine learning engineers and researchers access and prepare a large collection of NLP datasets for model training. It takes raw text datasets and standardizes them into consistent formats (such as multiple-choice or classification tasks), so that datasets of the same type can be used interchangeably. The ideal user is someone building or evaluating large language models who needs a wide range of consistently preprocessed data.
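To make "standardized, interchangeable formats" concrete, here is a toy sketch of the idea only. It is not tasksource's API or its actual schema: the `MultipleChoice` class, the two converter functions, and every raw field name (`question`, `options`, `answer`, `ctx`, `endings`, `label`) are hypothetical. Two raw sources that store multiple-choice data differently are mapped onto one shared record type, after which a single loop can consume either.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoice:
    """Hypothetical shared schema: a prompt, candidate choices, and the gold choice index."""
    prompt: str
    choices: list
    label: int


def from_letter_format(rec: dict) -> MultipleChoice:
    # Hypothetical source A: the gold answer is stored as a letter such as "B".
    return MultipleChoice(
        prompt=rec["question"],
        choices=list(rec["options"]),
        label="ABCDEFGH".index(rec["answer"]),
    )


def from_endings_format(rec: dict) -> MultipleChoice:
    # Hypothetical source B: a context plus candidate endings, gold index stored as a string.
    return MultipleChoice(
        prompt=rec["ctx"],
        choices=list(rec["endings"]),
        label=int(rec["label"]),
    )


a = from_letter_format({"question": "2 + 2 = ?", "options": ["3", "4", "5"], "answer": "B"})
b = from_endings_format({
    "ctx": "She opened the umbrella because",
    "endings": ["it was sunny.", "it was raining."],
    "label": "1",
})

# Both records now share one type, so the same code reads the gold answer from either.
for ex in (a, b):
    print(ex.choices[ex.label])
```

The design point is that source-specific quirks (letter answers, string labels, differently named fields) are resolved once, at conversion time, rather than in every training or evaluation loop.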

193 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need a standardized, large collection of NLP datasets ready for immediate use in multi-task learning, fine-tuning, or evaluating advanced text models.

Not ideal if you are a casual user looking for a simple, single dataset for a basic NLP task or if you lack disk space for large datasets.

natural-language-processing machine-learning-engineering text-classification multi-task-learning model-evaluation
Stale (no commits in the last 6 months)
Maintenance 2 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 10 / 25
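The four category scores, each out of 25, add up exactly to the headline 48 / 100. This suggests, although the page itself does not state it, that the overall score is a simple sum of the four categories. A quick check of that arithmetic:

```python
# Category sub-scores as listed on this page; each category is scored out of 25.
subscores = {"Maintenance": 2, "Adoption": 11, "Maturity": 25, "Community": 10}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
maximum = MAX_PER_CATEGORY * len(subscores)
print(f"{total} / {maximum}")  # 48 / 100
```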

Stars: 193
Forks: 11
Language: Python
License: CC-BY-4.0
Last pushed: Jul 09, 2025
Commits (30d): 0
Dependencies: 9
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sileod/tasksource"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.