TakeLab/podium

Podium: a framework-agnostic Python NLP library for data loading and preprocessing

29 / 100 (Experimental)

This tool helps machine learning engineers and data scientists efficiently prepare text data for training natural language processing (NLP) models. It takes raw text from various sources like CSV files or popular NLP datasets, processes it according to custom rules, and outputs structured, cleaned, and tokenized data ready for model ingestion. It's designed for anyone building NLP applications who needs robust, flexible control over their text data pipeline.
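The load, clean, tokenize, and numericalize flow described above can be sketched in plain Python. This is a standard-library illustration of the general pipeline only, not Podium's API: the sample CSV, the cleaning rule, and the helper names (`clean`, `tokenize`, `build_vocab`, `numericalize`) are all hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data standing in for a CSV file on disk.
RAW_CSV = """text,label
"A charming, funny film!",positive
"Dull and far too long...",negative
"""


def clean(text):
    """Example custom rule: lowercase, then replace anything that is not
    a letter, digit, or space with a space."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower())


def tokenize(text):
    """Whitespace tokenizer; a real pipeline might plug in another tokenizer."""
    return text.split()


def build_vocab(token_lists, specials=("<pad>", "<unk>")):
    """Map tokens to integer ids, most frequent first, ties broken alphabetically."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = list(specials) + sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, vocab):
    """Convert tokens to ids, falling back to <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
tokenized = [tokenize(clean(row["text"])) for row in rows]
vocab = build_vocab(tokenized)
examples = [
    {"text": numericalize(toks, vocab), "label": row["label"]}
    for toks, row in zip(tokenized, rows)
]
```

Each entry in `examples` pairs a list of token ids with its label, which is roughly the shape a training loop would consume. A dedicated library adds batching, padding, and dataset loaders on top of these steps.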

No commits in the last 6 months.

Use this if you need a lightweight and flexible way to load and preprocess diverse text datasets for training custom NLP models, especially if you want to integrate with existing Hugging Face models or define specific text cleaning steps.

Not ideal if you primarily work with pre-built, end-to-end NLP solutions and don't require fine-grained control over data preparation or custom model development.

natural-language-processing text-analysis machine-learning-engineering data-preparation model-training
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 5 / 25

Stars: 60
Forks: 2
Language: Python
License: BSD-3-Clause
Last pushed: Dec 12, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/TakeLab/podium"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
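The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path segments after `quality/` are a category, an owner, and a repository name, and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo reading of the
    path segments is an assumption inferred from the curl example."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON.
    No response fields are assumed; the decoded object is returned as-is."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "TakeLab", "podium")
```

Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is worth considering for anything that runs repeatedly.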