huu4ontocord/rio
Text pre-processing for NLP datasets
This tool helps data scientists and NLP practitioners prepare and enhance text datasets for training natural language processing models. You supply raw text, and it returns cleaned, filtered, or augmented text, with the aim of making training datasets more robust and diverse. It's designed for anyone building or improving NLP models who needs to refine their training data.
No commits in the last 6 months.
Use this if you need to pre-process, filter, or augment text datasets, particularly using backtranslation or round-trip translation techniques, for training NLP models.
Not ideal if you require Personally Identifiable Information (PII) processing or anonymization, as this functionality has been removed and lives in a separate project.
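The round-trip translation (backtranslation) technique mentioned above can be sketched as follows. This is a minimal illustration of the general idea, not rio's actual API: the names round_trip, augment, toy_en_to_fr, and toy_fr_to_en are hypothetical, and the toy dictionary translators stand in for whatever real machine-translation model you would plug in.

```python
from typing import Callable, Iterable, List

# A translator is any callable mapping a string to a string.
# Hypothetical type alias; not taken from rio.
Translator = Callable[[str], str]


def round_trip(text: str, forward: Translator, backward: Translator) -> str:
    """Translate to a pivot language and back, yielding a paraphrase candidate."""
    return backward(forward(text))


def augment(texts: Iterable[str], forward: Translator, backward: Translator) -> List[str]:
    """Return each original text plus its round-trip variant, skipping duplicates."""
    out: List[str] = []
    seen = set()
    for text in texts:
        for candidate in (text, round_trip(text, forward, backward)):
            key = candidate.strip().lower()  # case-insensitive duplicate check
            if key and key not in seen:
                seen.add(key)
                out.append(candidate)
    return out


# Toy stand-ins for real translation models (assumption, for illustration only).
_EN_TO_FR = {"big": "grand", "house": "maison"}
_FR_TO_EN = {"grand": "large", "maison": "house"}


def toy_en_to_fr(text: str) -> str:
    return " ".join(_EN_TO_FR.get(word, word) for word in text.split())


def toy_fr_to_en(text: str) -> str:
    return " ".join(_FR_TO_EN.get(word, word) for word in text.split())


if __name__ == "__main__":
    print(augment(["the big house"], toy_en_to_fr, toy_fr_to_en))
```

Passing the translators in as plain callables keeps the augmentation logic independent of any particular translation backend. The duplicate check matters in practice because a round trip often returns the input unchanged, which would otherwise just double-count existing examples.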
Stars: 12
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 26, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/huu4ontocord/rio"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
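The same request can be made from Python using only the standard library. This is a sketch under assumptions: the category/owner/repo path layout is inferred from the single URL shown above, the helper names quality_url and fetch_quality are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text, unparsed."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_quality("nlp", "huu4ontocord", "rio") issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result instead of refetching on every run.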
Higher-rated alternatives
chartbeat-labs/textacy
NLP, before and after spaCy
nltk/nltk_data
NLTK Data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
prasanthg3/cleantext
An open-source package for python to clean raw text data