YugantM/textcleaner
text-data pre-processing utility
This tool helps data analysts and researchers prepare raw text documents for analysis by automating common cleanup tasks. It takes messy text files containing irrelevant characters, numbers, blank lines, or common words (stopwords) and outputs cleaned, structured text ready for further study. It is designed for anyone working with large volumes of text who needs to streamline the initial data-preparation phase.
No commits in the last 6 months.
Use this if you need to quickly standardize and de-clutter text data from sources like surveys, articles, or social media posts before conducting linguistic analysis or building predictive models.
Not ideal if you require sophisticated natural language understanding capabilities or need to process complex document formats beyond plain text files.
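The cleanup tasks listed above (dropping numbers, irrelevant characters, blank lines, and common words) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not textcleaner's own code: the `clean_text` function and the tiny `STOPWORDS` set are hypothetical, and the character filter keeps only ASCII letters, so accented text would need a different rule.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would use a fuller list (e.g. from NLTK).
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def clean_text(raw: str, stopwords=STOPWORDS) -> str:
    """Sketch of a line-by-line cleanup pass (not textcleaner's actual API)."""
    cleaned_lines = []
    for line in raw.splitlines():
        # Remove numbers.
        line = re.sub(r"\d+", " ", line)
        # Replace anything that is not an ASCII letter or whitespace.
        line = re.sub(r"[^A-Za-z\s]", " ", line)
        # Lowercase, tokenize on whitespace, and drop stopwords.
        words = [w for w in line.lower().split() if w not in stopwords]
        # Skip lines that are blank once cleaned.
        if words:
            cleaned_lines.append(" ".join(words))
    return "\n".join(cleaned_lines)


print(clean_text("The price is $42!!\n\n   \nHello, World 2022"))
```

Processing line by line keeps blank-line removal trivial; a notebook-based tool might instead apply each step to the whole document or to a DataFrame column.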
Stars: 13
Forks: 3
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 30, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/YugantM/textcleaner"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
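The same request can be made from Python with only the standard library. This is a hedged sketch: it assumes the endpoint returns JSON and that other repositories follow the same `/quality/nlp/{owner}/{repo}` path pattern as the URL above; `quality_url` and `fetch_quality` are hypothetical helper names, and no API key is sent (the anonymous 100 requests/day tier).

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    # Assumption: the path pattern generalizes to other owner/repo pairs.
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    # Assumption: the response body is JSON; no API key is sent.
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `data = fetch_quality("YugantM", "textcleaner")` performs the same GET as the curl command.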
Higher-rated alternatives
chartbeat-labs/textacy: NLP, before and after spaCy
nltk/nltk_data: NLTK Data
brightertiger/pygarble: Python Package to detect garbled, gibberish text for EN
jfilter/clean-text: 🧹 Python package for text cleaning
prasanthg3/cleantext: An open-source package for python to clean raw text data