prasanthg3/cleantext
An open-source Python package for cleaning raw text data
This tool helps data analysts, researchers, and anyone working with text prepare messy input for analysis. It takes raw, unstructured text (such as social media posts, customer reviews, or survey responses) and standardizes it by removing noise such as extra spaces, numbers, punctuation, and common words (stopwords). The output is clean, consistent text, or a list of processed words, ready for tasks like sentiment analysis or topic modeling.
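The kind of cleaning described above can be sketched with the Python standard library alone. This is a minimal illustration, not the package's own code or API: the names `standardize` and `standardize_words` and the tiny stopword set are hypothetical, chosen only for this example.

```python
import re
import string

# Tiny illustrative stopword set (an assumption for this sketch);
# real text-cleaning packages ship much larger, language-specific lists.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "this", "was", "i"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def standardize_words(text, stopwords=STOPWORDS):
    """Return a list of cleaned words (hypothetical helper, not the package API)."""
    text = text.lower()                      # normalize case
    text = re.sub(r"\d+", " ", text)         # drop runs of digits
    text = text.translate(_PUNCT_TO_SPACE)   # drop ASCII punctuation
    # split() with no arguments also collapses extra whitespace
    return [word for word in text.split() if word not in stopwords]


def standardize(text, stopwords=STOPWORDS):
    """Return the cleaned text as a single space-separated string."""
    return " ".join(standardize_words(text, stopwords))


if __name__ == "__main__":
    print(standardize("  This is   the BEST phone of 2023!!!  "))
```

The two functions mirror the two output shapes mentioned above: one cleaned string, or a list of processed words. Only ASCII punctuation is handled here; curly quotes and other Unicode symbols would need extra rules.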
Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you need to standardize and refine raw text data to improve the accuracy of your text analysis, search functions, or machine learning models.
Not ideal if your workflow primarily involves structured data like spreadsheets or databases, or if you need to preserve every character and nuance of the original text.
Stars: 75
Forks: 11
Language: Python
License: MIT
Category:
Last pushed: Aug 08, 2023
Commits (30d): 0
Dependencies: 1
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/prasanthg3/cleantext"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
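The same request can be made from Python with the standard library. This is a rough sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repo pattern (inferred from the single URL shown above), and that the endpoint returns JSON. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL, assuming a category/owner/repo path layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and parse it, assuming the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request; counts against the daily limit.
    print(fetch_quality("nlp", "prasanthg3", "cleantext"))
```

Since unauthenticated use is limited to 100 requests per day, it may be worth caching results locally rather than calling `fetch_quality` repeatedly.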
Compare
Higher-rated alternatives
chartbeat-labs/textacy
NLP, before and after spaCy
nltk/nltk_data
NLTK Data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
alinapetukhova/textcl
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/