prasanthg3/cleantext

An open-source Python package for cleaning raw text data

Emerging

This tool helps data analysts, researchers, and anyone else working with text prepare messy input for analysis. It takes raw, unstructured text, such as social media posts, customer reviews, or survey responses, and standardizes it by removing noise: extra spaces, numbers, punctuation, and common words (stop words). The output is either a clean, consistent string or a list of processed words, ready for tasks like sentiment analysis or topic modeling.
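The kind of cleaning described above can be sketched with the Python standard library alone. This is a minimal illustration, not the package's own API: the function names `sketch_clean_words` and `sketch_clean_text` and the tiny `STOP_WORDS` set are hypothetical and chosen only for this example.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption for this sketch,
# not the list any particular package ships with).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "was", "i"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def sketch_clean_words(text, stop_words=STOP_WORDS):
    """Return a list of cleaned words (hypothetical helper)."""
    text = text.lower()                    # normalize case
    text = re.sub(r"\d+", " ", text)       # replace runs of digits with a space
    text = text.translate(_PUNCT_TABLE)    # delete punctuation ("don't" -> "dont")
    # split() with no arguments also collapses extra whitespace
    return [word for word in text.split() if word not in stop_words]


def sketch_clean_text(text, stop_words=STOP_WORDS):
    """Return the cleaned words joined back into one string (hypothetical helper)."""
    return " ".join(sketch_clean_words(text, stop_words))


raw = "  The product was GREAT!!  I bought 2 of them.  "
print(sketch_clean_words(raw))  # ['product', 'great', 'bought', 'them']
print(sketch_clean_text(raw))   # product great bought them
```

The two functions mirror the two output shapes mentioned above: a list of processed words and a single clean string. A real workflow would likely use a fuller stop-word list and handle non-ASCII punctuation, which this sketch ignores.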

Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to standardize and refine raw text data to improve the accuracy of your text analysis, search functions, or machine learning models.

Not ideal if your workflow primarily involves structured data like spreadsheets or databases, or if you need to preserve every character and nuance of the original text.

text-analysis natural-language-processing data-preparation information-extraction qualitative-research

Stale 6m

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 14 / 25

Language: Python

License: MIT

Higher-rated alternatives

chartbeat-labs/textacy

NLP, before and after spaCy

nltk/nltk_data

NLTK Data

brightertiger/pygarble

Python Package to detect garbled, gibberish text for EN

jfilter/clean-text

🧹 Python package for text cleaning

alinapetukhova/textcl

Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
