prasanthg3/cleantext

An open-source Python package for cleaning raw text data

49 / 100 Emerging

This tool helps data analysts, researchers, and anyone working with text prepare messy input for analysis. It takes raw, unstructured text – like social media posts, customer reviews, or survey responses – and standardizes it by removing noise such as extra spaces, numbers, punctuation, and common words. The output is clean, consistent text, or a list of processed words, ready for tasks like sentiment analysis or topic modeling.
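To make the cleaning steps described above concrete, here is a minimal standard-library sketch of the same kind of pipeline. It is not the package's own code or API: the function names `clean` and `clean_words` as written here, and the tiny `STOP_WORDS` set, are assumptions chosen for illustration only.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; real tools ship far larger lists).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "this", "was"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_words(text):
    """Return lowercase tokens with numbers, punctuation, and stop words removed."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)        # drop runs of digits
    text = text.translate(_PUNCT_TO_SPACE)  # replace punctuation with spaces
    tokens = text.split()                   # split() also collapses extra whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def clean(text):
    """Return the cleaned tokens joined back into a single string."""
    return " ".join(clean_words(text))


raw = "  This   product is GREAT!!! Bought 2 of them...  "
print(clean_words(raw))  # ['product', 'great', 'bought', 'them']
print(clean(raw))        # product great bought them
```

The two entry points mirror the two outputs mentioned above: a cleaned string, or a list of processed words.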

Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to standardize and refine raw text data to improve the accuracy of your text analysis, search functions, or machine learning models.

Not ideal if your workflow primarily involves structured data like spreadsheets or databases, or if you need to preserve every character and nuance of the original text.

text-analysis natural-language-processing data-preparation information-extraction qualitative-research
Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 14 / 25


Stars: 75
Forks: 11
Language: Python
License: MIT
Last pushed: Aug 08, 2023
Commits (30d): 0
Dependencies: 1
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/prasanthg3/cleantext"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
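
The same request can be made from Python with only the standard library. This is a rough sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the split of the path into a category (`nlp`) and a repository is an assumption inferred from the single URL above, and the response body is returned as raw text because its format is not documented here.

```python
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL (assumed layout: <base>/<category>/<owner>/<name>)."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the quality data and return the response body as text."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("nlp", "prasanthg3/cleantext")` issues one request, which counts against the daily limit stated above.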