YugantM/textcleaner

text-data pre-processing utility

Emerging

This tool helps data analysts and researchers prepare raw text documents for analysis by automating common cleanup tasks. It takes messy text files with irrelevant characters, numbers, blank lines, or common words, and outputs cleaned, structured text ready for further study. It's designed for anyone working with large volumes of text who needs to streamline the initial data preparation phase.
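The cleanup tasks listed above can be illustrated with a minimal Python sketch. It is not textcleaner's actual API: the function name `clean_text` and the small `STOP_WORDS` set are hypothetical, chosen only to show what removing irrelevant characters, numbers, blank lines, and common words might look like in practice.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def clean_text(raw: str) -> List[str]:
    """Return one cleaned string per input line that still has content."""
    cleaned = []
    for line in raw.splitlines():
        # Lowercase, then replace every character that is not a-z or
        # whitespace (digits, punctuation, symbols) with a space.
        line = re.sub(r"[^a-z\s]", " ", line.lower())
        # Split on whitespace and drop the common words.
        tokens = [t for t in line.split() if t not in STOP_WORDS]
        # Lines that end up empty (blank or stop words only) are skipped.
        if tokens:
            cleaned.append(" ".join(tokens))
    return cleaned


if __name__ == "__main__":
    sample = "The 3 cats!!\n\n   \nis in 42\nDogs & birds, of course."
    print(clean_text(sample))
```

A real pipeline would likely use a fuller stop-word list and handle non-ASCII letters, which this sketch discards.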

No commits in the last 6 months.

Use this if you need to quickly standardize and de-clutter text data from sources like surveys, articles, or social media posts before conducting linguistic analysis or building predictive models.

Not ideal if you require sophisticated natural language understanding capabilities or need to process complex document formats beyond plain text files.

data-preparation, text-analysis, market-research, academic-research, content-moderation

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 14 / 25

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

chartbeat-labs/textacy

NLP, before and after spaCy

nltk/nltk_data

NLTK Data

brightertiger/pygarble

Python Package to detect garbled, gibberish text for EN

jfilter/clean-text

🧹 Python package for text cleaning

prasanthg3/cleantext

An open-source package for python to clean raw text data
