lgomezt/tidyX

Python package to clean raw tweets for ML applications.


Experimental

This tool helps researchers, marketers, and analysts turn messy raw text into clean, structured data ready for analysis. It targets social media platforms such as Twitter, with a particular focus on Spanish. It takes tweets and other short-form text as input and returns a streamlined version with noise such as URLs, hashtags, and emojis removed, which suits it to natural language processing pipelines. It is aimed at anyone who needs to prepare social media data for sentiment analysis, topic modeling, or other text-based analysis.
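The kind of cleaning described above can be sketched with Python's standard library alone. This is a minimal illustration, not tidyX's actual API: the function name `clean_tweet`, the regular expressions, and the choice to drop hashtags and mentions entirely are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; tidyX may use different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Keep only lowercase Spanish letters and whitespace; this also drops
# emojis, digits, and punctuation such as "¡" and "!".
NON_LETTER_RE = re.compile(r"[^a-záéíóúüñ\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet with URLs, mentions, hashtags, and emojis removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals above.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "¡Qué día tan bonito! 😍 #feliz @amigo https://t.co/abc123"
    print(clean_tweet(raw))
```

URLs are removed before the generic non-letter filter so that fragments of a link do not survive as stray words. Accented vowels and "ñ" are kept in this sketch because they carry meaning in Spanish; a pipeline that wants accent-free text would add a separate normalization step.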

No commits in the last 6 months.

Use this if you need to quickly and efficiently clean social media text, especially Spanish tweets, to prepare it for machine learning or other analytical tasks.

Not ideal if you mainly need deep linguistic analysis, or if you are processing formal, highly structured text from outside social media.

social-media-analytics text-mining sentiment-analysis market-research public-opinion-analysis

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 4 / 25

Language

Python

License

MIT

Higher-rated alternatives

chartbeat-labs/textacy

NLP, before and after spaCy

nltk/nltk_data

NLTK Data

brightertiger/pygarble

Python Package to detect garbled, gibberish text for EN

jfilter/clean-text

🧹 Python package for text cleaning

prasanthg3/cleantext

An open-source package for python to clean raw text data
