LoLei/redditcleaner

Cleans Reddit Text Data 📜 🧹

Quality score: 38 / 100 (Emerging)

When analyzing text data from Reddit, you often encounter formatting such as bold text, links, and code blocks that interferes with your analysis. This tool takes raw Reddit comments or submission self-texts, which are often full of Markdown and HTML entities, and outputs plain, readable text by stripping that Reddit-specific formatting. It is aimed at data scientists and researchers working with social media data.
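To make the idea concrete, here is a minimal, self-contained sketch of this kind of cleaning. It is not redditcleaner's own implementation: the function `clean_reddit_text` and its regex rules are hypothetical, they assume raw text as returned by Reddit (entities such as `&gt;` still encoded), and they only approximate Reddit's Markdown, so edge cases like `*` used for multiplication will be mangled.

```python
import html
import re

# Hypothetical sketch, NOT redditcleaner's implementation.
# Ordered (pattern, replacement) rules: line-level markers are removed before
# inline emphasis so that "* item" bullets are not mistaken for italics.
_RULES = [
    (re.compile(r"```.*?```", re.DOTALL), " "),               # drop fenced code blocks
    (re.compile(r"^[ \t]*>(?!!)[ \t]?", re.MULTILINE), ""),   # blockquote markers (not spoilers)
    (re.compile(r"^[ \t]*[-*+][ \t]+", re.MULTILINE), ""),    # bullet markers
    (re.compile(r"^[ \t]*#{1,6}[ \t]*", re.MULTILINE), ""),   # heading markers
    (re.compile(r"\[([^\]]*)\]\([^)]*\)"), r"\1"),            # [text](url) -> text
    (re.compile(r"https?://\S+"), " "),                       # bare URLs
    (re.compile(r">!(.*?)!<"), r"\1"),                        # >!spoiler!< -> spoiler
    (re.compile(r"~~(.*?)~~"), r"\1"),                        # ~~strike~~ -> strike
    (re.compile(r"\*\*(.*?)\*\*"), r"\1"),                    # **bold** -> bold
    (re.compile(r"\*(.*?)\*"), r"\1"),                        # *italic* -> italic
    (re.compile(r"`([^`]*)`"), r"\1"),                        # `code` -> code
    (re.compile(r"\^\(([^)]*)\)"), r"\1"),                    # ^(sup) -> sup
    (re.compile(r"\^(?=\w)"), ""),                            # ^word -> word
]


def clean_reddit_text(text: str) -> str:
    """Return text with common Reddit Markdown and HTML entities stripped."""
    text = html.unescape(text)          # &gt; -> >, &amp; -> &, &#x200B; -> zero-width space
    text = text.replace("\u200b", " ")  # zero-width spaces, which \s does not match
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace and newlines


raw = "&gt; quoted **bold** and *italic* text\n\n* a [link](https://example.com)"
print(clean_reddit_text(raw))  # quoted bold and italic text a link
```

Keeping the rules in one ordered list makes them easy to audit and reorder. For a pandas column of comment bodies (the column name `body` is an assumption), the same helper would be applied with `df["body"].map(clean_reddit_text)`.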

No commits in the last 6 months. Available on PyPI.

Use this if you need to prepare Reddit text data for natural language processing or other data science tasks by stripping away Reddit-specific formatting.

Not ideal if you need to remove common punctuation, numbers, or emojis, as this tool specifically targets Reddit's unique formatting.
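If those characters also need to go, a separate pass is required. The sketch below is a hypothetical helper, not part of redditcleaner, that uses only the standard library's Unicode categories: punctuation (`P*`), decimal digits (`Nd`), and symbols (`So`, `Sk`, which cover most single-codepoint emoji and skin-tone modifiers). It assumes the Reddit formatting has already been stripped, and it does not remove variation selectors or zero-width joiners inside multi-codepoint emoji.

```python
import unicodedata

# Hypothetical follow-up step, NOT part of redditcleaner: drop characters by
# Unicode category instead of maintaining hand-written character lists.
_DROP = ("P", "Nd", "So", "Sk")  # punctuation, decimal digits, symbols (incl. most emoji)


def strip_punct_digits_emoji(text: str) -> str:
    """Replace punctuation, digits and symbol characters with spaces, then tidy."""
    chars = [" " if unicodedata.category(ch).startswith(_DROP) else ch for ch in text]
    return " ".join("".join(chars).split())


print(strip_punct_digits_emoji("Wow!!! 100% agree 😀"))  # Wow agree
```

Apostrophes are punctuation too, so "don't" becomes "don t"; tokenize first if contractions matter.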

social-media-analysis text-mining natural-language-processing market-research online-community-research
Stale (6m) · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 25 / 25
Community 4 / 25
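Each category score is out of 25, and judging from the figures on this page they appear to add up to the overall quality score (an inference from the numbers, not a documented formula):

```latex
\text{Score} = \underbrace{0}_{\text{Maintenance}} + \underbrace{9}_{\text{Adoption}} + \underbrace{25}_{\text{Maturity}} + \underbrace{4}_{\text{Community}} = 38 \quad (\text{max } 4 \times 25 = 100)
```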


Stars: 83
Forks: 2
Language: Python
License: MIT
Last pushed: Apr 14, 2020
Commits (30d): 0
