iaramer/dobbi
An open-source NLP library: fast text cleaning and preprocessing
This library cleans raw, messy text such as social media posts, comments, and web-scraped content before analysis. It takes unformatted text containing hashtags, emojis, URLs, and nicknames, and outputs clean, normalized text ready for further processing. It is aimed at data scientists, NLP engineers, and researchers working with user-generated content.
No commits in the last 6 months. Available on PyPI.
Use this if you need a quick and easy way to strip out noise like hashtags, URLs, emojis, and punctuation from text data.
Not ideal if your primary need is complex linguistic analysis, stemming, or lemmatization rather than just cleaning.
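To make the kind of cleaning described above concrete, here is a minimal sketch using only Python's standard library. It is not dobbi's API: the function name `clean_text`, the regular expressions, and the treatment of nicknames as `@handle` tokens are all assumptions made for illustration, and the emoji range is deliberately rough.

```python
import re
import string

# Hypothetical patterns for illustration only; these are NOT dobbi's API.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
NICKNAME_RE = re.compile(r"@\w+")  # assumption: a nickname is an @handle
# Rough emoji coverage (common pictographs and symbols), not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Strip URLs, hashtags, nicknames, emojis, and ASCII punctuation."""
    # URLs go first: they can contain '#' and '@', and they must be matched
    # before punctuation removal destroys the '://' marker.
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NICKNAME_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


print(clean_text("Thanks @jack! Check https://example.com #nlp 🚀 now..."))
# prints: Thanks Check now
```

The order of the substitutions matters more than the patterns themselves: removing punctuation before URLs would leave URL fragments in the output.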
Stars
23
Forks
—
Language
Python
License
Apache-2.0
Category
Last pushed
Nov 09, 2021
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/iaramer/dobbi"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
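The same request can be sketched in Python with the standard library. Only the URL shown in the curl command comes from this page. The helper names `build_quality_url` and `fetch_quality`, the idea that other repositories follow the same `category/owner/repo` path pattern, and the UTF-8 decoding are assumptions. The response format is not documented here, so the sketch returns the raw body instead of parsing it.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is built here.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path pattern assumed from the one example)."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text; no response schema is assumed."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("nlp", "iaramer", "dobbi"))
```

Keeping URL construction separate from the network call makes the former easy to check without spending any of the daily request quota.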
Higher-rated alternatives
chartbeat-labs/textacy
NLP, before and after spaCy
nltk/nltk_data
NLTK Data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
prasanthg3/cleantext
An open-source package for python to clean raw text data