iaramer/dobbi

An open-source NLP library: fast text cleaning and preprocessing

31 / 100 (Emerging)

This library cleans raw, messy text such as social media posts, comments, and web-scraped content before analysis. It takes unformatted text containing hashtags, emojis, URLs, and nicknames, and outputs clean, normalized text ready for further processing. It is aimed at data scientists, NLP engineers, and researchers working with user-generated content.

No commits in the last 6 months. Available on PyPI.

Use this if you need a quick and easy way to strip out noise like hashtags, URLs, emojis, and punctuation from text data.

Not ideal if your primary need is complex linguistic analysis, stemming, or lemmatization rather than just cleaning.
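To make the kind of cleaning described above concrete, here is a minimal sketch using only Python's standard library. It does not use dobbi's own API, which this page does not show. The function name `clean_text`, the regular expressions, and the emoji ranges are all illustrative assumptions and cover only common cases.

```python
import re
import string

# Illustrative patterns only; these are assumptions, not dobbi's implementation.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
NICKNAME_RE = re.compile(r"@\w+")
# Rough emoji coverage: main pictograph blocks, misc symbols, dingbats,
# and the emoji variation selector. Not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Strip URLs, hashtags, nicknames, emojis, and punctuation from text."""
    # URLs go first because they can contain '#' and '@'.
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NICKNAME_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

For instance, `clean_text("Hello, world!")` returns `"Hello world"`. A dedicated library is likely to handle far more edge cases (email addresses, non-ASCII punctuation, composite emoji) than this sketch does.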

text-preprocessing social-media-analysis data-cleaning natural-language-processing text-normalization
Stale 6m No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 25 / 25
Community 0 / 25


Stars 23
Forks (not reported)
Language Python
License Apache-2.0
Last pushed Nov 09, 2021
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/iaramer/dobbi"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
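The same request can be sketched in Python with the standard library. This is a hedged example, not official client code: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything else is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a /category/owner/repo layout."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the body is UTF-8 encoded JSON."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "iaramer", "dobbi")` would issue the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is a reasonable design choice.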