iaramer/dobbi

An open-source NLP library: fast text cleaning and preprocessing

/ 100

Emerging

This library cleans raw, messy text from social media posts, comments, and web-scraped content ahead of analysis. It takes unformatted text containing hashtags, emojis, URLs, and nicknames, and returns clean, normalized text ready for further processing. It suits data scientists, NLP engineers, and researchers who work with user-generated content.
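To make the kind of cleaning described above concrete, here is a minimal sketch using only Python's standard `re` module. It is not dobbi's API or implementation: the function name `clean_text`, the patterns, and the assumption that nicknames are `@`-prefixed handles are all hypothetical, chosen for illustration only.

```python
import re

# Hypothetical patterns for illustration; none of them come from dobbi's source.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
NICKNAME_RE = re.compile(r"@\w+")  # assumes nicknames are @-prefixed handles
# Rough emoji ranges (common pictographs and symbols); real coverage is wider.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip URLs, hashtags, nicknames, and emojis, then normalize whitespace."""
    # URLs go first because they may contain '#' or '@' characters.
    for pattern in (URL_RE, HASHTAG_RE, NICKNAME_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "#fun #lol    Why  @Alex33 is so funny? Check here: https://some-url.com"
    print(clean_text(raw))  # Why is so funny? Check here:
```

A dedicated library is likely to handle many edge cases this sketch ignores, such as multi-codepoint emojis, URLs without a scheme, and punctuation attached to handles.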

No commits in the last 6 months. Available on PyPI.

Use this if you need a quick and easy way to strip out noise like hashtags, URLs, emojis, and punctuation from text data.

Not ideal if your primary need is complex linguistic analysis, stemming, or lemmatization rather than just cleaning.

text-preprocessing social-media-analysis data-cleaning natural-language-processing text-normalization

Stale 6m No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 25 / 25

Community 0 / 25


Language: Python

License: Apache-2.0

Higher-rated alternatives

chartbeat-labs/textacy

NLP, before and after spaCy

nltk/nltk_data

NLTK Data

brightertiger/pygarble

Python Package to detect garbled, gibberish text for EN

jfilter/clean-text

🧹 Python package for text cleaning

prasanthg3/cleantext

An open-source package for python to clean raw text data
