lyeoni/prenlp

Preprocessing Library for Natural Language Processing

/ 100

Emerging

This tool helps data scientists and NLP practitioners prepare raw text for analysis or machine learning. It takes unprocessed text (such as social media posts, articles, or reviews) and converts it into a normalized, tokenized form that is ready for tasks such as sentiment analysis or language modeling. It also bundles popular English and Korean datasets used in common NLP benchmarks.
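As a rough illustration of what "clean and tokenize" means in practice, here is a minimal standard-library sketch. It does not use prenlp's API; the function names `normalize` and `tokenize`, the placeholder tokens `[URL]` and `[EMAIL]`, and the regular expressions are all assumptions made for this example.

```python
import re

# Hypothetical patterns and placeholders for illustration only;
# these are NOT prenlp's actual rules or defaults.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
WHITESPACE_RE = re.compile(r"\s+")
# A token is a placeholder like [URL], a run of word characters,
# or a single punctuation character.
TOKEN_RE = re.compile(r"\[[A-Z]+\]|\w+|[^\w\s]")


def normalize(text):
    """Lowercase, mask e-mail addresses and URLs, collapse whitespace."""
    text = text.lower()
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = URL_RE.sub("[URL]", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text):
    """Split normalized text into word, placeholder, and punctuation tokens."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    raw = "Check   https://example.com NOW!"
    print(tokenize(normalize(raw)))
```

Masking URLs and e-mail addresses before tokenizing keeps them from being split into many low-value tokens. A real pipeline would typically add further rules, such as emoji or phone-number handling, and use a language-aware tokenizer for Korean.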

164 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly clean and tokenize text data for natural language processing tasks, especially if you're working with English or Korean content.

Not ideal if your primary need is advanced linguistic analysis, or if your data requires specialized, domain-specific preprocessing rules beyond common text normalization.

text-mining sentiment-analysis language-modeling korean-nlp data-preparation

Stale 6m

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 12 / 25


Stars

164

Forks

Language

Python

License

Apache-2.0


Higher-rated alternatives

sloria/TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...

chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...

cltk/cltk

The Classical Language Toolkit

allenai/scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

wi2trier/cbrkit

Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
