lyeoni/prenlp
Preprocessing Library for Natural Language Processing
This tool helps data scientists and NLP practitioners prepare raw text for analysis or machine learning. It takes noisy, unnormalized text (such as social media posts, articles, or reviews) and converts it into a standardized, tokenized format ready for tasks like sentiment analysis or language modeling. It also provides popular English and Korean datasets for common NLP benchmarks.
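To make "clean and tokenize" concrete, here is a minimal sketch of the general technique. It is written with only the Python standard library and is not prenlp's API. Noisy spans (HTML tags, URLs, emails) are replaced with placeholder tokens, whitespace is collapsed, and the result is split into word and punctuation tokens. The function names `normalize` and `tokenize` and the placeholder strings `[URL]` and `[EMAIL]` are hypothetical choices for this illustration. Production pipelines usually use subword or morphological tokenizers rather than this simple regex split.

```python
import re

# Hypothetical patterns and placeholder tokens for illustration only;
# they are not taken from prenlp.
TAG_RE = re.compile(r"<[^>]+>")                    # HTML-like tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")      # web links
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # email addresses
TOKEN_RE = re.compile(r"\[[A-Z]+\]|\w+|[^\w\s]")    # placeholder | word | punctuation


def normalize(text: str) -> str:
    """Replace noisy spans with placeholders and collapse whitespace."""
    text = TAG_RE.sub(" ", text)          # drop markup, keep surrounding words apart
    text = URL_RE.sub("[URL]", text)      # URLs first, so user@host inside a URL is absorbed
    text = EMAIL_RE.sub("[EMAIL]", text)  # then standalone email addresses
    return " ".join(text.split())         # collapse runs of whitespace


def tokenize(text: str) -> list[str]:
    """Split into placeholder, word, and punctuation tokens.

    Python's Unicode-aware \\w also matches Hangul, so Korean words
    separated by spaces come out as single tokens.
    """
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    raw = "Check <b>this</b> out: https://github.com/lyeoni/prenlp  or mail me@example.com!"
    clean = normalize(raw)
    print(clean)
    print(tokenize(clean))
    print(tokenize(normalize("안녕하세요, 반갑습니다!")))
```

URLs are replaced before emails so that an address embedded in a link is swallowed by the single `[URL]` placeholder rather than split into two tokens.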
164 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly clean and tokenize text data for natural language processing tasks, especially if you're working with English or Korean content.
Not ideal if your primary need is advanced linguistic analysis or if your data requires highly specialized, domain-specific preprocessing rules not covered by common normalization.
Stars: 164
Forks: 12
Language: Python
License: Apache-2.0
Last pushed: Dec 06, 2022
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lyeoni/prenlp"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
Higher-rated alternatives
sloria/TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...
chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...
cltk/cltk
The Classical Language Toolkit
allenai/scispacy
A full spaCy pipeline and models for scientific/biomedical documents.
wi2trier/cbrkit
Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.