NIHOPA/NLPre
Python library for Natural Language Preprocessing (NLPre)
When preparing textual data for analysis, you often encounter inconsistencies like odd capitalization, strange hyphenations, or abbreviations that make the text harder to process. This tool helps clean up these issues, taking raw, messy text and outputting a standardized, cleaned version. It's designed for researchers, analysts, or anyone working with large volumes of text data who needs to ensure consistency for downstream tasks like topic modeling or information extraction.
191 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to standardize and clean free-form text data, such as scientific abstracts, survey responses, or medical notes, before conducting natural language processing or text mining.
Not ideal if you primarily work with highly structured text or only need basic text manipulation like simple string replacement.
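To make the kind of cleanup described above concrete, here is a minimal standard-library sketch of three typical normalization steps: rejoining words hyphenated across line breaks, collapsing stray whitespace, and lowering sentences written entirely in capitals. It does not use NLPre and is not its API; the function names (`join_hyphenated_breaks`, `collapse_whitespace`, `lower_all_caps_sentences`, `clean`) are hypothetical and chosen only for illustration.

```python
import re


def join_hyphenated_breaks(text: str) -> str:
    """Rejoin a word split by a hyphen at a line break ("treat-\\nment")."""
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


def collapse_whitespace(text: str) -> str:
    """Replace any run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def lower_all_caps_sentences(text: str) -> str:
    """Convert sentences written entirely in capitals to sentence case.

    Naive by design: proper nouns inside such a sentence lose their capital.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    fixed = []
    for sentence in sentences:
        letters = [c for c in sentence if c.isalpha()]
        # Skip very short fragments so standalone acronyms are left alone.
        if len(letters) > 3 and all(c.isupper() for c in letters):
            sentence = sentence.capitalize()
        fixed.append(sentence)
    return " ".join(fixed)


def clean(text: str) -> str:
    """Apply the illustrative steps in order (hypothetical pipeline)."""
    steps = (join_hyphenated_breaks, collapse_whitespace, lower_all_caps_sentences)
    for step in steps:
        text = step(text)
    return text


if __name__ == "__main__":
    raw = "LYMPHOMA SURVIVORS IN KOREA. We describe the treat-\nment   outcomes."
    print(clean(raw))
```

A dedicated library is worth reaching for when simple rules like these stop being enough, for example when dehyphenation has to check the joined word against a dictionary or when abbreviations must be resolved consistently across a corpus.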
Stars: 191
Forks: 36
Language: Python
License: —
Category:
Last pushed: Jul 31, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/NIHOPA/NLPre"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
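The same request can be made from Python with only the standard library. This is a sketch under two assumptions that the page does not confirm: that the path follows a `category/owner/repo` pattern (inferred from the single URL shown) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the rest of the layout is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, assuming a category/owner/repo path layout."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily quota.
    print(fetch_quality("nlp", "NIHOPA", "NLPre"))
```

Keeping URL construction separate from the network call makes the path logic easy to check without spending any of the daily request allowance.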
Higher-rated alternatives
sloria/TextBlob: Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...
chrismattmann/tika-python: Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...
cltk/cltk: The Classical Language Toolkit
allenai/scispacy: A full spaCy pipeline and models for scientific/biomedical documents.
wi2trier/cbrkit: Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.