NIHOPA/NLPre

Python library for Natural Language Preprocessing (NLPre)

/ 100

Emerging

When preparing textual data for analysis, you often encounter inconsistencies such as odd capitalization, unusual hyphenation, or abbreviations that make the text harder to process. This tool cleans up these issues: it takes raw, messy text and outputs a standardized version. It is designed for researchers, analysts, or anyone working with large volumes of text who needs consistent input for downstream tasks such as topic modeling or information extraction.
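As a rough illustration of the kinds of fixes described above, here is a minimal, self-contained Python sketch. It does not use NLPre. The functions `dehyphenate`, `fix_all_caps_sentences`, `find_abbreviations`, `expand_abbreviations`, and `clean` are hypothetical names written for this example. The heuristics are simplified assumptions, not NLPre's actual implementation: regex-based joining of words split at line breaks, sentence case for all-caps sentences, and matching a parenthesized abbreviation against the initials of the preceding words.

```python
import re

# Hypothetical helpers written for this sketch; they are NOT NLPre's API.


def dehyphenate(text: str) -> str:
    """Rejoin words hyphenated across a line break ("treat-" / "ment").

    Simplification: a genuine compound that happens to break at a line end
    (e.g. "follow-" / "up") is joined as well.
    """
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


def fix_all_caps_sentences(text: str) -> str:
    """Convert sentences written entirely in capitals to sentence case.

    Single-word sentences are left alone so bare acronyms survive. Proper
    nouns inside a fixed sentence are lowercased too (a known limitation),
    and rejoining with single spaces collapses the original whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    fixed = []
    for s in sentences:
        letters = [c for c in s if c.isalpha()]
        if letters and all(c.isupper() for c in letters) and len(s.split()) > 1:
            s = s.lower()
            s = s[:1].upper() + s[1:]
        fixed.append(s)
    return " ".join(fixed)


def find_abbreviations(text: str) -> dict[str, str]:
    """Map each parenthesized abbreviation to the phrase it abbreviates.

    Heuristic: for "... long phrase (ABC)", take as many preceding words as
    the abbreviation has letters (hyphenated words count as several) and
    accept them if their initials spell the abbreviation.
    """
    found: dict[str, str] = {}
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = m.group(1)
        before = text[: m.start()].rstrip()
        words = list(re.finditer(r"[A-Za-z]+", before))
        if len(words) < len(abbr):
            continue
        tail = words[-len(abbr):]
        initials = "".join(w.group()[0] for w in tail)
        if initials.lower() == abbr.lower():
            # keep the first definition if an abbreviation is defined twice
            found.setdefault(abbr, before[tail[0].start():])
    return found


def expand_abbreviations(text: str, abbrs: dict[str, str]) -> str:
    """Replace each definition and later use with one underscore-joined token."""
    for abbr, phrase in abbrs.items():
        token = re.sub(r"[\s-]+", "_", phrase)
        # "Non-Hodgkin lymphoma (NHL)" -> "Non_Hodgkin_lymphoma"
        text = re.sub(
            re.escape(phrase) + r"\s*\(" + re.escape(abbr) + r"\)",
            lambda _m: token,
            text,
            flags=re.IGNORECASE,
        )
        # stand-alone "NHL" -> "Non_Hodgkin_lymphoma"
        text = re.sub(r"\b" + re.escape(abbr) + r"\b", lambda _m: token, text)
    return text


def clean(text: str) -> str:
    """Chain the steps; this order is one reasonable choice, not NLPre's."""
    text = dehyphenate(text)
    text = fix_all_caps_sentences(text)
    return expand_abbreviations(text, find_abbreviations(text))


raw = (
    "LYMPHOMA SURVIVORS IN KOREA. Non-Hodgkin lymphoma (NHL) survivors were "
    "recruited for treat-\nment. NHL survivors were less likely to be unemployed."
)
print(clean(raw))
# prints (wrapped here for width):
# Lymphoma survivors in korea. Non_Hodgkin_lymphoma survivors were recruited
# for treatment. Non_Hodgkin_lymphoma survivors were less likely to be unemployed.
```

Joining an expanded abbreviation into a single underscore token is a design choice that keeps a multi-word concept as one term for bag-of-words methods such as topic modeling. As the output shows, a real cleaner would also need a dictionary or part-of-speech information to restore proper nouns like "Korea" after fixing an all-caps sentence.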

191 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to standardize and clean free-form text data, such as scientific abstracts, survey responses, or medical notes, before conducting natural language processing or text mining.

Not ideal if you primarily work with highly structured text or only need basic text manipulation like simple string replacement.

text-analysis data-preparation scientific-research information-extraction medical-text

No license · Stale (6 months) · No dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 17 / 25

Community 20 / 25


Stars: 191
Forks:
Language: Python
License: none

Compare: NLPre and prenlp

Higher-rated alternatives

sloria/TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...

chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...

cltk/cltk

The Classical Language Toolkit

allenai/scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

wi2trier/cbrkit

Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
