Text Preprocessing Pipelines NLP Tools
End-to-end tools and libraries for cleaning, normalizing, and preparing raw text data for NLP tasks. Includes tokenization, stemming, stopword removal, and data cleaning utilities. Does NOT include downstream NLP applications (sentiment analysis, classification, etc.), feature extraction, or domain-specific cleaning (tweets, names, etc.).
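The stages named above (cleaning, normalization, tokenization, stopword removal, stemming) can be chained in a few lines. The sketch below uses only the Python standard library. The tiny stopword list, the suffix list, and the function names (`clean`, `tokenize`, `stem`, `preprocess`) are illustrative assumptions. The naive suffix stripper stands in for a real stemmer such as the Porter or Snowball stemmers that libraries like NLTK provide.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list for illustration only;
# real pipelines usually load a fuller list (e.g. from NLTK or spaCy).
STOPWORDS = frozenset({"a", "an", "and", "are", "is", "of", "the", "to"})

# Ordered suffixes for a naive stemmer (an assumption, not Porter/Snowball).
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text: str) -> str:
    """Normalize Unicode (NFKC), lowercase, and strip URLs and punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^\w\s]", " ", text)       # punctuation -> space
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization; assumes clean() has already run."""
    return text.split()


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Run the stages in order: clean, tokenize, drop stopwords, stem."""
    tokens = tokenize(clean(text))
    return [stem(t) for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("The cats are walking to https://example.com and jumped!")` yields `["cat", "walk", "jump"]`. Order matters here: cleaning runs before tokenization so punctuation does not leak into tokens, and stopwords are removed before stemming so the list matches surface forms.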
There are 45 text preprocessing pipeline tools tracked. Four score above 50 (established tier). The highest-rated is chartbeat-labs/textacy at 60/100 with 2,236 stars.
Get the projects as JSON (this example request sets `limit=20`; raise `limit` to request more):
```bash
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-preprocessing-pipelines&limit=20"
```
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
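The same request can be made from Python with only the standard library. This is a sketch: `build_url` and `fetch` are hypothetical helper names, and the query parameters are copied from the curl example. Because the response schema is not described here, `fetch` returns the decoded JSON without assuming any field names. It sends no API key, so it falls under the keyless 100 requests/day allowance.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the query URL used in the curl example; only `limit` varies."""
    params = {
        "domain": "nlp",
        "subcategory": "text-preprocessing-pipelines",
        "limit": limit,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch(limit: int = 20, timeout: float = 10.0):
    """GET the endpoint and decode the JSON body.

    The response structure is not documented here, so the parsed value
    is returned as-is rather than assuming particular keys.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes the query easy to inspect or reuse with another HTTP client.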
| # | Tool | Description | Tier |
|---|---|---|---|
| 1 | chartbeat-labs/textacy | NLP, before and after spaCy | Established |
| 2 | nltk/nltk_data | NLTK Data | Established |
| 3 | brightertiger/pygarble | Python Package to detect garbled, gibberish text for EN | Established |
| 4 | jfilter/clean-text | 🧹 Python package for text cleaning | Established |
| 5 | prasanthg3/cleantext | An open-source package for python to clean raw text data | Emerging |
| 6 | alinapetukhova/textcl | Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/ | Emerging |
| 7 | takuti/prelims | Front matter post-processor for static site generators | Emerging |
| 8 | ksnugroho/basic-text-preprocessing | Basic text preprocessing for Bahasa with Python. | Emerging |
| 9 | textpipe/textpipe | Textpipe: clean and extract metadata from text | Emerging |
| 10 | citiususc/pyplexity | Cleaning tool for web scraped text | Emerging |
| 11 | MusfiqDehan/data-preprocessors | 🛠️ An easy to use tool for Data Preprocessing specially for Text Preprocessing | Emerging |
| 12 | LoLei/redditcleaner | Cleans Reddit Text Data :scroll: :broom: | Emerging |
| 13 | huu4ontocord/rio | Text pre-processing for NLP datasets | Emerging |
| 14 | Shubha23/Text-processing-NLP | This notebook contains entire text preprocessing pipeline for NLP problems.... | Emerging |
| 15 | YugantM/textcleaner | text-data pre-processing utility | Emerging |
| 16 | Abhayparashar31/crazytext | A Simple Easy To Use Text Cleaning Package For NLP Built In Python. It Can... | Emerging |
| 17 | Arfius/light-text-prepro | Python module that collects regex rules | Emerging |
| 18 | mantzaris/KeemenaPreprocessing.jl | Preprocessing for text data: cleaning, normalization, vectorization,... | Emerging |
| 19 | iaramer/dobbi | An open-source NLP library: fast text cleaning and preprocessing | Emerging |
| 20 | Ankur3107/nlp_preprocessing | Text Preprocessing Package includes cleaning, tokenization, dataset... | Emerging |
| 21 | ninadpatil09/NLP-Notebooks | Explore NLP tasks with Python using NLTK, SpaCy & scikit-learn:... | Emerging |
| 22 | aflah02/cleansetext | This is a simple library to help you clean your textual data | Experimental |
| 23 | lgomezt/tidyX | Python package to clean raw tweets for ML applications. | Experimental |
| 24 | umapornp/textprepro | 👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing. | Experimental |
| 25 | Al-Hasib/eng_text_cleaner | A python package for cleaning text | Experimental |
| 26 | krisograbek/text-preprocessing | Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim,... | Experimental |
| 27 | abeaderstadt/nlp-02-text-preprocessing | Text Preprocessing NLP Project | Experimental |
| 28 | NITHISHM2410/text-preprocessing-techniques | This Repo includes modules that helps NLP related tasks. | Experimental |
| 29 | basit-afridi62/nlp-nltk-python | This repository is a hands-on guide to Natural Language Processing (NLP)... | Experimental |
| 30 | angelsomo/nlp-text-cleaning | Lightweight Python CLI tool for robust text cleaning, Unicode normalization,... | Experimental |
| 31 | iam-salma/NLP-Bootcamp-with-python | A hands-on NLP Bootcamp using Python covering text preprocessing,... | Experimental |
| 32 | Abdelrahman-Atef-Elsayed/NLP_Preprocessing_pipeline | This repo includes a generalized preprocessing pipeline for text data in NLP tasks. | Experimental |
| 33 | MariyamSiddiqui/Text-Preprocessing-NLP-pipeline | End-to-end NLP text preprocessing pipeline using Python: includes... | Experimental |
| 34 | shrutimary15/Text-data-preparation | The repository consists of a python code that inputs a text file consisting... | Experimental |
| 35 | tripathiadityap/cleantxty | Python package to clean strings and making them reasonable for NLP. | Experimental |
| 36 | nadinejackson1/text-preprocessing-pipeline | Basic text preprocessing pipeline, which includes tokenization, stemming,... | Experimental |
| 37 | udityamerit/Text-Processing-Package-For-Natural-Language-Processing | This project is a comprehensive collection of NLP techniques, practical... | Experimental |
| 38 | mahirmsb25/Text-Preprocessing-Pipeline | A Python-based NLP preprocessing pipeline using NLTK and Pandas to clean and... | Experimental |
| 39 | nluninja/nlp_crash_course_with_spacy | A Natural Language Processing crash course with SpaCy 2.6 and NLTK 3.6.2,... | Experimental |
| 40 | Varsh008/text_preprocessor_toolkit | Configurable Text Preprocessing Toolkit in Python using spaCy | Experimental |
| 41 | alanindra/baca-juga-cleaner | Program to clean news text by filtering out irrelevant syntactic... | Experimental |
| 42 | dodevca/tweet-preprocessor | Lightweight, modular, and extensible Python library for preprocessing... | Experimental |
| 43 | tnathu-ai/NLP-Job-Ad | Pre-process natural language text data to generate effective feature... | Experimental |
| 44 | michellepellon/tidyname | Intelligent company name cleaning and normalization for Python. Entity... | Experimental |
| 45 | PawarMukesh/NLP-Text-PreProcessing | This file is contain techniques used in pre-process the text data | Experimental |