Ankur3107/nlp_preprocessing
A text preprocessing package that covers cleaning, tokenization, dataset preparation, and related steps.
This tool helps data scientists and NLP practitioners prepare raw text for analysis. It takes unstructured text data, cleans it by removing noise and standardizing formats, and then structures it into datasets ready for machine learning models. The output is cleaned text, tokenized sequences, and processed datasets suitable for training.
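The clean, tokenize, and structure flow described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the package's actual API: the function names (`clean_text`, `tokenize`, `build_vocab`, `encode`, `prepare_dataset`), the specific cleaning rules, and the `<pad>`/`<unk>` ids are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical cleaning rules; the real package may apply different ones.
TAG_RE = re.compile(r"<[^>]+>")            # HTML-like tags
URL_RE = re.compile(r"https?://\S+")       # http/https links
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")  # words, with optional apostrophe suffix


def clean_text(text):
    """Lowercase, drop tags and URLs, and collapse whitespace."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into word tokens, discarding punctuation."""
    return TOKEN_RE.findall(text)


def build_vocab(token_lists, min_count=1):
    """Map tokens to integer ids, most frequent first (ties broken alphabetically)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def prepare_dataset(texts, max_len=8):
    """Run the full pipeline: clean, tokenize, build a vocabulary, encode."""
    token_lists = [tokenize(clean_text(t)) for t in texts]
    vocab = build_vocab(token_lists)
    return [encode(toks, vocab, max_len) for toks in token_lists], vocab
```

Calling `prepare_dataset(["Hello world", "hello there, NLP"], max_len=4)` returns fixed-length id sequences plus the vocabulary, which is the kind of model-ready output the description refers to.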
No commits in the last 6 months.
Use this if you are a data scientist or NLP engineer who needs to clean and prepare text data before feeding it into machine learning algorithms or running further linguistic analysis.
Not ideal if you are looking for a complete end-to-end machine learning solution or need advanced, domain-specific NLP models out of the box.
Stars: 18
Forks: 7
Language: JavaScript
License: not listed
Category: not listed
Last pushed: Aug 16, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Ankur3107/nlp_preprocessing"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
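The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions that the page does not confirm: that the URL follows a `category/owner/repo` pattern (inferred from the single example above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the path pattern is an inference.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and parse it as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("nlp", "Ankur3107", "nlp_preprocessing")` requests the same URL as the curl command. Keyless calls count against the 100 requests/day limit, so caching the result locally is a sensible default.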
Higher-rated alternatives
chartbeat-labs/textacy
NLP, before and after spaCy
nltk/nltk_data
NLTK Data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
prasanthg3/cleantext
An open-source package for python to clean raw text data