Ankur3107/nlp_preprocessing

Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.

/ 100

Emerging

This tool helps data scientists and NLP practitioners prepare raw text for analysis. It takes unstructured text data, cleans it by removing noise and standardizing formats, and then structures it into datasets ready for machine learning models. The output is cleaned text, tokenized sequences, and processed datasets suitable for training.
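The clean, tokenize, and structure flow described above can be sketched with the Python standard library alone. This is a minimal illustration and not this package's API: the function names (`clean_text`, `tokenize`, `build_vocab`, `encode`), the `<pad>`/`<unk>` tokens, and the specific cleaning rules are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; they are NOT taken from
# Ankur3107/nlp_preprocessing.

def clean_text(text):
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep ASCII letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()    # normalize whitespace

def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()

def build_vocab(token_lists, min_count=1):
    """Map tokens to integer ids, most frequent first (ties broken alphabetically)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or padding to exactly max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

if __name__ == "__main__":
    raw = ["Hello, World!  Visit https://example.com NOW", "Hello again."]
    token_lists = [tokenize(clean_text(r)) for r in raw]
    vocab = build_vocab(token_lists)
    dataset = [encode(toks, vocab, max_len=6) for toks in token_lists]
    print(token_lists)
    print(dataset)
```

Fixed-length integer sequences like these are one common input shape for model training; real projects typically swap the whitespace tokenizer for a language-aware one.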

No commits in the last 6 months.

Use this if you are a data scientist or NLP engineer who needs to clean and prepare text data before feeding it into machine learning algorithms or running further linguistic analysis.

Not ideal if you are looking for a complete end-to-end machine learning solution or need advanced, domain-specific NLP models out of the box.

text-analysis data-preparation natural-language-processing machine-learning-engineering data-cleaning

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 16 / 25

Stars

Forks

Language

JavaScript

License

—

Higher-rated alternatives

chartbeat-labs/textacy

NLP, before and after spaCy

nltk/nltk_data

NLTK Data

brightertiger/pygarble

Python Package to detect garbled, gibberish text for EN

jfilter/clean-text

🧹 Python package for text cleaning

prasanthg3/cleantext

An open-source package for python to clean raw text data
