ksnugroho/basic-text-preprocessing
Basic text preprocessing for Bahasa with Python.
This tool helps data analysts and researchers prepare Indonesian text for tasks such as sentiment analysis or topic modeling. It takes raw, unstructured Indonesian text as input and produces cleaned, structured text that is ready for machine learning models. It is aimed at people who analyze text written in Bahasa Indonesia.
No commits in the last 6 months.
Use this if you need to clean and structure Indonesian text data before performing advanced analysis or building predictive models.
Not ideal if you are working with languages other than Indonesian, or if you need highly advanced, domain-specific text normalization techniques.
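To make the kind of cleaning described above concrete, here is a minimal sketch of a basic preprocessing pass using only the Python standard library. It is not taken from the repository: the function name `preprocess`, the order of the steps, and the tiny stopword list are assumptions chosen for illustration.

```python
import re
import string

# Tiny illustrative stopword list. This is an assumption for the sketch,
# not the list used by the repository.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stopwords=STOPWORDS):
    """Return a list of cleaned tokens from raw Indonesian text."""
    text = text.lower()                         # case folding
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs before stripping punctuation
    text = re.sub(r"\d+", " ", text)            # drop digits
    text = text.translate(_PUNCT_TO_SPACE)      # replace punctuation with spaces
    tokens = text.split()                       # whitespace tokenization
    return [t for t in tokens if t not in stopwords]


print(preprocess("Saya membeli 2 buku di toko itu, dan harganya murah!"))
# ['saya', 'membeli', 'buku', 'toko', 'harganya', 'murah']
```

A real pipeline for Indonesian would likely use a fuller stopword list and may add stemming or slang normalization; those steps are left out here to keep the sketch dependency-free.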
Stars: 40
Forks: 10
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Sep 22, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ksnugroho/basic-text-preprocessing"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
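The same request can be sketched in Python with the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repository is inferred only from the curl URL above, and the response is assumed to be JSON, which the listing does not state.

```python
import json
import urllib.request

# Base path copied from the curl example; the path layout below is an
# assumption inferred from that single URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and parse it, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "ksnugroho", "basic-text-preprocessing")` would issue the same request as the curl command and count against the daily request limit.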
Higher-rated alternatives
chartbeat-labs/textacy
NLP, before and after spaCy
nltk/nltk_data
NLTK Data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
prasanthg3/cleantext
An open-source package for python to clean raw text data