ARBML/tnkeeh
Arabic cleaning, normalization and segmentation library.
This library helps anyone working with Arabic text prepare it for analysis or machine learning. It takes raw Arabic text from sources such as files, social media, or web pages, then removes noise, standardizes characters, and segments sentences. It is aimed at data scientists, computational linguists, and researchers working on Arabic language processing.
No commits in the last 6 months.
Use this if you need to clean, normalize, or segment Arabic text to improve the performance of your language models or analysis tools.
Not ideal if you are working with non-Arabic languages or primarily need advanced linguistic analysis beyond basic cleaning and segmentation.
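To make the cleaning, normalization, and sentence segmentation steps described above concrete, here is a minimal standard-library sketch. It does not use tnkeeh and is not its API; the function names `clean_arabic` and `split_sentences` are hypothetical, and the specific rules (which diacritics to strip, which alef variants to fold) are assumptions chosen for illustration.

```python
import re

# Arabic diacritics (tashkeel): fathatan through sukun, plus superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida) is a purely typographic elongation character.
TATWEEL = "\u0640"
# Very rough URL matcher, enough for a sketch.
URL = re.compile(r"https?://\S+")
# Fold alef with hamza above/below and alef with madda into bare alef.
ALEF_VARIANTS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})


def clean_arabic(text: str) -> str:
    """Hypothetical helper: strip links, diacritics, and tatweel, then fold alef variants."""
    text = URL.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = text.translate(ALEF_VARIANTS)
    # Collapse whitespace runs left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Hypothetical helper: split after ., !, ?, the Arabic question mark, or the Arabic full stop."""
    parts = re.split(r"(?<=[.!?\u061F\u06D4])\s+", text)
    return [p for p in parts if p]
```

A real pipeline would likely need more rules (HTML stripping, social-media handles, repeated characters); the point here is only that each step is a small, composable text transformation.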
Stars: 74
Forks: 9
Language: Python
License: MIT
Category:
Last pushed: Sep 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ARBML/tnkeeh"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
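The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (category, then owner, then repository) is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Hypothetical helper: build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Hypothetical helper: GET the endpoint and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "ARBML", "tnkeeh")` would issue the same request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a sensible default.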
Higher-rated alternatives
CAMeL-Lab/camel_tools: A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York...
PetrKorab/Arabica: Python package for text mining of time-series data
markuskiller/textblob-de: German language support for TextBlob.
MagedSaeed/farasapy: A Python implementation of Farasa toolkit
adhaamehab/textblob-ar: Arabic support for textblob