Arabic Text Normalization NLP Tools

Tools for Arabic-specific text processing including diacritization (vowelization), dialect identification/classification, and transliteration between Arabic scripts and romanization systems. Does NOT include general morphological analysis, stemming, or non-Arabic language processing.
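To make "diacritization" concrete, here is a minimal sketch of the inverse operation: removing Arabic diacritics (tashkeel) from vocalized text. It uses only Python's standard `unicodedata` module and is not taken from any tool listed here. The helper name `strip_diacritics` is hypothetical. The approach drops every Unicode combining mark, which covers the Arabic harakat but is broader than a dedicated tool might want.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop all Unicode combining marks, which include the Arabic harakat.

    Assumption: removing every combining character is acceptable here.
    This also strips combining marks from other scripts in the input.
    """
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# KAF + FATHA, TEH + FATHA, BEH + FATHA: the vocalized word "kataba"
vocalized = "\u0643\u064e\u062a\u064e\u0628\u064e"
bare = strip_diacritics(vocalized)  # the three base letters only
```

Diacritization tools solve the harder reverse problem: given the bare letters, predict which marks belong on each one.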

There are 19 Arabic text normalization tools tracked. One scores above 50 (Established tier). The highest-rated is linuxscout/mishkal at 51/100 with 307 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=arabic-text-normalization&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
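The same request can be made from a script. The sketch below rebuilds the URL from the curl command using only the Python standard library. The helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented here, so the code returns the parsed payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="arabic-text-normalization", limit=20):
    """Assemble the request URL from its query parameters."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Perform the GET request and parse the body as JSON.

    Assumption: the endpoint returns valid JSON. The structure of the
    payload is unknown, so it is returned as-is.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network call, counted against the daily request limit):
#   data = fetch_projects(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to change `limit` or `subcategory` and to inspect the URL before spending a request.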

# | Tool | Description | Score | Tier
1 | linuxscout/mishkal | Mishkal is an Arabic text vocalization software | 51 | Established
2 | hb20007/greek-dialect-classifier | Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek | 45 | Emerging
3 | AliOsm/arabic-text-diacritization | Benchmark Arabic text diacritization dataset | 44 | Emerging
4 | mush42/libtashkeel | Add Arabic diacritics (tashkeel/harakat) using Rust/Python/C++/WASM and NLP models | 43 | Emerging
5 | AliOsm/shakkelha | Neural Arabic text diacritization | 41 | Emerging
6 | BasmaElhoseny01/Tashkeel | A system that takes a sentence and produces the same sentence after... | 31 | Emerging
7 | saobou/DSAraby | We've created a library named "DSAraby" that aims to transliterate text... | 30 | Emerging
8 | AbdelrahmanHamdyy/Arabic-Text-Diacritization | Course Project for Natural Language Processing | 30 | Emerging
9 | WoLFi22/DialectClassificationPipeline | This repository provides a pipeline for dialect classification using deep... | 24 | Experimental
10 | textgain/redcrow | Arabic Dialect Identifier | 23 | Experimental
11 | norhanreda/Arabic-Text-Diacritization | Diacritics are short vowels with a constant length that are spoken. The same... | 23 | Experimental
12 | Crinmatic/Diacritic-Restoration | Using AI to restore Diacritics on Yoruba language (which is a low resource language) | 23 | Experimental
13 | hazemhosny/ArabicDialectClassification | Arabic Dialect Sentiment Analysis | 23 | Experimental
14 | nipponjo/arabic_vocalizer | Arabic deep-learning based diacritization models (Shakkala, Shakkelha) in... | 21 | Experimental
15 | adelelwan24/Arabic-Dialect-Classification | Many countries speak Arabic; however, each country has its own dialect, the... | 15 | Experimental
16 | eesanoble/Arabic-Dialect-Classifier | An Arabic Tweet Dialect Classifier | 10 | Experimental
17 | Ahmad-Zaki/Arabic_Dialect_Identification | A machine learning/deep learning approach to classify the dialect of Arabic text. | 10 | Experimental
18 | Qfl3x/ArabicDiacritization | A personal project on diacritizing Arabic text. | 10 | Experimental
19 | achrafdotio/arabizi2arabic | Convert Darija Arabizi to Arabic Darija | 10 | Experimental