AnwarCS/Sudanese-Arabic-LLM
Building a Sudanese Arabic dataset and fine-tuning LLMs to improve representation of this dialect.
This project aims to improve how large language models understand and generate Sudanese Arabic. It collects raw text from sources such as social media and oral stories, processes it into a dataset, and uses that dataset to fine-tune language models. The result is a model that better recognizes and produces Sudanese Arabic. Language researchers, AI developers, and cultural preservationists working on Sudanese Arabic are the intended audience.
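The "raw text in, training data out" step described above can be illustrated with a small sketch. This is not the project's code: the cleaning rules (dropping URLs and @mentions, removing Arabic diacritics and tatweel, collapsing whitespace, de-duplicating) and the names `clean_text` and `build_corpus` are assumptions chosen as common choices for social-media Arabic text.

```python
import re

# Hypothetical cleaning rules (assumptions, not taken from the project).
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Arabic short-vowel marks (U+064B to U+0652) and tatweel (U+0640).
DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")
WHITESPACE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Normalize one raw sample into a plain training sentence."""
    text = URL_OR_MENTION.sub(" ", raw)          # drop links and @mentions
    text = DIACRITICS_AND_TATWEEL.sub("", text)  # drop diacritics and elongation
    return WHITESPACE.sub(" ", text).strip()     # collapse runs of whitespace


def build_corpus(samples):
    """Clean every sample, skipping empty results and exact duplicates."""
    seen = set()
    corpus = []
    for sample in samples:
        cleaned = clean_text(sample)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            corpus.append(cleaned)
    return corpus
```

A list produced by `build_corpus` could then be written out one sentence per line and passed to whatever fine-tuning setup is in use. Whether diacritics should be stripped at all depends on the downstream task, so treat that rule as optional.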
No commits in the last 6 months.
Use this if you need AI models that accurately process and generate text specifically in the Sudanese Arabic dialect.
Not ideal if your focus is on Modern Standard Arabic or other Arabic dialects, as this project is highly specialized.
Stars: 22
Forks: 17
Language: Python
License: MIT
Category:
Last pushed: Jun 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/AnwarCS/Sudanese-Arabic-LLM"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
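The same request can be made from Python using only the standard library. The following is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `/<owner>/<repo>` path pattern as the curl example. The names `build_url` and `fetch_quality` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def build_url(owner: str, repo: str) -> str:
    """Build the request URL; the owner/repo pattern is inferred from one example."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes a JSON response body; json.loads raises ValueError otherwise.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("AnwarCS", "Sudanese-Arabic-LLM")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result locally rather than refetching it.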
Higher-rated alternatives
CAMeL-Lab/camel_tools
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York...
PetrKorab/Arabica
Python package for text mining of time-series data
markuskiller/textblob-de
German language support for TextBlob.
MagedSaeed/farasapy
A Python implementation of Farasa toolkit
adhaamehab/textblob-ar
Arabic support for textblob