ImadSaddik/BoDmaghDataset

BoDmagh dataset is a Supervised Fine-Tuning (SFT) dataset for the Darija language

34
/ 100
Emerging

This project provides a high-quality collection of human-like conversations in Darija, a Moroccan dialect of Arabic. It takes raw Darija conversations and enriches them with details like conversation turns, token counts, and topics, producing a structured dataset. It is primarily for researchers and developers building or improving conversational AI systems for the Darija language.

No commits in the last 6 months.

Use this if you are building or fine-tuning a chatbot or AI assistant that needs to understand and respond in natural, contextually appropriate Darija.

Not ideal if you need a dataset for a language other than Darija, or if you are looking for general text data rather than structured conversational exchanges.

conversational-ai darija-language chatbot-development natural-language-processing ai-model-training
No License Stale 6m No Package No Dependents
Maintenance 2 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 18 / 25

How are scores calculated?

Stars

20

Forks

16

Language

Jupyter Notebook

License

Last pushed

May 04, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ImadSaddik/BoDmaghDataset"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.