ImadSaddik/BoDmaghDataset
BoDmagh dataset is a Supervised Fine-Tuning (SFT) dataset for the Darija language
This project provides a high-quality collection of human-like conversations in Darija, a Moroccan dialect of Arabic. It takes raw Darija conversations and enriches them with details like conversation turns, token counts, and topics, producing a structured dataset. It is primarily for researchers and developers building or improving conversational AI systems for the Darija language.
No commits in the last 6 months.
Use this if you are building or fine-tuning a chatbot or AI assistant that needs to understand and respond in natural, contextually appropriate Darija.
Not ideal if you need a dataset for a language other than Darija, or if you are looking for general text data rather than structured conversational exchanges.
Stars
20
Forks
16
Language
Jupyter Notebook
License
—
Category
Last pushed
May 04, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ImadSaddik/BoDmaghDataset"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EQTPartners/PTEC
Code repository corresponding to the paper "Prompt Tuned Embedding Classification for...
chaoswork/sft_datasets
开源SFT数据集整理,随时补充
angeluriot/French_instruct
A dataset of instructions and answers in natural language for machine learning.
andrewzamai/SLIMER
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER