ImadSaddik/BoDmaghDataset

BoDmagh dataset is a Supervised Fine-Tuning (SFT) dataset for the Darija language

/ 100

Emerging

This project provides a high-quality collection of human-like conversations in Darija, a Moroccan dialect of Arabic. It takes raw Darija conversations and enriches them with details like conversation turns, token counts, and topics, producing a structured dataset. It is primarily for researchers and developers building or improving conversational AI systems for the Darija language.

No commits in the last 6 months.

Use this if you are building or fine-tuning a chatbot or AI assistant that needs to understand and respond in natural, contextually appropriate Darija.

Not ideal if you need a dataset for a language other than Darija, or if you are looking for general text data rather than structured conversational exchanges.

conversational-ai darija-language chatbot-development natural-language-processing ai-model-training

No License Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 18 / 25

How are scores calculated?

Stars

Forks

Language

Jupyter Notebook

License

—

Higher-rated alternatives

EQTPartners/PTEC

Code repository corresponding to the paper "Prompt Tuned Embedding Classification for...

chaoswork/sft_datasets

开源SFT数据集整理,随时补充

angeluriot/French_instruct

A dataset of instructions and answers in natural language for machine learning.

andrewzamai/SLIMER

Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER

Explore NLP Tools

All categories Trending NLP directory Insights