U4RASD/dalla-model-training
Dalla training recipe using the Hugging Face SFT trainer
This project helps developers fine-tune large language models, particularly on Arabic text. It cleans raw text data and prepares it for training, then adapts an existing model's tokenizer vocabulary to better cover your content. The result is a model set up for continued pre-training or supervised fine-tuning. It is aimed at machine learning engineers and researchers building custom Arabic language models.
Use this if you need to train or fine-tune an existing large language model specifically for Arabic language tasks and want to integrate custom data and vocabulary efficiently.
Not ideal if you are looking for a pre-trained, ready-to-use Arabic language model without needing custom fine-tuning or specialized tokenizer adjustments.
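The cleaning and vocabulary-adaptation steps described above can be illustrated with a minimal sketch. This is not the repository's actual code: the function names `clean_arabic` and `candidate_tokens`, the choice of normalizations (NFKC, stripping diacritics and tatweel), and the frequency thresholds are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import Counter

# Assumed cleaning rules (not taken from the repository):
# Arabic diacritics U+064B-U+0652, superscript alef U+0670, tatweel U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"
_WHITESPACE = re.compile(r"\s+")


def clean_arabic(text: str) -> str:
    """Normalize one raw line: NFKC, drop diacritics and tatweel, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return _WHITESPACE.sub(" ", text).strip()


def candidate_tokens(lines, base_vocab, top_k=1000, min_count=2):
    """Return frequent whitespace-delimited words that the base vocabulary lacks.

    `base_vocab` is any container supporting `in` (for example a set of
    token strings). Results are ordered from most to least frequent.
    """
    counts = Counter(
        word for line in lines for word in clean_arabic(line).split()
    )
    picked = [
        word
        for word, count in counts.most_common()
        if count >= min_count and word not in base_vocab
    ]
    return picked[:top_k]
```

With Hugging Face Transformers, a list like this would typically be registered with `tokenizer.add_tokens(...)`, followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix matches the enlarged vocabulary. Whether this repository follows that exact route is not stated here.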
Stars: 8
Forks: —
Language: Python
License: —
Category:
Last pushed: Dec 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/U4RASD/dalla-model-training"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PaddlePaddle/PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started...
arcee-ai/mergekit
Tools for merging pretrained large language models.
changyeyu/LLM-RL-Visualized
100+ original LLM / RL principle diagrams, presented by the author of "Large Model Algorithms" (100+ LLM/RL Algorithm Maps)
mindspore-lab/step_into_llm
MindSpore online courses: Step into LLM