ImadSaddik/DarijaTokenizers

Free-to-use tokenizers trained on the Darija language.

Score: 31 / 100 (Emerging)

This project offers ready-to-use text tokenizers specifically designed for the Darija language. It takes raw Darija text as input and converts it into numerical tokens, which are essential for building and training large language models (LLMs). Developers working on natural language processing (NLP) applications for Darija speakers will find these tokenizers valuable.

No commits in the last 6 months.

Use this if you are a developer building an LLM or any NLP application for the Darija language and need an efficient way to convert text into numerical representations.

Not ideal if your text is mainly in Latin script or in languages other than Darija, as the tokenizers may perform suboptimally on it.

Natural Language Processing · LLM Development · Darija Language · Text Preprocessing · AI Development
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 18 / 25

Stars: 11
Forks: 15
Language: Python
License: None
Last pushed: Mar 26, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ImadSaddik/DarijaTokenizers"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
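
For scripted use, the same request can be made from Python. The following is a minimal sketch that uses only the standard library. It assumes the endpoint returns a JSON body, which this page does not state, and the helper names `build_quality_url` and `fetch_quality` are hypothetical, introduced only for illustration. The URL layout (category, owner, repository) is inferred from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is
# category / owner / repository.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the raw body should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "ImadSaddik", "DarijaTokenizers")
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

Since unauthenticated access is limited to 100 requests/day, caching the result locally is a reasonable choice when the script runs repeatedly.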