sinaahmadi/KurdishTokenization

Tokenization resources for Kurdish (Sorani & Kurmanji dialects)

21 / 100 (Experimental)

This project helps process written Kurdish text by breaking sentences into individual words and meaningful units for both Sorani and Kurmanji dialects. It takes raw Kurdish sentences as input and outputs tokenized text, making it easier for researchers and language technology developers to analyze and build applications. Anyone working with Kurdish language data, such as computational linguists or language educators, would find this useful.
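To make the input and output concrete, here is a minimal sketch of word-level tokenization in Python. It is a naive baseline written for illustration, not the project's method or API: the name `naive_tokenize` and the regular expression are assumptions, and the sketch does not handle the multi-word expressions or meaningful sub-word units that the project targets.

```python
import re
import unicodedata

# Naive baseline (hypothetical, not the project's algorithm):
# a token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def naive_tokenize(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens."""
    # Normalize to NFC so letters such as "ç" and "ê" are single code points
    # and are matched by \w as part of the surrounding word.
    normalized = unicodedata.normalize("NFC", sentence)
    return TOKEN_PATTERN.findall(normalized)


# Kurmanji example sentence in Latin script.
tokens = naive_tokenize("Ez diçim malê.")
print(tokens)
```

A rule this simple only separates words from punctuation, which is why a dialect-aware resource is needed for anything beyond whitespace-level segmentation.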

No commits in the last 6 months.

Use this if you need to accurately segment Kurdish sentences (Sorani or Kurmanji) into individual words or multi-word expressions for linguistic analysis or building language tools.

Not ideal if you are looking for a complete natural language processing toolkit beyond just tokenization.

Kurdish-language-processing linguistic-analysis text-preparation language-education computational-linguistics
Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25

Stars: 9
Forks: (not shown)
Language: Lex
License: (not shown)
Last pushed: Jun 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sinaahmadi/KurdishTokenization"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
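The same request can be made from Python. The sketch below uses only the standard library and assumes the endpoint returns JSON; the response schema is not documented on this page, so the result is returned as-is. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the URL pattern (category, owner, repository) is inferred from the single curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL, following the pattern of the curl example."""
    # Percent-encode each path segment so unusual names cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes (unverified) a JSON response body."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("nlp", "sinaahmadi", "KurdishTokenization")
```

Keeping the URL construction separate from the network call makes it easy to check the request target without spending any of the daily quota.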