sinaahmadi/KurdishTokenization

Tokenization resources for Kurdish (Sorani & Kurmanji dialects)

Experimental

This project tokenizes written Kurdish text in both the Sorani and Kurmanji dialects, splitting sentences into individual words and other meaningful units. It takes raw Kurdish sentences as input and outputs tokenized text, which makes the data easier to analyze and to build applications on. It is aimed at people who work with Kurdish language data, such as computational linguists, language technology developers, and language educators.
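To make the input and output described above concrete, here is a minimal Python sketch of a naive word-and-punctuation tokenizer. It is not the project's method: the repository is listed as Lex-based, and its dialect-specific handling is not reproduced here. The function name `naive_tokenize` and the regular expression are assumptions made purely for illustration.

```python
import re
import unicodedata

ZWNJ = "\u200c"  # zero-width non-joiner, which can appear inside Sorani words

# Hypothetical pattern (an assumption, not taken from the project):
# a token is either a run of word characters, optionally joined by ZWNJ,
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(rf"\w+(?:{ZWNJ}\w+)*|[^\w\s]")


def naive_tokenize(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens."""
    # NFC normalization composes base letters with combining accents,
    # so Kurmanji letters such as "ê" are treated as single characters.
    normalized = unicodedata.normalize("NFC", sentence)
    return TOKEN_PATTERN.findall(normalized)


print(naive_tokenize("Ez diçim malê."))
# → ['Ez', 'diçim', 'malê', '.']
```

A sketch like this only separates words from punctuation. It does not detect multi-word expressions or split clitics and affixes, which is the kind of work a dedicated Kurdish tokenizer is meant to handle.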

No commits in the last 6 months.

Use this if you need to accurately segment Kurdish sentences (Sorani or Kurmanji) into individual words or multi-word expressions for linguistic analysis or building language tools.

Not ideal if you are looking for a complete natural language processing toolkit beyond just tokenization.

Kurdish-language-processing linguistic-analysis text-preparation language-education computational-linguistics

Stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 0 / 25

Language: Lex

License: not listed

Higher-rated alternatives

nert-nlp/streusle

STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)

bretttolbert/verbecc

Verbe Complete Conjugator (verbecc) supports Catalan, Spanish, French, Italian, Portuguese and...

natasha/yargy

Rule-based facts extraction for Russian language

google-research/turkish-morphology

A two-level morphological analyzer for Turkish.

bjascob/LemmInflect

A python module for English lemmatization and inflection.
