sinaahmadi/KurdishTokenization
Tokenization resources for Kurdish (Sorani & Kurmanji dialects)
This project tokenizes written Kurdish text in both the Sorani and Kurmanji dialects, splitting sentences into individual words and other meaningful units. It takes raw Kurdish sentences as input and outputs tokenized text, which makes the data easier to analyze and to build applications on. It is aimed at anyone working with Kurdish language data, such as computational linguists, language technology developers, and language educators.
No commits in the last 6 months.
Use this if you need to accurately segment Kurdish sentences (Sorani or Kurmanji) into individual words or multi-word expressions for linguistic analysis or building language tools.
Not ideal if you are looking for a complete natural language processing toolkit beyond just tokenization.
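As a rough illustration of the input and output described above, here is a naive baseline tokenizer in Python for Latin-script Kurmanji. It is not the project's method: the function name `naive_tokenize` is hypothetical, the regex split is a stand-in technique, and it handles neither multi-word expressions nor Sorani script conventions.

```python
import re
import unicodedata

# Hypothetical baseline, NOT the KurdishTokenization implementation.
# It splits text into runs of word characters and single punctuation marks.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def naive_tokenize(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens."""
    # Normalize to NFC so letters such as "ç" and "ê" are single code points
    # and stay attached to the word they belong to.
    normalized = unicodedata.normalize("NFC", sentence)
    return _TOKEN_PATTERN.findall(normalized)


tokens = naive_tokenize("Ez diçim malê.")
print(tokens)  # ['Ez', 'diçim', 'malê', '.']
```

A baseline like this treats every whitespace-separated word as one token, so it cannot group multi-word expressions, which is the kind of case a dedicated tokenizer is meant to cover.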
Stars: 9
Forks: —
Language: Lex
License: —
Category: —
Last pushed: Jun 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sinaahmadi/KurdishTokenization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
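The same request can be sketched in Python using only the standard library. The URL layout is taken from the curl command above; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a category and an "owner/name" repository."""
    # quote() keeps "/" by default, so "owner/name" stays a two-segment path.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository (assumes the body is JSON)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_quality_url("nlp", "sinaahmadi/KurdishTokenization"))
```

Calling `fetch_quality("nlp", "sinaahmadi/KurdishTokenization")` performs the network request. With no key, the limit of 100 requests/day stated above applies, so callers may want to cache results.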
Higher-rated alternatives
nert-nlp/streusle
STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)
bretttolbert/verbecc
Verbe Complete Conjugator (verbecc) supports Catalan, Spanish, French, Italian, Portuguese and...
natasha/yargy
Rule-based facts extraction for Russian language
google-research/turkish-morphology
A two-level morphological analyzer for Turkish.
bjascob/LemmInflect
A python module for English lemmatization and inflection.