AsoSoft/AsoSoft-Text-Corpus

AsoSoft Text Corpus is the first large-scale text corpus for the Kurdish language.

Score: 22 / 100 (Experimental)

It is the first large-scale collection of Kurdish text, specifically for the Central Kurdish (Sorani) dialect. Raw text from various sources is cleaned and standardized through a detailed normalization process to produce a large, organized corpus ready for analysis. It is aimed at linguists, lexicographers, and natural language processing (NLP) researchers working on Kurdish.
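The corpus's own normalization rules are not spelled out here. As a rough illustration only, the following Python sketch shows the kind of character unification commonly applied to Sorani text typed with Arabic or Persian keyboard layouts. The function name `normalize_sorani` and the exact rule set are hypothetical; this is not AsoSoft's pipeline.

```python
import unicodedata

# Hypothetical rule set, not AsoSoft's actual normalizer: unify code points that
# look alike but differ between Arabic, Persian and Kurdish keyboard layouts.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06a9",  # Arabic kaf (ك) -> keheh (ک), the form Sorani uses
    "\u064a": "\u06cc",  # Arabic yeh (ي) -> Farsi yeh (ی)
    "\u0649": "\u06cc",  # alef maksura (ى) -> Farsi yeh (ی)
    "\u0640": None,      # tatweel (ـ), a purely visual stretching character, is dropped
})


def normalize_sorani(text: str) -> str:
    """Return a normalized copy of `text` (illustrative sketch only)."""
    # NFKC folds Arabic presentation forms (e.g. U+FED9, isolated kaf) back to base letters.
    text = unicodedata.normalize("NFKC", text)
    # Apply the one-to-one character substitutions and deletions above.
    text = text.translate(CHAR_MAP)
    # Collapse runs of whitespace into single spaces and trim both ends.
    return " ".join(text.split())
```

For example, `normalize_sorani("\u0643\u0648\u0631\u062f\u064a")` returns `"\u06a9\u0648\u0631\u062f\u06cc"` (کوردی). NFKC is a fairly aggressive choice: it also rewrites compatibility characters such as ligatures and full-width forms, which a real pipeline may or may not want.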

No commits in the last 6 months.

Use this if you need a pre-processed, extensive dataset of Central Kurdish text for linguistic analysis, dictionary creation, or building applications that understand or generate Kurdish.
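For lexicography and dictionary work, a common first step is a word-frequency list. A minimal Python sketch, assuming the corpus text is available as an iterable of strings (for example, lines read from a UTF-8 file; the corpus's actual file layout is not described here) and using a simple regex tokenizer rather than any AsoSoft tool. `word_frequencies` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# \w matches Unicode letters, digits and underscore, so Sorani letters such as
# ە, ێ, ۆ, ڵ and ڕ count as word characters, while punctuation like the Arabic
# comma (،) separates tokens. Simplification: zero-width non-joiners (U+200C)
# are not word characters either, so they also split tokens.
WORD_RE = re.compile(r"\w+")


def word_frequencies(lines: Iterable[str]) -> Counter:
    """Count how often each token occurs across all lines (illustrative sketch)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line))
    return counts
```

For instance, `word_frequencies(["کوردی زمانی کوردی"]).most_common(1)` gives `[("کوردی", 2)]`. Passing a file object works too, since iterating over it yields lines.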

Not ideal if your project involves a different dialect of Kurdish or requires data for commercial purposes, as this corpus is strictly for non-commercial research.

Topics: Kurdish-language-research, linguistics, lexicography, NLP-data, speech-processing
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 7 / 25
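The four component scores appear to add up to the headline figure, each capped at 25. This is an inference from the numbers shown, not a documented formula:

```latex
\underbrace{0}_{\text{Maintenance}} + \underbrace{7}_{\text{Adoption}} + \underbrace{8}_{\text{Maturity}} + \underbrace{7}_{\text{Community}} = 22,
\qquad \text{maximum} = 4 \times 25 = 100
```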


Stars: 27
Forks: 2
Last pushed: Apr 01, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/AsoSoft/AsoSoft-Text-Corpus"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
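The same request can be made from Python with only the standard library. This sketch assumes the endpoint returns JSON (the response format is not documented here). The helper names `quality_url` and `fetch_quality` are hypothetical, and treating the `nlp` path segment as a swappable category is a guess based on the single URL shown. It covers keyless access only, since how the free key is supplied is not stated.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "nlp") -> str:
    """Build the per-repository URL, mirroring the path layout of the curl example."""
    # Percent-encode each path segment so unusual repository names stay valid.
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def fetch_quality(owner: str, repo: str, category: str = "nlp", timeout: float = 10.0):
    """GET the endpoint without an API key and return the decoded JSON as-is.

    Assumes a JSON response body; no particular fields are relied on.
    """
    url = quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("AsoSoft", "AsoSoft-Text-Corpus")` performs one live request, which counts toward the keyless 100 requests/day limit.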