KurdishBLARK/InterdialectCorpus
A parallel corpus of Sorani, Kurmanji and English
This corpus provides carefully aligned translations between Sorani, Kurmanji, and English. It takes news articles in these languages and presents them as parallel texts, so you can see how sentences correspond across the two Kurdish dialects and English. Translators, linguists, and computational linguists working on Kurdish will find it useful.
No commits in the last 6 months.
Use this if you need accurate, manually-aligned text pairs for translation work, linguistic analysis, or developing language technologies for Kurdish.
Not ideal if you require a corpus for languages other than Sorani, Kurmanji, or English, or if you need an unaligned, general text collection.
Stars: 15
Forks: 3
Language: not listed
License: not listed
Category:
Last pushed: Oct 06, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/KurdishBLARK/InterdialectCorpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
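The same request can be made from a script. The sketch below uses only the Python standard library and makes two assumptions that this page does not confirm: that the URL follows a category/owner/repository pattern (inferred from the single curl example above) and that the endpoint returns JSON. The names `build_quality_url` and `fetch_quality` are hypothetical helpers for illustration, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. No API key is sent, so the
    anonymous limit of 100 requests/day applies.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(build_quality_url("nlp", "KurdishBLARK", "InterdialectCorpus"))
```

Keeping URL construction separate from the network call makes it possible to check the request path without spending any of the daily quota.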
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...