mideind/GreynirCorpus
A large treebank of parsed Icelandic text
This project provides a large collection of Icelandic text that has been parsed for grammatical structure and published as a treebank. Each sentence comes with its full grammatical breakdown, showing how the words relate to each other. It is aimed at linguists, language technology researchers, and anyone building tools that need a deep understanding of Icelandic grammar.
No commits in the last 6 months.
Use this if you need structured grammatical data for Icelandic text, such as for training a natural language processing model or conducting linguistic research.
Not ideal if you simply need a large corpus of raw Icelandic text without grammatical analysis, or if your focus is on a language other than Icelandic.
Stars: 8
Forks: not listed
Language: not listed
License: CC-BY-4.0
Category: not listed
Last pushed: Jun 30, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/mideind/GreynirCorpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
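The same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names `build_url` and `fetch_quality` are hypothetical, the reading of the URL path as a category segment (`nlp`) followed by `owner/repo` is inferred from the curl example, and the response is assumed to be JSON, which this page does not confirm. How an API key is supplied is not described here, so the sketch covers only the keyless request.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is
# assumed to be "<category>/<owner>/<repo>".
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo such as 'mideind/GreynirCorpus'."""
    # Keep the slash between owner and repo name; escape everything else.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the body is JSON (not confirmed by the page); json.loads
    raises ValueError if that assumption is wrong.
    """
    with urllib.request.urlopen(build_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "mideind/GreynirCorpus")` would issue the same keyless request as the curl command and counts against the 100 requests/day limit.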
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
Library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...