fido-ai/ua-datasets

A collection of datasets for the Ukrainian language


Emerging

This collection provides ready-to-use Ukrainian-language datasets for natural language processing. It covers three tasks: question answering, news categorization, and part-of-speech tagging. Researchers, linguists, and educators working with Ukrainian text can use it to load and process this data without assembling corpora themselves.
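To make the three task types concrete, here is a minimal sketch of what one record for each task might look like. It does not use the package itself: the class names (`QAExample`, `NewsExample`, `TaggedSentence`), the helper functions, the sample sentences, and the tag labels are all hypothetical and chosen for illustration only, so the library's real loaders and field names may differ.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record types, one per task named in the description.
# They are NOT the package's own classes.
@dataclass
class QAExample:
    question: str
    context: str
    answer: str


@dataclass
class NewsExample:
    title: str
    category: str


@dataclass
class TaggedSentence:
    tokens: list[str]
    tags: list[str]  # one part-of-speech label per token


def answer_in_context(example: QAExample) -> bool:
    """Check that an extractive answer appears verbatim in its context."""
    return example.answer in example.context


def category_counts(examples: list[NewsExample]) -> Counter:
    """Count how many news items fall into each category."""
    return Counter(e.category for e in examples)


# Made-up sample data, for illustration only.
qa = QAExample(
    question="Яке місто є столицею України?",
    context="Київ є столицею України.",
    answer="Київ",
)
news = [
    NewsExample(title="Збірна виграла матч", category="спорт"),
    NewsExample(title="Новий рекорд на турнірі", category="спорт"),
    NewsExample(title="Парламент ухвалив закон", category="політика"),
]
pos = TaggedSentence(
    tokens=["Київ", "є", "столицею"],
    tags=["PROPN", "AUX", "NOUN"],  # illustrative labels, assumed tag set
)

print(answer_in_context(qa))
print(category_counts(news).most_common())
print(list(zip(pos.tokens, pos.tags)))
```

Keeping each task in its own small record type makes basic sanity checks easy to write, such as confirming that every answer occurs in its context or that tokens and tags have equal length.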

Available on PyPI.

Use this if you need consistent, readily available Ukrainian text data to develop or evaluate language models and applications.

Not ideal if you require datasets for languages other than Ukrainian or specialized data beyond the scope of question answering, news classification, or part-of-speech tagging.

Ukrainian-language-research text-analysis linguistics content-categorization question-answering

No Dependents

Maintenance 6 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 5 / 25

Language: Python

License: MIT

Higher-rated alternatives

Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

natasha/corus

Links to Russian corpora + Python functions for loading and parsing

darija-open-dataset/dataset

darija <-> english dataset

omicsNLP/Auto-CORPus

Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...

SergeyShk/ruTS

A library for extracting statistics from Russian-language texts.
