fido-ai/ua-datasets
A collection of datasets for the Ukrainian language
This collection provides ready-to-use Ukrainian language datasets for natural language processing tasks. It offers structured text data for question answering, news classification, and part-of-speech tagging. Researchers, linguists, and educators working with Ukrainian text can use it to access and process diverse textual content quickly.
Available on PyPI.
Use this if you need consistent, readily available Ukrainian text data to develop or evaluate language models and applications.
Not ideal if you require datasets for languages other than Ukrainian or specialized data beyond the scope of question answering, news classification, or part-of-speech tagging.
Stars: 56
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Oct 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/fido-ai/ua-datasets"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
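For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the URL path as category/owner/repository is an assumption inferred from the curl example, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the three path segments are category, owner and
    repository name, as the curl example suggests.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (performs a network request).

    Assumption: the response body is JSON; adjust the parsing if the
    service returns something else.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to hit the network.
    print(build_quality_url("nlp", "fido-ai", "ua-datasets"))
```

Calling `fetch_quality("nlp", "fido-ai", "ua-datasets")` sends the request, which counts against the daily limit described above.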
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.