kateryna-bobrovnyk/ukr-twi-corpus
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
This project provides a large collection of Ukrainian Twitter posts, along with instructions for downloading and filtering additional tweets. It is designed for researchers and analysts who study social media trends, language use, or public sentiment in Ukrainian online discussions. You get a pre-built dataset of Ukrainian tweets and can use the provided scripts to expand and refine your own collections.
No commits in the last 6 months.
Use this if you are a linguist, social scientist, or data analyst studying Ukrainian language, social media, or public opinion.
Not ideal if you need real-time data or require a corpus for languages other than Ukrainian.
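To give a feel for what filtering tweets by language can involve, here is a minimal sketch in Python. It is not the repository's method: the function names `looks_ukrainian` and `filter_ukrainian` are hypothetical, and the heuristic (keep a text if it contains a letter used in Ukrainian but not in Russian, and no letter used in Russian but not in Ukrainian) is an assumption chosen for illustration only.

```python
# Letters found in the Ukrainian alphabet but not the Russian one:
# і, ї, є, ґ (lowercase and uppercase), written as escapes to avoid
# confusion with look-alike Latin characters.
UKRAINIAN_ONLY = set("\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490")

# Letters found in the Russian alphabet but not the Ukrainian one:
# ы, э, ъ, ё (lowercase and uppercase).
RUSSIAN_ONLY = set("\u044b\u044d\u044a\u0451\u042b\u042d\u042a\u0401")


def looks_ukrainian(text: str) -> bool:
    """Heuristic (assumption, not the repository's filter): the text has at
    least one Ukrainian-only letter and no Russian-only letter."""
    has_ukrainian = any(ch in UKRAINIAN_ONLY for ch in text)
    has_russian = any(ch in RUSSIAN_ONLY for ch in text)
    return has_ukrainian and not has_russian


def filter_ukrainian(tweets):
    """Keep only the texts that pass the heuristic, preserving order."""
    return [tweet for tweet in tweets if looks_ukrainian(tweet)]
```

A character heuristic like this misclassifies short texts that happen to contain none of the distinguishing letters, so a real pipeline would more likely rely on a trained language identifier.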
Stars: 15
Forks: 3
Language: Jupyter Notebook
License: —
Category:
Last pushed: Jul 04, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/kateryna-bobrovnyk/ukr-twi-corpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
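The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the path segments are copied verbatim from the curl URL above (the listing does not say what `nlp` denotes), and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request

# Base path copied from the curl example in the listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(segment: str, owner: str, repo: str) -> str:
    """Assemble a URL with the same shape as the curl example."""
    return f"{BASE_URL}/{segment}/{owner}/{repo}"


def fetch_quality(segment: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON. If it does not,
    json.loads raises a ValueError.
    """
    url = build_quality_url(segment, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "kateryna-bobrovnyk", "ukr-twi-corpus")` issues the same request as the curl command and counts against the same daily request limit.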
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.