natasha/corus
Links to Russian corpora + Python functions for loading and parsing
This tool helps researchers, linguists, and data scientists who work with Russian access and prepare large collections of Russian text. It reads compressed archives of publicly available Russian text datasets (such as news articles or social media posts) and exposes them as structured records, which makes the content simpler to analyze.
310 stars. Available on PyPI.
Use this if you need to efficiently load and parse various Russian text datasets for natural language processing, linguistic analysis, or other data-driven tasks.
Not ideal if you are looking for pre-built models or advanced NLP functionality; this tool focuses on data loading and parsing.
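The "compressed archive in, structured records out" pattern described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the corus API: the `NewsRecord` type, the `load_news` function, and the `url`/`title`/`text` column names are all hypothetical, and the archive is assumed to be a gzip-compressed CSV file with a header row.

```python
import csv
import gzip
from dataclasses import dataclass
from typing import Iterator


@dataclass
class NewsRecord:
    # Hypothetical record type for this sketch; not a corus class.
    url: str
    title: str
    text: str


def load_news(path: str) -> Iterator[NewsRecord]:
    """Lazily yield one record per row of a gzip-compressed CSV file."""
    # Open in text mode with explicit UTF-8 so Cyrillic text decodes correctly.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as file:
        # Assumed column names: url, title, text.
        for row in csv.DictReader(file):
            yield NewsRecord(url=row["url"], title=row["title"], text=row["text"])
```

Because `load_news` is a generator, calling `next(load_news("news.csv.gz"))` reads only the first row, so a large archive never has to be decompressed into memory at once.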
Stars: 310
Forks: 21
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Feb 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/natasha/corus"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
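The same request can be made from Python. The sketch below rests on two assumptions that the listing does not confirm: that the URL path follows a category/owner/repository pattern (inferred from the single curl example), and that the endpoint returns a JSON body. The function names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    # Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    # Assumption: the endpoint answers a plain GET with a JSON document.
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "natasha", "corus")` issues the request shown in the curl command; each call counts against the daily request limit.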
Related tools
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...
texttechnologylab/GerParCor
German Parliamentary Corpus (GerParCor)