natasha/corus

Links to Russian corpora + Python functions for loading and parsing

Quality score: 57 / 100 (Established)

This tool helps researchers, linguists, and data scientists who work with Russian access and prepare large collections of Russian text. It reads compressed archives of publicly available Russian text datasets (such as news articles or social media posts) and yields their contents as structured records, which makes the text simpler to analyze. Use it when you need to get Russian text data into a usable format quickly for research or applications.
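The following is a minimal Python sketch that makes the "compressed archive in, structured records out" idea concrete using only the standard library. It is not corus's implementation. `load_news`, `NewsRecord`, and the assumed CSV columns (`url`, `title`, `text`, `topic`) are hypothetical names chosen for illustration.

```python
import csv
import gzip
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class NewsRecord:
    # Hypothetical record type; the field names are illustrative only.
    url: str
    title: str
    text: str
    topic: Optional[str]


def load_news(path: str) -> Iterator[NewsRecord]:
    """Lazily yield one NewsRecord per row of a gzip-compressed CSV file.

    Assumes the CSV has a header row with url, title, text and topic columns.
    """
    # Open the archive in text mode so csv can parse it without unpacking to disk.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as file:
        for row in csv.DictReader(file):
            yield NewsRecord(
                url=row["url"],
                title=row["title"],
                text=row["text"],
                # Treat an empty or missing topic cell as None.
                topic=row.get("topic") or None,
            )
```

Because `load_news` is a generator, `next(load_news("news.csv.gz"))` decompresses and parses only as much of the file as it needs to produce the first record. This matters for multi-gigabyte corpora. The corus README shows its real loaders being used in the same style, for example `records = load_lenta(path)` followed by `next(records)`.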

310 stars. Available on PyPI.

Use this if you need to efficiently load and parse various Russian text datasets for natural language processing, linguistic analysis, or other data-driven tasks.

Not ideal if you are looking for pre-built models or advanced NLP functionality, since this tool focuses primarily on data loading and parsing.

Russian-language-processing text-corpus-management linguistic-research data-preparation NLP-datasets
No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 12 / 25


Stars: 310
Forks: 21
Language: Jupyter Notebook
License: MIT
Last pushed: Feb 09, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/natasha/corus"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
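The following is a minimal Python sketch of the same request using only the standard library. It assumes that the endpoint returns a JSON object and that other repositories follow the same `/{category}/{owner}/{repo}` path pattern as the single example above. `quality_url` and `fetch_quality` are hypothetical helper names. The sketch does not send an API key, because the page does not say how a key is passed.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed URL pattern, generalized from the one documented example.
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    # Unauthenticated GET (subject to the 100 requests/day limit).
    # Assumes the response body is a JSON object.
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_quality("nlp", "natasha", "corus")` requests the same URL as the curl command above.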