natasha/corus

Links to Russian corpora + Python functions for loading and parsing


Established

This tool helps researchers, linguists, and data scientists who work with Russian access and prepare large collections of Russian text. It reads compressed archives of publicly available Russian datasets (such as news articles or social media posts) and exposes them as structured records, which simplifies analysis. Use it when you need to get Russian text into a usable format quickly for research or applications.
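The general pattern described here, reading a compressed archive and yielding structured records, can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: it does not call corus and is not its implementation. The `NewsRecord` type, the `load_news` function, and the `url`/`title`/`text` column names are hypothetical.

```python
import csv
import gzip
from dataclasses import dataclass
from typing import Iterator


@dataclass
class NewsRecord:
    # Hypothetical record type; the field names are illustrative only.
    url: str
    title: str
    text: str


def load_news(path: str) -> Iterator[NewsRecord]:
    """Lazily yield one record per row of a gzip-compressed CSV file."""
    # Text mode with explicit UTF-8 so Cyrillic content decodes correctly.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as file:
        for row in csv.DictReader(file):
            yield NewsRecord(url=row["url"], title=row["title"], text=row["text"])
```

Because the loader is a generator, a large archive is decompressed and parsed one row at a time, so a caller can inspect the first few records without reading the whole file into memory.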

310 stars. Available on PyPI.

Use this if you need to efficiently load and parse various Russian text datasets for natural language processing, linguistic analysis, or other data-driven tasks.

Not ideal if you are looking for pre-built models or advanced NLP functionalities, as this tool primarily focuses on data loading and parsing.

Russian-language-processing text-corpus-management linguistic-research data-preparation NLP-datasets

No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 12 / 25


Stars

310

Forks

Language

Jupyter Notebook

License

MIT

Related tools

Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

SergeyShk/ruTS

A library for extracting statistics from Russian-language texts.

darija-open-dataset/dataset

darija <-> english dataset

omicsNLP/Auto-CORPus

Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...

texttechnologylab/GerParCor

German Parliamentary Corpus (GerParCor)
