Digital-Pushkin-Lab/RuAdapt

A Parallel Russian-Simple Russian Dataset

Experimental

This is a collection of Russian texts paired with simplified versions written for learners of Russian as a foreign language. It includes original literature, encyclopedic entries, and fairytales alongside versions adapted for specific CEFR proficiency levels. Educators, linguists, and textbook authors use it to study or create simplified Russian materials.

No commits in the last 6 months.

Use this if you need aligned pairs of complex and simplified Russian text to study language simplification or create educational resources for learners.

Not ideal if you need a tool that simplifies text automatically; this is a dataset for research and development, not a simplifier.

Russian-language-teaching language-learning-materials linguistics-research textbook-development CEFR-alignment

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 10 / 25

Higher-rated alternatives

Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

natasha/corus

Links to Russian corpora + Python functions for loading and parsing

SergeyShk/ruTS

A library for extracting statistics from Russian-language texts.

darija-open-dataset/dataset

Darija <-> English dataset

omicsNLP/Auto-CORPus

Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...
