Koziev/NLP_Datasets

My NLP datasets for Russian language

/ 100

Emerging

This project provides pre-collected and pre-processed Russian language datasets, primarily for developing conversational AI. It offers large collections of dialogues from various sources like imageboards, movie subtitles, and literature, along with paraphrased sentences and short sentence patterns. These datasets are ideal for developers, researchers, or data scientists working on Russian natural language processing models, especially for building chatbots or dialogue systems.

386 stars. No commits in the last 6 months.

Use this if you need extensive, ready-to-use Russian text data for training or evaluating conversational AI, text generation, or natural language understanding models.

Not ideal if you require datasets in languages other than Russian, or if your NLP task is highly specialized and requires domain-specific data not covered here.

conversational-ai chatbot-development natural-language-processing dialogue-systems russian-language

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25

Stars

386

Forks

Language

License

CC0-1.0

Higher-rated alternatives

Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

natasha/corus

Links to Russian corpora + Python functions for loading and parsing

SergeyShk/ruTS

Library for extracting statistics from Russian-language texts.

darija-open-dataset/dataset

darija <-> english dataset

omicsNLP/Auto-CORPus

Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...
