Koziev/NLP_Datasets

My NLP datasets for Russian language

Score: 45 / 100 (Emerging)

This project provides pre-collected and pre-processed Russian-language datasets, primarily for developing conversational AI. It offers large collections of dialogues from sources such as imageboards, movie subtitles, and literature, along with paraphrased sentences and short sentence patterns. The datasets suit developers, researchers, and data scientists working on Russian natural language processing models, especially chatbots and dialogue systems.

386 stars. No commits in the last 6 months.

Use this if you need extensive, ready-to-use Russian text data for training or evaluating conversational AI, text generation, or natural language understanding models.

Not ideal if you require datasets in languages other than Russian, or if your NLP task is highly specialized and requires domain-specific data not covered here.

Tags: conversational-ai, chatbot-development, natural-language-processing, dialogue-systems, russian-language
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25

Stars: 386
Forks: 55
Language: C#
License: CC0-1.0
Last pushed: Feb 18, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Koziev/NLP_Datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
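
The same request can be made from a script. The following is a minimal Python sketch, assuming only the endpoint shown in the curl command above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and because the response format is not documented here, the code tries to parse JSON and falls back to the raw text.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as 'owner/name'."""
    # Keep the slash between owner and repository name unescaped.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record; return parsed JSON if possible, else the raw body.

    Assumption: the response body is UTF-8 text. Its schema is not
    documented in this listing, so no fields are accessed here.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("nlp", "Koziev/NLP_Datasets")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result locally rather than fetching on every run.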