mannefedov/ru_kw_eval_datasets
Datasets for evaluation of keyword extraction in Russian
This project offers collections of Russian-language articles and scientific papers, each paired with a set of manually identified keywords. It helps researchers, content analysts, or data scientists working with Russian text to assess how well their automated keyword extraction tools perform. You get raw text content and a verified list of keywords, ready for testing and comparison.
No commits in the last 6 months.
Use this if you need reliable, human-curated keyword sets for Russian texts to benchmark or improve your keyword extraction algorithms.
Not ideal if you need keyword data for languages other than Russian, or a dataset covering spoken language or highly specialized technical jargon outside news and academic articles.
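Benchmarking an extractor against the human-curated keywords usually comes down to measuring set overlap per document. The following is a minimal sketch of exact-match precision, recall, and F1. It is not taken from the repository: the function names `normalize` and `keyword_prf` are hypothetical, and exact matching after lowercasing is an assumed scoring rule. The sample keywords are made up for illustration and are not from the datasets.

```python
from typing import Iterable, Set, Tuple


def normalize(keywords: Iterable[str]) -> Set[str]:
    """Lowercase and strip keywords, dropping empty strings (assumed normalization)."""
    return {kw.strip().lower() for kw in keywords if kw.strip()}


def keyword_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return exact-match (precision, recall, F1) for one document."""
    pred_set = normalize(predicted)
    gold_set = normalize(gold)
    overlap = len(pred_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only: gold keywords vs. an extractor's output.
gold = ["экономика", "инфляция", "центральный банк"]
predicted = ["Инфляция", "банк", "экономика", "рубль"]

precision, recall, f1 = keyword_prf(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because Russian is highly inflected, exact string matching tends to undercount correct predictions. Lemmatizing both sides before comparison is a common adjustment, which this sketch leaves out.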
Stars: 31
Forks: 2
Language: —
License: —
Category:
Last pushed: Sep 23, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/mannefedov/ru_kw_eval_datasets"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...