yutkin/Lenta.Ru-News-Dataset
Corpus of Russian news articles collected from Lenta.Ru
This dataset provides a large collection of over 800,000 Russian news articles from Lenta.Ru, covering a twenty-year period from 1999 to 2019. It's ideal for researchers, journalists, or analysts who need extensive historical Russian news content to understand trends, track events, or train language models.
145 stars. No commits in the last 6 months.
Use this if you need a comprehensive, historical archive of Russian news content for research or analysis.
Not ideal if you need real-time news updates or news from sources other than Lenta.Ru.
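This page does not describe how the corpus files are laid out. The sketch below uses only the standard library and rests on several assumptions. It assumes the articles are distributed as a CSV file with a header row, possibly compressed as .gz or .bz2, and that the file has a `date` column whose values begin with a four-digit year. The helper names `open_corpus`, `iter_articles` and `articles_per_year` are hypothetical. The sketch streams rows instead of loading all 800,000+ articles into memory. It then counts articles per year, which is one simple way to look at trends across 1999 to 2019.

```python
import bz2
import csv
import gzip
from collections import Counter
from typing import IO, Iterable, Iterator

# Article bodies can exceed csv's default 131072-character field limit.
csv.field_size_limit(10**8)


def open_corpus(path: str) -> IO[str]:
    """Open a plain, .gz or .bz2 CSV file as UTF-8 text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "r", encoding="utf-8", newline="")


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per row, keyed by the CSV header (column names are assumed)."""
    # DictReader also handles quoted fields that span several lines.
    yield from csv.DictReader(lines)


def articles_per_year(rows: Iterable[dict], date_field: str = "date") -> Counter:
    """Count rows per year, assuming the date value starts with a 4-digit year."""
    counts: Counter = Counter()
    for row in rows:
        year = (row.get(date_field) or "")[:4]
        if year.isdigit():
            counts[year] += 1
    return counts
```

For example, `with open_corpus(path) as f: print(articles_per_year(iter_articles(f)))` streams a downloaded file once. Here `path` is whatever local file name you saved the corpus under. If the real date format does not start with the year, only `articles_per_year` needs to change.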
Stars: 145
Forks: 24
Language: Python
License: not specified
Category: not specified
Last pushed: Nov 19, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/yutkin/Lenta.Ru-News-Dataset"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
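The curl call above can also be made from Python with the standard library. The following is a minimal sketch with several assumptions. It assumes the endpoint returns JSON, although the response format is not stated here. It assumes that other repositories follow the same `.../quality/nlp/<owner>/<repo>` path pattern as the single example URL. The helper names `quality_url` and `fetch_quality` are hypothetical. The page does not say how a free key is sent, so the sketch makes only the unauthenticated request (100 requests/day).

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; reuse for other repos is an assumption.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp/"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an "owner/name" repository slug."""
    owner, name = repo.split("/", 1)
    # Quote each path segment separately so the "/" between them is kept.
    return API_BASE + urllib.parse.quote(owner) + "/" + urllib.parse.quote(name)


def fetch_quality(repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.load(resp)
```

`fetch_quality("yutkin/Lenta.Ru-News-Dataset")` requests the same URL as the curl command. It raises `urllib.error.HTTPError` on non-2xx responses, for example when the daily limit is exceeded.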
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
Darija <-> English dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...