AlexKly/Detailed-NER-Dataset-RU
Russian text labeled token by token for training NER models, built from samples obtained by parsing various resources and samples generated by ChatGPT.
This dataset provides detailed, labeled Russian text for building or improving Natural Language Processing (NLP) models. It pairs raw Russian text with token-level annotations for specific entity types, such as locations (cities, countries, streets) and personal names (first, middle, last). It is aimed at NLP engineers and machine learning practitioners building Russian-language applications.
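Token-by-token labels are commonly stored in a BIO scheme, where `B-` marks the first token of an entity, `I-` a continuation, and `O` a non-entity token. The sketch below assumes that scheme and uses hypothetical tag names (`FIRST_NAME`, `MIDDLE_NAME`, `LAST_NAME`, `CITY`, `STREET`); the repository's actual label set and file format may differ. It shows how token-level labels can be grouped back into entity spans.

```python
from typing import List, Optional, Tuple

# Hypothetical sample: tag names and the BIO scheme are assumptions,
# not necessarily the dataset's real format.
tokens = ["Иван", "Петрович", "Сидоров", "живёт", "в", "Москве",
          "на", "Тверской", "улице", "."]
tags = ["B-FIRST_NAME", "B-MIDDLE_NAME", "B-LAST_NAME", "O", "O", "B-CITY",
        "O", "B-STREET", "I-STREET", "O"]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def close() -> None:
        # Emit the currently open entity, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts (a stray I- tag is treated as a start too).
            close()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the open entity.
            current_tokens.append(token)
        else:
            # "O": close any open entity.
            close()
            current_type, current_tokens = None, []
    close()
    return spans


for entity_type, text in bio_to_spans(tokens, tags):
    print(entity_type, text)
```

With these sample tags the result is five spans: the three name parts, `Москве` as `CITY`, and `Тверской улице` as `STREET`.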
No commits in the last 6 months.
Use this if you need a high-quality, fine-grained dataset to train or improve models for extracting detailed entities from Russian text.
Not ideal if you only need general entity recognition (e.g., just 'PERSON' or 'LOCATION' without sub-types) or if you are working with languages other than Russian.
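The difference between fine-grained sub-types and general classes can be illustrated by mapping one onto the other. The sketch below assumes BIO tags and a hypothetical mapping (`FINE_TO_COARSE`) from sub-types to generic `PER`/`LOC` classes; neither the tag names nor the mapping come from the repository.

```python
from typing import Dict, List

# Hypothetical fine-grained tag names mapped to generic classes.
FINE_TO_COARSE: Dict[str, str] = {
    "FIRST_NAME": "PER",
    "MIDDLE_NAME": "PER",
    "LAST_NAME": "PER",
    "CITY": "LOC",
    "COUNTRY": "LOC",
    "STREET": "LOC",
}


def coarsen_tags(tags: List[str]) -> List[str]:
    """Replace each fine-grained entity type with its coarse class, keeping B-/I- prefixes."""
    coarse: List[str] = []
    for tag in tags:
        if tag == "O":
            coarse.append(tag)
            continue
        prefix, fine = tag.split("-", 1)
        # Types missing from the mapping are passed through unchanged.
        coarse.append(f"{prefix}-{FINE_TO_COARSE.get(fine, fine)}")
    return coarse


print(coarsen_tags(["B-FIRST_NAME", "B-LAST_NAME", "O", "B-STREET", "I-STREET"]))
```

Note that this direct mapping keeps each name part as its own `B-PER` span; merging them into a single person entity would be a separate, heuristic step.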
Stars: 10
Forks: 2
Language: Python
License: not specified
Category: not specified
Last pushed: Jun 20, 2023
Commits (30d): 0
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
Darija <-> English dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...