nerel-ds/NEREL-BIO
NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities
This project provides a corpus of biomedical research abstracts from PubMed, in Russian and English, annotated with nested named entities. It helps researchers, clinical data analysts, and others working with scientific literature identify and extract specific information such as medical procedures, diseases, chemicals, and anatomical references. The corpus is intended as training and evaluation input for tools that turn unstructured text into structured data.
Use this if you need high-quality, pre-annotated biomedical text data to train or evaluate systems that automatically identify entities within scientific articles, especially for nested entities (entities within other entities).
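To make "nested entities" concrete, here is a minimal sketch in Python. It assumes annotations are available as character-offset spans with a label. The `Span` type, the example sentence, and the `DISO` and `ANATOMY` labels are illustrative assumptions, not taken from the dataset files.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """Hypothetical annotation: character offsets [start, end) plus a label."""
    start: int
    end: int
    label: str


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies inside a longer outer span."""
    pairs = []
    for outer in spans:
        for inner in spans:
            if inner is outer:
                continue
            contains = outer.start <= inner.start and inner.end <= outer.end
            longer = (outer.end - outer.start) > (inner.end - inner.start)
            if contains and longer:
                pairs.append((outer, inner))
    return pairs


# Illustrative example: an anatomy mention nested inside a disease mention.
text = "Treatment of lung cancer"
spans = [
    Span(13, 24, "DISO"),     # "lung cancer" (label name is an assumption)
    Span(13, 17, "ANATOMY"),  # "lung" (label name is an assumption)
]

for outer, inner in nested_pairs(spans):
    print(f"{text[inner.start:inner.end]!r} ({inner.label}) is nested in "
          f"{text[outer.start:outer.end]!r} ({outer.label})")
```

A flat tagging scheme would have to pick one of the two mentions. A nested scheme keeps both, which is why models trained on this kind of data need span-based or layered decoding rather than a single tag per token.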
Not ideal if you are looking for a tool to directly perform text analysis on your own documents without needing to develop or train a model.
Stars: 30
Forks: 2
Language: Python
License: not listed
Category:
Last pushed: Feb 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nerel-ds/NEREL-BIO"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
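The curl request above can be issued from Python with the standard library alone. This is a rough sketch: it assumes the endpoint returns JSON and that the path follows a category/owner/repo layout, both inferred from the URL rather than from any documentation. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example; everything after it is an assumed layout.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper mirroring the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request; the keyless tier is limited to 100 requests/day.
    print(fetch_quality("nlp", "nerel-ds", "NEREL-BIO"))
```

With the keyless limit of 100 requests/day, it is worth caching responses locally when querying many repositories.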
Higher-rated alternatives
MantisAI/nervaluate
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
dice-group/gerbil
GERBIL - General Entity annotatoR Benchmark
bltlab/seqscore
SeqScore: Scoring for named entity recognition and other sequence labeling tasks
syuoni/eznlp
Easy Natural Language Processing
LHNCBC/metamaplite
A near real-time named-entity recognizer