EuropeanaNewspapers/ner-corpora
Named Entity Recognition data for Europeana Newspapers
This project provides pre-processed text from historical European newspapers in which key entities such as people, places, and organizations are annotated. It takes digitized (OCR) newspaper content and marks each entity mention with its type, producing data suitable for training and evaluating named entity recognition (NER) systems. Its intended users are researchers, historians, and data scientists who study historical documents.
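Annotated NER corpora like this one are usually distributed as token-level tagged text. The sketch below shows one way to turn such text into entity spans. It assumes the common BIO convention (one `token TAG` pair per line, tags such as `B-PER`, `I-PER`, `B-LOC`, `B-ORG`, `O`, and a blank line between sentences). Check the actual files before relying on it: if they lack blank-line sentence breaks, a whole file is read as one sequence. The function names `read_bio` and `extract_entities` are hypothetical helpers, not part of the repository.

```python
from typing import Iterable, List, Tuple

# (sentence index, start token, end token exclusive, entity type, surface text)
Entity = Tuple[int, int, int, str, str]


def read_bio(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Group 'token TAG' lines into sentences; a blank line ends a sentence.

    Assumption: the tag is the last whitespace-separated field on each line.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # skip lines without a tag (hypothetical handling)
        current.append((parts[0], parts[-1]))
    if current:  # flush a final sentence with no trailing blank line
        sentences.append(current)
    return sentences


def extract_entities(sentences: List[List[Tuple[str, str]]]) -> List[Entity]:
    """Collect contiguous B-/I- runs of the same type as entity spans."""
    entities: List[Entity] = []
    for s_idx, sent in enumerate(sentences):
        start, etype = -1, None
        # An "O" sentinel at the end closes an entity that runs to the last token.
        for i, (_, tag) in enumerate(sent + [("", "O")]):
            if tag.startswith("I-") and etype == tag[2:]:
                continue  # the current entity continues
            if etype is not None:
                text = " ".join(tok for tok, _ in sent[start:i])
                entities.append((s_idx, start, i, etype, text))
                start, etype = -1, None
            if tag.startswith(("B-", "I-")):
                # A stray I- tag without a matching B- opens a new entity.
                start, etype = i, tag[2:]
    return entities


if __name__ == "__main__":
    sample = ["Jan B-PER", "Smit I-PER", "woont O", "in O", "Amsterdam B-LOC",
              "", "De B-ORG", "Telegraaf I-ORG"]
    for entity in extract_entities(read_bio(sample)):
        print(entity)
```

To read a downloaded corpus file, pass an open file handle (e.g. `open(path, encoding="utf-8")`) to `read_bio` instead of the in-memory list.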
173 stars. No commits in the last 6 months.
Use this if you need pre-annotated historical newspaper text to train or evaluate systems for automatically extracting information about people, locations, and organizations.
Not ideal if you require perfect 'gold standard' quality data for evaluation, as it contains OCR errors and has undergone processing that might affect mapping to original articles.
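When the corpus is used to evaluate a system, scores are commonly computed per entity rather than per token. The sketch below computes strict entity-level precision, recall, and F1, where a prediction counts only if both its boundaries and its type exactly match a gold entity. The function name `strict_prf` and the `(sentence, start, end, type)` key layout are illustrative assumptions; tools such as nervaluate or SeqScore (listed among the alternatives) implement fuller and more lenient matching schemes.

```python
from typing import Hashable, Iterable, Tuple


def strict_prf(gold: Iterable[Hashable],
               predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact matching.

    Each entity is any hashable key, e.g. (sentence_idx, start, end, type).
    Duplicate keys are collapsed because both inputs are converted to sets.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 0, 2, "PER"), (0, 4, 5, "LOC"), (1, 0, 2, "ORG")}
    # One exact match, one wrong type, one truncated span.
    pred = {(0, 0, 2, "PER"), (0, 4, 5, "ORG"), (1, 0, 1, "ORG")}
    print(strict_prf(gold, pred))
```

Under strict matching, the truncated `(1, 0, 1, "ORG")` prediction earns no credit, whereas a token-level score would still reward its correctly tagged first token.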
Stars: 173
Forks: 31
Language: not listed
License: not listed
Category: not listed
Last pushed: Apr 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/EuropeanaNewspapers/ner-corpora"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
MantisAI/nervaluate
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
dice-group/gerbil
GERBIL - General Entity annotatoR Benchmark
bltlab/seqscore
SeqScore: Scoring for named entity recognition and other sequence labeling tasks
syuoni/eznlp
Easy Natural Language Processing
LHNCBC/metamaplite
A near real-time named-entity recognizer