JonathanRaiman/wikipedia_ner

Labeled examples from wiki dumps in Python

36 / 100 (Emerging)

This tool helps data scientists and NLP researchers generate labeled examples for named entity recognition (NER) tasks. It takes Wikipedia dumps as input and extracts entities like people, organizations, and locations, providing a rich dataset for training machine learning models. The output is a collection of articles with identified and categorized named entities.

No commits in the last 6 months. Available on PyPI.

Use this if you need to create a large, diverse dataset of text with named entities labeled for training or evaluating your NER models.

Not ideal if you're looking for a pre-trained NER model or if your specific domain entities are not well-represented in Wikipedia.

natural-language-processing data-labeling machine-learning-datasets information-extraction text-analytics
No License, Stale 6m, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 17 / 25
Community 11 / 25


Stars: 67
Forks: 7
Language: Jupyter Notebook
License: None
Last pushed: Aug 08, 2016
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JonathanRaiman/wikipedia_ner"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
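
The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body, which the page does not confirm; no response fields are assumed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl command."""
    # Percent-encode each path segment so unusual names stay valid.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. The schema is not documented
    here, so the parsed object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "JonathanRaiman", "wikipedia_ner")` would issue the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a reasonable precaution when polling many repositories.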