JonathanRaiman/wikipedia_ner

Labeled examples from wiki dumps in Python

Emerging

This tool helps data scientists and NLP researchers generate labeled examples for named entity recognition (NER) tasks. It takes Wikipedia dumps as input and extracts entities such as people, organizations, and locations, producing a large dataset for training machine learning models. The output is a collection of articles in which named entities are identified and categorized.
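One common way to turn Wikipedia markup into NER training data is to treat internal links as entity mentions: the visible link text becomes the labeled span, and the linked article title identifies the entity. The following is a minimal sketch of only that span-extraction step on raw wikitext, assuming simple `[[Target]]` and `[[Target|anchor]]` links. It is not this project's API; the function name `extract_link_mentions` is hypothetical, and the sketch does not assign entity types such as person, organization, or location.

```python
import re
from typing import List, Tuple

# Matches [[Target]] or [[Target|anchor text]]. Assumption: no nested links;
# templates and File:/Category: namespaces are not handled specially.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_link_mentions(wikitext: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Replace each internal link with its visible text and record the span.

    Returns the plain text and a list of (start, end, target) tuples, where
    start/end are character offsets into the returned plain text.
    """
    plain_parts: List[str] = []
    mentions: List[Tuple[int, int, str]] = []
    cursor = 0  # position in the original wikitext
    length = 0  # length of the plain text built so far
    for m in LINK_RE.finditer(wikitext):
        # Copy the unlinked text preceding this link unchanged.
        before = wikitext[cursor:m.start()]
        plain_parts.append(before)
        length += len(before)
        target = m.group(1).strip()
        # Piped links show the anchor text; plain links show the target itself.
        anchor = m.group(2) if m.group(2) is not None else m.group(1)
        plain_parts.append(anchor)
        mentions.append((length, length + len(anchor), target))
        length += len(anchor)
        cursor = m.end()
    plain_parts.append(wikitext[cursor:])
    return "".join(plain_parts), mentions
```

For example, `extract_link_mentions("[[Barack Obama|Obama]] was born in [[Honolulu]].")` returns the plain text `"Obama was born in Honolulu."` with mentions `[(0, 5, "Barack Obama"), (18, 26, "Honolulu")]`. Real dumps also need template, table, and namespace handling, which this sketch omits.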

No commits in the last 6 months. Available on PyPI.

Use this if you need to create a large, diverse dataset of text with named entities labeled for training or evaluating your NER models.

Not ideal if you're looking for a pre-trained NER model, or if the entities in your domain are not well represented in Wikipedia.

natural-language-processing data-labeling machine-learning-datasets information-extraction text-analytics

No License · Stale 6m · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 17 / 25

Community 11 / 25

Language

Jupyter Notebook

License

—

Higher-rated alternatives

hellohaptik/chatbot_ner

chatbot_ner: Named Entity Recognition for chatbots.

openeventdata/mordecai

Full text geoparsing as a Python library

Rostlab/nalaf

NLP framework in Python for entity recognition and relationship extraction

mpuig/spacy-lookup

Named Entity Recognition based on dictionaries

NorskRegnesentral/skweak

skweak: A software toolkit for weak supervision applied to NLP tasks