izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator
You can create datasets from Wikia/Wikipedia for named entity recognition and entity linking. Dumps for Japanese Wikipedia (ja-wiki) and the VTuber wiki are available!
This tool helps researchers and natural language processing practitioners build specialized datasets for training models. It takes raw page content from Wikipedia or Fandom (Wikia) and transforms it into structured data, annotating mentions of entities such as people, organizations, or concepts and linking each mention to its corresponding entry. It is well suited to anyone building systems that need to recognize and disambiguate specific entities in text.
No commits in the last 6 months.
Use this if you need custom, domain-specific datasets from Wikipedia or Fandom wikis to train models for tasks such as named entity recognition or linking entities to knowledge bases.
Not ideal if you're looking for pre-trained models, or for a tool that performs entity recognition and linking directly rather than creating training data for them.
Stars
17
Forks
2
Language
Python
License
—
Category
Last pushed
May 02, 2021
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
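If you prefer to call the endpoint from Python rather than curl, a minimal sketch using only the standard library is below. The base URL and path come from the curl example above; the helper names and the assumption that the endpoint returns JSON are mine, not part of the official API documentation.

```python
import json
import urllib.request

# Base endpoint as shown in the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo (hypothetical helper)."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the record for a repo; assumes a JSON response (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For this repository, `quality_url("izuna385", "Wikia-and-Wikipedia-EL-Dataset-Creator")` reproduces the URL used in the curl example.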
Higher-rated alternatives
hellohaptik/chatbot_ner
chatbot_ner: Named Entity Recognition for chatbots.
openeventdata/mordecai
Full text geoparsing as a Python library
Rostlab/nalaf
NLP framework in python for entity recognition and relationship extraction
mpuig/spacy-lookup
Named Entity Recognition based on dictionaries
NorskRegnesentral/skweak
skweak: A software toolkit for weak supervision applied to NLP tasks