izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator

You can create datasets from Wikia/Wikipedia that can be used for entity recognition and entity linking. Dumps for ja-wiki and the VTuber wiki are available!

Score: 31 / 100 (Emerging)

This tool helps researchers and natural language processing practitioners create specialized datasets for training AI models. It takes raw content from Wikipedia or Fandom (Wikia) pages and transforms it into structured data, marking mentions of entities such as people, organizations, or concepts and linking them to their definitions. This is ideal for those building AI systems that need to understand and extract specific information from text.

No commits in the last 6 months.

Use this if you need custom, domain-specific datasets from Wikipedia or Fandom wikis to train models for tasks like named entity recognition or linking entities to knowledge bases.

Not ideal if you're looking for pre-trained models or a tool that performs entity recognition and linking directly, rather than one that creates training data for them.
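To make the output concrete, here is a hypothetical sketch of the kind of annotated record an entity-linking dataset typically contains. The field names and the example sentence are illustrative assumptions, not this repository's actual output schema:

```python
# Hypothetical entity-linking training record; field names are
# illustrative assumptions, not this tool's documented schema.
record = {
    "text": "Kizuna AI is a virtual YouTuber.",
    "mentions": [
        {
            "start": 0,             # character offset where the mention begins
            "end": 9,               # character offset where it ends (exclusive)
            "surface": "Kizuna AI", # the mention as it appears in the text
            "entity": "Kizuna_AI",  # title of the linked wiki page
        }
    ],
}

# The span offsets should recover the surface form from the text.
mention = record["mentions"][0]
span = record["text"][mention["start"]:mention["end"]]
print(span)  # Kizuna AI
```

Records of this general shape (text plus character-offset mention spans linked to page titles) are the standard input for training both NER and entity-linking models.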

natural-language-processing ai-training-data information-extraction knowledge-graph-building text-annotation
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 9 / 25


Stars: 17
Forks: 2
Language: Python
License:
Last pushed: May 02, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
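If you prefer Python over curl, the same endpoint can be called with the standard library. This is a minimal sketch assuming the URL follows the owner/repo pattern shown in the curl command above and that the endpoint returns JSON (the response shape is not documented here):

```python
import json
import urllib.request

def quality_api_url(owner: str, repo: str) -> str:
    """Build the quality-data API URL for a repository
    (pattern taken from the curl example; assumed to generalize)."""
    return f"https://pt-edge.onrender.com/api/v1/quality/nlp/{owner}/{repo}"

url = quality_api_url("izuna385", "Wikia-and-Wikipedia-EL-Dataset-Creator")
print(url)

# Uncomment to fetch live data (assumes a JSON response):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```

The live request is left commented out so the snippet runs offline; remove the comments to query the API within the daily rate limit.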