izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator
You can create datasets from Wikia/Wikipedia for named entity recognition and entity linking. Dumps for Japanese Wikipedia (ja-wiki) and the VTuber wiki are available!
This tool helps researchers and natural language processing practitioners build specialized datasets for training models. It takes raw page content from Wikipedia or Fandom (Wikia) and transforms it into structured data, annotating mentions of entities such as people, organizations, or concepts and linking each mention to its corresponding entry. It is well suited to anyone building systems that need to recognize and disambiguate specific entities in text.
No commits in the last 6 months.
Use this if you need custom, domain-specific datasets from Wikipedia or Fandom wikis to train models for tasks such as named entity recognition or linking entities to knowledge bases.
Not ideal if you're looking for pre-trained models, or for a tool that performs entity recognition and linking directly rather than creating training data for them.
Stars
17
Forks
2
Language
Python
License
—
Category
Last pushed
May 02, 2021
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
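If you prefer to call the endpoint from Python rather than curl, a minimal sketch using only the standard library is below. The base URL and path come from the curl example above; the helper names and the assumption that the endpoint returns JSON are mine, not part of the official API documentation.

```python
import json
import urllib.request

# Base endpoint as shown in the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo (hypothetical helper)."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the record for a repo; assumes a JSON response (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For this repository, `quality_url("izuna385", "Wikia-and-Wikipedia-EL-Dataset-Creator")` reproduces the URL used in the curl example.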
Higher-rated alternatives
hellohaptik/chatbot_ner
chatbot_ner: Named Entity Recognition for chatbots.
openeventdata/mordecai
Full text geoparsing as a Python library
Rostlab/nalaf
NLP framework in python for entity recognition and relationship extraction
mpuig/spacy-lookup
Named Entity Recognition based on dictionaries
NorskRegnesentral/skweak
skweak: A software toolkit for weak supervision applied to NLP tasks