possible-worlds-research/wikinlp
A package to download and preprocess a Wikipedia dump, in any language.
Need to analyze the content of Wikipedia for your research? This tool automatically downloads and converts Wikipedia articles, in any language, into a plain text format. It handles finding the latest dump, cleaning it up, and preparing it for text analysis, which makes it well suited to academics and data scientists studying language, culture, or societal trends.
No commits in the last 6 months.
Use this if you need a pre-processed, language-specific Wikipedia corpus for your research, without dealing with the complexities of data acquisition and cleaning.
Not ideal if you only need small snippets of Wikipedia data that can be manually copied or if you prefer to build your own custom data pipeline from scratch.
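For a sense of what the package automates, here is a minimal sketch of the manual workflow: fetching the latest pages-articles dump from dumps.wikimedia.org and streaming article text out of the compressed XML. It uses only the Python standard library and is not wikinlp's own API; the language code and file paths are placeholders, and it leaves the wikitext markup uncleaned, a step wikinlp handles for you.

# Minimal sketch of the manual workflow wikinlp automates: download the latest
# Wikipedia dump for a language and stream article text out of the XML.
# Standard library only; "pt" and the file paths are placeholders.
# Note: the extracted text is still raw wikitext markup.
import bz2
import urllib.request
import xml.etree.ElementTree as ET

LANG = "pt"  # any Wikipedia language code
DUMP_URL = (
    f"https://dumps.wikimedia.org/{LANG}wiki/latest/"
    f"{LANG}wiki-latest-pages-articles.xml.bz2"
)

def extract_articles(dump_path, out_path):
    """Stream <text> elements from the compressed dump into a plain text file."""
    with bz2.open(dump_path, "rb") as dump, open(out_path, "w", encoding="utf-8") as out:
        for _, elem in ET.iterparse(dump):
            # The dump XML is namespaced; match on the local tag name only.
            if elem.tag.endswith("}text") and elem.text:
                out.write(elem.text + "\n\n")
            elem.clear()  # free memory as we go

if __name__ == "__main__":
    urllib.request.urlretrieve(DUMP_URL, f"{LANG}wiki-latest.xml.bz2")
    extract_articles(f"{LANG}wiki-latest.xml.bz2", f"{LANG}wiki.txt")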
Stars: 9
Forks: —
Language: Python
License: AGPL-3.0
Category: nlp
Last pushed: Sep 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/possible-worlds-research/wikinlp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
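The same endpoint can be called from Python; here is a minimal sketch that assumes the response is JSON (the exact schema isn't documented here).

# Minimal sketch of calling the quality endpoint from Python.
# Assumes a JSON response body; the schema is not documented on this page.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/possible-worlds-research/wikinlp"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))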
Higher-rated alternatives
philenius/ngx-annotate-text
This Angular component library is perfect for tasks like visualizing named entity recognition,...
davidjurgens/potato
potato: the portable annotation tool
jiesutd/YEDDA
YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Code for ACL 2018 Best Demo Paper...
synyi/poplar
A web-based annotation tool for natural language processing (NLP)
webanno/webanno
🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The...