possible-worlds-research/wikinlp

A package to download and preprocess a Wikipedia dump, in any language.

Score: 23 / 100 (Experimental)

Need to analyze Wikipedia content for your research? This tool automatically downloads Wikipedia articles in any language and converts them into plain text. It finds the latest dump, cleans it up, and prepares it for text analysis, making it useful for academics and data scientists studying language, culture, or societal trends.

No commits in the last 6 months.

Use this if you need a preprocessed, language-specific Wikipedia corpus for your research without dealing with the complexities of data acquisition and cleaning.
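The data-acquisition step the tool automates starts from Wikimedia's public dump mirrors. As a rough sketch (the URL pattern below is the standard dumps.wikimedia.org layout, not anything specific to wikinlp's own code), the latest articles dump for a given language edition can be located like this:

```python
# Sketch: build the URL of the latest "pages-articles" dump for a language
# edition. This is the standard dumps.wikimedia.org naming scheme; how
# wikinlp itself resolves dumps is not shown here.

def latest_dump_url(lang: str) -> str:
    """Return the URL of the latest pages-articles dump for `lang` (e.g. 'en', 'pt')."""
    wiki = f"{lang}wiki"
    return f"https://dumps.wikimedia.org/{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"

print(latest_dump_url("en"))
# → https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
```

The `latest/` alias always points at the most recent complete dump, which is why a tool like this can "find the latest data" without scraping the dated dump directories.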

Not ideal if you only need small snippets of Wikipedia data that can be copied manually, or if you prefer to build your own data pipeline from scratch.

Tags: linguistics research, text analytics, cultural studies, data science, natural language processing
Flags: Stale (6m), No Package, No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 9
Forks:
Language: Python
License: AGPL-3.0
Last pushed: Sep 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/possible-worlds-research/wikinlp"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
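The curl call above can be reproduced from Python with only the standard library. A minimal sketch, assuming nothing about the response beyond what the endpoint URL shows (the response schema is undocumented here, so the payload is returned as-is):

```python
import json
import urllib.request

# Base endpoint taken from the curl example above; the path parameters
# are the GitHub owner and repository names.
BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-record URL for a given owner/repo."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record (schema is an assumption)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("possible-worlds-research", "wikinlp"))
# → https://pt-edge.onrender.com/api/v1/quality/nlp/possible-worlds-research/wikinlp
```

With no API key the anonymous 100 requests/day limit applies, so batch lookups should be throttled accordingly.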