possible-worlds-research/wikinlp
A package to download and preprocess a Wikipedia dump, in any language.
Need to analyze the content of Wikipedia for your research? This tool automatically downloads and converts Wikipedia articles, in any language, into a plain text format. It handles finding the latest dump, cleaning it up, and preparing it for text analysis, which makes it well suited to academics and data scientists studying language, culture, or societal trends.
No commits in the last 6 months.
Use this if you need a pre-processed, language-specific Wikipedia corpus for your research, without dealing with the complexities of data acquisition and cleaning.
Not ideal if you only need small snippets of Wikipedia data that can be manually copied or if you prefer to build your own custom data pipeline from scratch.
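For a sense of what the package automates, here is a minimal sketch of the manual workflow: fetching the latest pages-articles dump from dumps.wikimedia.org and streaming article text out of the compressed XML. It uses only the Python standard library and is not wikinlp's own API; the language code and file paths are placeholders, and it leaves the wikitext markup uncleaned, a step wikinlp handles for you.

# Minimal sketch of the manual workflow wikinlp automates: download the latest
# Wikipedia dump for a language and stream article text out of the XML.
# Standard library only; "pt" and the file paths are placeholders.
# Note: the extracted text is still raw wikitext markup.
import bz2
import urllib.request
import xml.etree.ElementTree as ET

LANG = "pt"  # any Wikipedia language code
DUMP_URL = (
    f"https://dumps.wikimedia.org/{LANG}wiki/latest/"
    f"{LANG}wiki-latest-pages-articles.xml.bz2"
)

def extract_articles(dump_path, out_path):
    """Stream <text> elements from the compressed dump into a plain text file."""
    with bz2.open(dump_path, "rb") as dump, open(out_path, "w", encoding="utf-8") as out:
        for _, elem in ET.iterparse(dump):
            # The dump XML is namespaced; match on the local tag name only.
            if elem.tag.endswith("}text") and elem.text:
                out.write(elem.text + "\n\n")
            elem.clear()  # free memory as we go

if __name__ == "__main__":
    urllib.request.urlretrieve(DUMP_URL, f"{LANG}wiki-latest.xml.bz2")
    extract_articles(f"{LANG}wiki-latest.xml.bz2", f"{LANG}wiki.txt")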
Stars: 9
Forks: —
Language: Python
License: AGPL-3.0
Category: nlp
Last pushed: Sep 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/possible-worlds-research/wikinlp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
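The same endpoint can be called from Python; here is a minimal sketch that assumes the response is JSON (the exact schema isn't documented here).

# Minimal sketch of calling the quality endpoint from Python.
# Assumes a JSON response body; the schema is not documented on this page.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/possible-worlds-research/wikinlp"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))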
Higher-rated alternatives
philenius/ngx-annotate-text
This Angular component library is perfect for tasks like visualizing named entity recognition,...
davidjurgens/potato
potato: the portable annotation tool
jiesutd/YEDDA
YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Code for ACL 2018 Best Demo Paper...
synyi/poplar
A web-based annotation tool for natural language processing (NLP)
webanno/webanno
🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The...