ilinguistics/earthLings
Corpus-based language and dialect mapping
This tool helps researchers and linguists visualize how languages and dialects are used across regions and over time. It takes large text datasets from social media and web crawls, identifies the language and location of each text, and presents the resulting linguistic distribution on interactive maps. It is aimed at anyone studying language variation, sociolinguistics, or global communication patterns.
Use this if you need to explore and visualize the geographical spread and historical changes of languages and dialects based on large-scale textual data.
Not ideal if you're looking for a tool to perform detailed linguistic analysis on small, specific texts rather than broad geographical mapping.
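To make the idea of a linguistic distribution concrete, here is a minimal sketch of the aggregation step that sits between language identification and mapping: counting how often each language appears per region and turning the counts into shares. It is not taken from the earthLings code base; the function name `language_shares`, the region and language codes, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict


def language_shares(records):
    """Turn (region, language) pairs into per-region language shares.

    `records` is any iterable of (region, language) tuples, for example
    the output of a language-identification step (hypothetical here).
    Returns {region: {language: share}}, where shares in a region sum to 1.
    """
    counts = defaultdict(Counter)
    for region, language in records:
        counts[region][language] += 1

    shares = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        shares[region] = {lang: n / total for lang, n in counter.items()}
    return shares


# Toy, made-up records purely for illustration (not real corpus data).
sample = [
    ("NZ", "eng"), ("NZ", "eng"), ("NZ", "mri"),
    ("CH", "deu"), ("CH", "fra"), ("CH", "deu"), ("CH", "ita"),
]
shares = language_shares(sample)
```

A table of per-region shares like this is the kind of input a choropleth or point map would then consume.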
Stars: 7
Forks: not listed
Language: not listed
License: GPL-2.0
Category: not listed
Last pushed: Feb 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ilinguistics/earthLings"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
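The same request can be made from a script. The sketch below uses only the Python standard library and rests on two assumptions not confirmed by this page: that the endpoint returns JSON, and that other repositories follow the same `/nlp/<owner>/<repo>` path pattern as the single URL shown above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; the trailing <owner>/<repo>
# pattern is an assumption generalized from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record for a repository and parse it.

    Assumes the response body is JSON; the schema is not documented
    here, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as response:
        return json.load(response)


# Example call (performs a network request and counts toward the daily limit):
# data = fetch_quality("ilinguistics", "earthLings")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the target address without spending one of the 100 daily requests.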
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
Library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
Darija <-> English dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...