CUNY-CL/wikipron

Massively multilingual pronunciation mining

68 / 100 (Established)

WikiPron helps linguists and researchers easily collect pronunciation data from Wiktionary. It takes a language's ISO code and desired dialects, then outputs a list of words with their corresponding phonetic transcriptions in the International Phonetic Alphabet (IPA). This tool is for anyone building language resources or conducting linguistic analysis.
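
As a rough illustration of the output described above, the sketch below reads word/pronunciation pairs from tab-separated text, one entry per line with the word first and its IPA transcription second. That two-column layout is an assumption about WikiPron's output, and the sample rows and the helper name `read_pairs` are hypothetical; none of them come from WikiPron itself.

```python
import csv
import io
from typing import Iterator, Tuple

# Hypothetical sample in the assumed two-column layout: word<TAB>IPA.
SAMPLE_TSV = "chat\tʃ a\neau\to\n"


def read_pairs(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (word, pronunciation) pairs from tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or malformed rows rather than guessing at their meaning.
        if len(row) < 2:
            continue
        yield row[0], row[1]


pairs = list(read_pairs(SAMPLE_TSV))
```

Keeping the parser separate from the scraping step means the same function can be reused on any saved file in this layout.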

363 stars. Available on PyPI.

Use this if you need to gather extensive word-pronunciation pairs for many languages to support research, build language learning tools, or develop speech technologies.

Not ideal if you need human-verified pronunciation data or are working with languages not extensively covered in Wiktionary.

linguistics phonetics language-resource-creation speech-technology computational-linguistics
Maintenance 10 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 23 / 25


Stars: 363
Forks: 77
Language: Python
License: Apache-2.0
Last pushed: Mar 03, 2026
Commits (30d): 0
Dependencies: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CUNY-CL/wikipron"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
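
The same request can be made from Python. The sketch below uses only the standard library. The URL layout is copied from the curl command above; the assumption that the endpoint returns JSON, and the helper names `quality_url` and `fetch_quality`, are illustrative and not confirmed by the source.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is
# category/owner/repo, as in ".../nlp/CUNY-CL/wikipron".
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response. Assumes the body is JSON (unverified)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("nlp", "CUNY-CL", "wikipron")
```

Since the response fields are not documented here, it is safer to inspect the decoded object before relying on any particular key.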