cisnlp/Glot500

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023

Score: 32 / 100 (Emerging)

Glot500 provides a multilingual language model that covers more than 500 languages, significantly more than previous models. Given text in any of these languages, it can fill in masked words, produce representations of sentence meaning, or extract features for downstream tasks. It is aimed at researchers, computational linguists, and developers whose natural language processing work needs broad multilingual support.

106 stars. No commits in the last 6 months.

Use this if you need a pre-trained language model or a large multilingual text corpus for masked language modeling, sentence retrieval, text classification, or named entity recognition across hundreds of languages, especially less common 'tail' languages.

Not ideal if your project is strictly limited to a few widely spoken languages already well-covered by existing, more specialized models.
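
As a rough illustration of the masked language modeling use case mentioned above, the sketch below runs a fill-mask query through the Hugging Face `transformers` pipeline. It assumes the model is published on the Hugging Face Hub as `cis-lmu/glot500-base`; that ID does not appear on this page, so verify it before relying on it. The `{mask}` placeholder and the helper names `top_fillers` and `fill_mask` are hypothetical, introduced only for this example.

```python
from typing import Any

# Assumption: Hub ID of the Glot500 checkpoint (not stated on this page).
MODEL_ID = "cis-lmu/glot500-base"


def top_fillers(predictions: list[dict[str, Any]], k: int = 3) -> list[tuple[str, float]]:
    """Reduce fill-mask pipeline output to the k best (token, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(float(p["score"]), 4)) for p in ranked[:k]]


def fill_mask(template: str, model_id: str = MODEL_ID, k: int = 3) -> list[tuple[str, float]]:
    """Fill the single "{mask}" placeholder in `template` and return the top-k guesses.

    Requires `pip install transformers torch` and downloads the model on first use.
    """
    from transformers import pipeline  # imported lazily so the helpers above stay light

    unmasker = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token instead of hard-coding it.
    text = template.replace("{mask}", unmasker.tokenizer.mask_token)
    return top_fillers(unmasker(text), k)


# Example call (needs network access and several GB of disk and memory):
# print(fill_mask("Hello, I am a {mask} model."))
```

The ranking helper is kept separate from the model call so it can be checked without downloading anything. The same pipeline object accepts input in any language the tokenizer covers.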

multilingual-NLP computational-linguistics global-communication-technologies low-resource-languages text-analysis
Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 7 / 25

Stars: 106
Forks: 4
Language: Python
License:
Last pushed: Apr 20, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/cisnlp/Glot500"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
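
The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path segments are category, owner, and repository (inferred from the `nlp/cisnlp/Glot500` URL above), and that the endpoint returns JSON. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the report, assuming the response body is JSON."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example call (needs network access; counts against the 100 requests/day limit):
# report = fetch_quality("nlp", "cisnlp", "Glot500")
# print(json.dumps(report, indent=2))
```

Because the response schema is not documented here, the sketch returns the decoded payload as-is instead of reading specific fields.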