cisnlp/Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
This project provides a multilingual language model, together with the corpus used to train it, that covers more than 500 languages, far more than earlier multilingual models. Given text in any of these languages, the model can fill in masked words, produce sentence representations for tasks such as retrieval, or extract features for downstream tasks. It is aimed at researchers, computational linguists, and developers building natural language processing applications that need broad multilingual coverage.
106 stars. No commits in the last 6 months.
Use this if you need a pre-trained language model for tasks such as masked language modeling, sentence retrieval, text classification, or named entity recognition across hundreds of languages, especially less-resourced 'tail' languages, or if you need a large multilingual text corpus to train your own models.
Not ideal if your project is strictly limited to a few widely spoken languages already well-covered by existing, more specialized models.
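The listing itself includes no usage code. The following is a minimal sketch of cross-lingual sentence retrieval, assuming the model is published on the Hugging Face Hub as `cis-lmu/glot500-base` (an assumption; verify the id against the repository README) and that sentences are embedded by mean-pooling the encoder's final hidden states and matched by cosine similarity. The pooling choice and the helper names `mean_pool`, `cosine`, `retrieve`, and `embed` are illustrative and are not taken from the authors' evaluation code.

```python
"""Sketch: cross-lingual sentence retrieval with mean-pooled encoder states.

Assumptions (not from the Glot500 repository): the checkpoint id
"cis-lmu/glot500-base", mean pooling over the last hidden layer, and
cosine similarity for nearest-neighbour search.
"""
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [x / count for x in total]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(queries: Sequence[Vector], candidates: Sequence[Vector]) -> List[int]:
    """For each query, return the index of the most similar candidate."""
    return [max(range(len(candidates)), key=lambda j: cosine(q, candidates[j]))
            for q in queries]


def embed(sentences: List[str], model_id: str = "cis-lmu/glot500-base") -> List[Vector]:
    """Encode sentences with a Hugging Face checkpoint (hypothetical id; verify it).

    Imports are local so the pure-Python helpers above work without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, seq_len, dim)
    return [mean_pool(h.tolist(), m.tolist())
            for h, m in zip(hidden, batch["attention_mask"])]


if __name__ == "__main__":
    # Toy 2-d vectors stand in for real embeddings so this runs offline.
    english = [[1.0, 0.0], [0.0, 1.0]]
    german = [[0.1, 0.9], [0.9, 0.2]]
    print(retrieve(english, german))  # -> [1, 0]
```

With real data, one would call `embed` on a list of sentences and on their translations, then pass both lists to `retrieve`; a correct match returns the translation's index. Mean pooling is used here only because it needs no extra training; the authors' own retrieval setup may pool a different layer or use a different similarity measure.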
Stars: 106
Forks: 4
Language: Python
License: not specified
Last pushed: Apr 20, 2024
Commits (30d): 0
Higher-rated alternatives
princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
n-waves/multifit
The code to reproduce results from paper "MultiFiT: Efficient Multi-lingual Language Model...
yxuansu/SimCTG
[NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation
alibaba-edu/simple-effective-text-matching
Source code of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
Shark-NLP/OpenICL
OpenICL is an open-source framework to facilitate research, development, and prototyping of...