AI4Bharat/Indic-BERT-v1
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For the latest Indic-BERT v2, see: https://github.com/AI4Bharat/IndicBERT
This project provides a pretrained multilingual language model for understanding text in 11 Indian languages and Indian-English, intended to work even with limited computational resources. It takes raw text in these languages and can be used to classify news categories, recognize named entities, or help predict headlines. Language specialists, content analysts, and developers building language-focused applications for Indian audiences may find it useful.
291 stars. No commits in the last 6 months.
Use this if you need to perform advanced text analysis tasks like classification or entity recognition on content primarily in Indian languages.
Not ideal if your primary focus is on languages outside the 11 Indian languages and Indian-English covered here.
Stars: 291
Forks: 39
Language: Python
License: MIT
Last pushed: May 11, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/AI4Bharat/Indic-BERT-v1"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
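For scripted access, the curl command above could be mirrored in Python roughly as follows. This is a minimal sketch under stated assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the single URL shown above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL following the pattern of the curl example.

    The (category, owner, repo) layout is an assumption based on that one URL.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumes the endpoint returns a JSON body; adjust the parsing if it does not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "AI4Bharat", "Indic-BERT-v1")` would issue the same request as the curl command. Unauthenticated calls count against the daily request limit, so caching the result locally is advisable.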
Higher-rated alternatives
acl-org/acl-anthology
Data and software for building the ACL Anthology.
anoopkunchukuttan/indic_nlp_library
Resources and tools for Indian language Natural Language Processing
CLUEbenchmark/CLUECorpus2020
Large-scale pre-training corpus for Chinese (100 GB)
KennethEnevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
Separius/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models