AI4Bharat/Indic-BERT-v1

Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.com/AI4Bharat/IndicBERT

Emerging

This project offers a pretrained language model for understanding text in 11 Indian languages and Indian-English while requiring relatively modest computational resources. It takes raw text in these languages and can be used to classify news articles by category, recognize named entities, or predict headlines. It is aimed at language specialists, content analysts, and developers building language-focused applications for Indian audiences.

291 stars. No commits in the last 6 months.

Use this if you need to perform advanced text analysis tasks like classification or entity recognition on content primarily in Indian languages.

Not ideal if your primary focus is on languages outside the 11 Indian languages and Indian-English covered here.
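
As a rough starting point for the classification and entity-recognition use cases above, the following sketch turns a sentence into a fixed-size vector that a downstream classifier could consume. It assumes the model is published on the Hugging Face Hub under the id `ai4bharat/indic-bert` and that `torch`, `transformers`, and `sentencepiece` are installed. The helper names `mean_pool` and `embed` are illustrative and not part of the project, and the `keep_accents=True` tokenizer option is an assumption about the recommended setting for Indic scripts.

```python
from typing import List, Sequence

# Assumed Hugging Face Hub id for Indic-BERT v1; adjust if the model lives elsewhere.
MODEL_ID = "ai4bharat/indic-bert"


def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the vectors of non-padding tokens (mask value 1)."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed(text: str) -> List[float]:
    """Encode one sentence into a single vector (hypothetical helper).

    Downloads the model on first use, so it needs network access.
    """
    # Imported lazily so mean_pool stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # keep_accents=True is assumed here so that vowel signs are not stripped.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, keep_accents=True)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (tokens, hidden_size)
    return mean_pool(hidden.tolist(), inputs["attention_mask"][0].tolist())


# Example call (requires network access and the packages listed above):
# vector = embed("यह एक उदाहरण वाक्य है।")
```

Mean pooling is only one simple way to get a sentence vector. For news classification or named entity recognition, fine-tuning the model with a task-specific head is the more common route.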

Indian-languages natural-language-processing content-analysis text-classification named-entity-recognition

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars: 291

Language: Python

License: MIT

Higher-rated alternatives

acl-org/acl-anthology

Data and software for building the ACL Anthology.

anoopkunchukuttan/indic_nlp_library

Resources and tools for Indian language Natural Language Processing

CLUEbenchmark/CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB)

KennethEnevoldsen/scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

Separius/awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models
