csebuetnlp/banglabert
This repository contains the official release of the BanglaBERT model, together with the downstream finetuning code and datasets introduced in the paper "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted to the Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022).
This project offers pre-trained language models specifically for the Bengali (Bangla) language, enabling various text analysis tasks. It takes raw Bengali text as input and helps classify documents, detect sentiment, identify named entities like people or places, and answer questions. It's ideal for computational linguists, researchers, or NLP practitioners working with Bengali language data.
248 stars. No commits in the last 6 months.
Use this if you need to develop applications that understand and process the Bengali language, such as sentiment analysis, information extraction, or question-answering systems.
Not ideal if your project involves languages other than Bengali or requires a simple, off-the-shelf solution without any fine-tuning or development work.
Stars: 248
Forks: 35
Language: Python
License: —
Category:
Last pushed: Jan 24, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/csebuetnlp/banglabert"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
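The curl call above can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, as is the reading of the URL path as category, owner, and repository. It also assumes the endpoint returns JSON, which the page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL from its path segments (hypothetical helper)."""
    # Percent-encode each segment so unusual characters cannot alter the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. The response schema is not
    documented on this page, so the parsed value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "csebuetnlp", "banglabert")` issues the same request as the curl command. Because the keyless tier is limited to 100 requests per day, caching results locally is a reasonable precaution.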
Higher-rated alternatives
acl-org/acl-anthology
Data and software for building the ACL Anthology.
anoopkunchukuttan/indic_nlp_library
Resources and tools for Indian language Natural Language Processing
CLUEbenchmark/CLUECorpus2020
Large-scale Pre-training Corpus for Chinese (100G)
KennethEnevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
Separius/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models