aditeyabaral/calbert

CalBERT - Code-mixed Adaptive Language representations using BERT, published at AAAI-MAKE 2022


Emerging

This project helps developers and researchers who work with code-mixed languages, such as Hinglish, build more accurate natural language processing (NLP) models. It takes sentences from two related languages (e.g., English and Hindi) and outputs dense vector representations (embeddings) of words, sentences, or paragraphs, which can then be used for tasks such as sentiment analysis or semantic search. It is aimed at machine learning engineers, data scientists, and computational linguists.

No commits in the last 6 months. Available on PyPI.

Use this if you need to build or improve NLP models that understand and process text containing a blend of two languages.

Not ideal if your NLP tasks exclusively involve a single language or if you are not comfortable working with machine learning model training and development.

Tags: code-mixing, multilingual-nlp, natural-language-processing, sentiment-analysis, semantic-search

Stale for 6 months. No dependents.

Maintenance 0 / 25

Adoption 5 / 25

Maturity 25 / 25

Community 14 / 25


Language: Python

License: MIT

Higher-rated alternatives

DerwenAI/pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

Tiiiger/bert_score

BERT score for text generation

BrikerMan/Kashgari

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for...

asyml/texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. ...

yohasebe/wp2txt

A command-line tool to extract plain text from Wikipedia dumps with category and section filtering
