aditeyabaral/calbert
CalBERT - Code-mixed Adaptive Language representations using BERT, published at AAAI-MAKE 2022
This project helps developers and researchers who work with code-mixed languages such as Hinglish build more accurate natural language processing (NLP) models. It takes sentences from two related languages (e.g., English and Hindi) and outputs dense vector representations (embeddings) for words, sentences, or paragraphs, which can then be used for tasks such as sentiment analysis or semantic search. It is aimed at machine learning engineers, data scientists, and computational linguists.
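To show how embeddings like these are typically used for semantic search, here is a minimal sketch. It does not use CalBERT's own API or its actual pooling strategy, neither of which is described in this listing. It assumes mean pooling of per-token vectors into one sentence vector and ranks documents by cosine similarity. The 3-dimensional vectors and the transliterated Hinglish sentences are hypothetical stand-ins, not model output.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average per-token vectors into one fixed-size sentence/paragraph vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Vector, corpus: Dict[str, Vector], top_k: int = 2) -> List[str]:
    """Return the corpus keys whose embeddings are most similar to the query."""
    ranked = sorted(corpus, key=lambda key: cosine(query, corpus[key]), reverse=True)
    return ranked[:top_k]


# Hypothetical toy vectors standing in for model output (NOT produced by CalBERT).
# A code-mixed sentence and its English paraphrase get nearby vectors,
# while an unrelated sentence points in a different direction.
corpus = {
    "movie bahut accha tha": mean_pool([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    "the film was great": mean_pool([[0.85, 0.15, 0.05]]),
    "train late ho gayi": mean_pool([[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]]),
}
query = mean_pool([[0.9, 0.1, 0.05]])
print(search(query, corpus))
```

The point of a code-mixed representation model is that the Hinglish sentence and its English paraphrase should land close together in vector space, so a single similarity search retrieves both regardless of which language mix the query uses.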
No commits in the last 6 months. Available on PyPI.
Use this if you need to build or improve NLP models that understand and process text containing a blend of two languages.
Not ideal if your NLP tasks exclusively involve a single language or if you are not comfortable working with machine learning model training and development.
Stars: 13
Forks: 3
Language: Python
License: MIT
Last pushed: Dec 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/aditeyabaral/calbert"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
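For scripted access, the curl call above can be reproduced with Python's standard library. This is a minimal sketch: it requests the same URL without a key and, as an assumption, decodes the body as JSON. The response fields are not documented in this listing, so the sketch returns the decoded data as-is. The `opener` parameter is a hypothetical hook added only so the function can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable

# Endpoint copied from the curl example; no API key is sent (free tier).
API_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/aditeyabaral/calbert"


def fetch_quality(
    url: str = API_URL,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 10.0,
) -> Any:
    """GET the endpoint and decode the body, assuming it is JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    # urlopen's response is a context manager whose read() returns bytes;
    # json.load accepts a binary file-like object.
    with opener(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality()` performs one live request, which counts against the 100 requests/day limit for unauthenticated use.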
Higher-rated alternatives
DerwenAI/pytextrank: Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
Tiiiger/bert_score: BERT score for text generation
BrikerMan/Kashgari: Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for...
asyml/texar: Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. ...
yohasebe/wp2txt: A command-line tool to extract plain text from Wikipedia dumps with category and section filtering