aparnadutta/code-mixed-lid
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
This tool helps social media analysts and content moderators who work with multilingual communities identify the language of each word in mixed-language posts. You input social media text that mixes Bengali (Bangla) and English, and it outputs a label for each word indicating whether that word is Bengali or English. It suits professionals who need to understand language usage in code-mixed online conversations.
No commits in the last 6 months.
Use this if you need to precisely determine which words in a Bengali-English social media post are Bengali and which are English.
Not ideal if your text data is in a different language pair, or if you only need to identify a single dominant language for an entire post.
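To make the expected input and output concrete, here is a minimal sketch of word-level labelling. It is not the repository's BiLSTM model or its API: it is a naive baseline that looks only at Unicode script, and the function names `label_word` and `label_post` and the tags `bn`, `en`, and `other` are hypothetical, chosen for illustration.

```python
# Illustrative baseline only: labels words by Unicode script.
# This is NOT the repository's BiLSTM; names and tags are hypothetical.

def label_word(word: str) -> str:
    """Return 'bn' for Bengali script, 'en' for ASCII letters, else 'other'."""
    # The Bengali Unicode block spans U+0980 to U+09FF.
    if any("\u0980" <= ch <= "\u09ff" for ch in word):
        return "bn"
    if any(ch.isascii() and ch.isalpha() for ch in word):
        return "en"
    return "other"


def label_post(text: str) -> list:
    """Split a post on whitespace and pair each word with its label."""
    return [(word, label_word(word)) for word in text.split()]


if __name__ == "__main__":
    for word, tag in label_post("আমি today খুব happy !"):
        print(word, tag)
```

A script check like this labels Bangla written in Latin letters as `en`, which is common in social media text and is one reason a learned, context-aware model is used for this task instead.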
Stars: 10
Forks: 1
Language: Python
License: not listed
Category: not listed
Last pushed: Aug 13, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/aparnadutta/code-mixed-lid"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
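The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the `category/owner/repo` path pattern seen in the URL above applies to other repositories. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path pattern inferred from a single example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes a JSON response body; the response schema is not documented here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "aparnadutta", "code-mixed-lid")` requests the URL shown in the curl example. Unauthenticated calls count against the 100 requests/day limit.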
Higher-rated alternatives
indix/whatthelang
Lightning Fast Language Prediction 🚀
nitotm/efficient-language-detector-js
Fast and accurate natural language detection. Detector written in Javascript. Nito-ELD, ELD.
pemistahl/lingua-py
The most accurate natural language detection library for Python, suitable for short text and...
mbanon/fastspell
Targeted language identifier, based on FastText and Hunspell.
nitotm/efficient-language-detector
Fast and accurate natural language detection. Detector written in PHP. Nito-ELD, ELD.