Brand24-AI/mms_benchmark

The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature based on strict quality criteria, and covers 27 languages.

22 / 100 (Experimental)

This project provides a collection of sentiment analysis datasets spanning 27 languages and multiple domains, such as product reviews, so you can train or fine-tune models that classify the sentiment of text. Models trained on it take raw text as input and produce a sentiment category (positive, neutral, or negative) across different cultural contexts. It is aimed at data scientists, machine learning engineers, and researchers building global applications that interpret customer feedback or social media mentions in many languages.

No commits in the last 6 months.

Use this if you need high-quality, pre-curated, multilingual sentiment datasets to build or improve your AI models, especially for nuanced, culture-dependent language tasks.

Not ideal if you're looking for a ready-to-use, out-of-the-box sentiment analysis tool rather than a dataset for model training.

sentiment-analysis multilingual-text natural-language-processing social-listening customer-feedback-analysis
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 0 / 25
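
The four category scores above add up to the overall score shown for this project (0 + 6 + 16 + 0 = 22). That suggests the total is a plain sum of the categories, each capped at 25. This is an inference from the numbers on this page, not a documented formula. A minimal sketch of that arithmetic, with variable names chosen for illustration:

```python
# Category scores copied from this listing; each is out of 25.
category_scores = {
    "Maintenance": 0,
    "Adoption": 6,
    "Maturity": 16,
    "Community": 0,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
max_total = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {max_total}")
```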


Stars 16
Forks
Language Jupyter Notebook
License
Last pushed Nov 14, 2023
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Brand24-AI/mms_benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
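
The same request as the curl command above can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns a JSON body, and that the path follows a category/owner/repository pattern (inferred from the single example URL). The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build a URL shaped like the curl example.

    Assumption: the path is <category>/<owner>/<repo>, inferred from
    the one example URL shown on this page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record and decode it.

    Assumption: the response body is JSON; the schema is not documented here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "Brand24-AI", "mms_benchmark")
# print(json.dumps(data, indent=2))
```

With no key, the limit stated above is 100 requests per day, so it may be worth caching the result rather than calling the endpoint repeatedly.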