Brand24-AI/mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature on the basis of strict quality criteria, and covers 27 languages.
This project provides a curated collection of sentiment analysis datasets spanning 27 languages and multiple domains, such as product reviews, for training or fine-tuning models that recognize emotional tone in text. Models trained on it take raw text as input and produce a sentiment category (positive, neutral, or negative) across different cultural contexts. It suits data scientists, machine learning engineers, and researchers building global applications that interpret customer feedback or social media mentions in many languages.
No commits in the last 6 months.
Use this if you need high-quality, pre-curated, multilingual sentiment datasets to build or improve your AI models, especially for nuanced, culture-dependent language tasks.
Not ideal if you're looking for a ready-to-use, out-of-the-box sentiment analysis tool rather than a dataset for model training.
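As a rough illustration of the text-plus-label layout described above, the sketch below counts sentiment labels per language over a handful of toy records. The field names (`text`, `language`, `label`), the label strings, and the sample sentences are all assumptions made up for this example; they are not taken from the corpus, so check the actual schema before relying on them.

```python
from collections import Counter, defaultdict

# Hypothetical toy records. Field names, label strings, and sentences are
# illustrative assumptions, not rows from the real corpus.
SAMPLES = [
    {"text": "Great phone, battery lasts for days.", "language": "en", "label": "positive"},
    {"text": "Der Versand war in Ordnung.", "language": "de", "label": "neutral"},
    {"text": "Fatalna obsługa klienta.", "language": "pl", "label": "negative"},
    {"text": "Terrible screen, returned it.", "language": "en", "label": "negative"},
]

# The three sentiment categories named in the description.
LABELS = ("positive", "neutral", "negative")


def label_counts_by_language(records):
    """Return {language: {label: count}} for an iterable of record dicts."""
    counts = defaultdict(Counter)
    for record in records:
        if record["label"] not in LABELS:
            raise ValueError(f"unexpected label: {record['label']!r}")
        counts[record["language"]][record["label"]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {language: dict(counter) for language, counter in counts.items()}


if __name__ == "__main__":
    for language, counter in sorted(label_counts_by_language(SAMPLES).items()):
        print(language, counter)
```

A per-language label count like this is a cheap first check for class imbalance before training or fine-tuning a multilingual model.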
Stars: 16
Forks: not listed
Language: Jupyter Notebook
License: not listed
Category: not listed
Last pushed: Nov 14, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Brand24-AI/mms_benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
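The same request can be made from Python. The sketch below is a minimal example using only the standard library. It assumes that the path segments after `/quality/` are category, owner, and repository (inferred from the single URL shown above) and that the endpoint returns a JSON body; neither is confirmed here. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL.

    Assumption: the path layout is <category>/<owner>/<repo>, inferred
    from the one example URL on this page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response.

    Assumption: the endpoint returns JSON. If it does not, json.loads
    raises a ValueError.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("nlp", "Brand24-AI", "mms_benchmark"))
```

Because the keyless tier is limited to 100 requests per day, it is worth caching the result locally rather than calling `fetch_quality` in a loop.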
Higher-rated alternatives
RISE-UNIBAS/humanities_data_benchmark
LLM Benchmark Suite for Humanities Data
ma-compbio/DNALONGBENCH
A benchmark suite of five genomics tasks for evaluating DNA foundation models on long-range dependencies.
wgyhhhh/EASE
Official repository for "Towards Real-Time Fake News Detection under Evidence Scarcity"
TreeAI-Lab/NumericBench
A comprehensive benchmark to evaluate and improve the fundamental numerical reasoning abilities...