PhilipMay/stsb-multi-mt

Machine translated multilingual STS benchmark dataset.

Score: 40 / 100 (Emerging)

This dataset provides pairs of sentences in multiple languages (such as English, German, and Spanish), each with a score indicating how semantically similar the two sentences are. It helps you train systems to judge whether two sentences mean the same thing even when they are worded differently. Anyone developing or evaluating multilingual natural language understanding models, especially for tasks such as semantic search or question answering, would use this.
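Each record pairs two sentences with a gold similarity score. A minimal sketch of that shape, assuming a 0 to 5 score scale and the field names `sentence1`, `sentence2`, and `similarity_score` (both are assumptions here, not confirmed by this page), plus a hypothetical helper `normalized_label` that rescales the score to [0, 1], as is common when training with a cosine-similarity loss:

```python
from dataclasses import dataclass


# Assumed record layout; field names and the 0-5 scale are illustrative assumptions.
@dataclass(frozen=True)
class SimilarityPair:
    sentence1: str
    sentence2: str
    similarity_score: float  # assumed: 0.0 = unrelated, 5.0 = equivalent meaning


def normalized_label(pair: SimilarityPair, max_score: float = 5.0) -> float:
    """Hypothetical helper: rescale the gold score to [0, 1] for a cosine-similarity target."""
    if not 0.0 <= pair.similarity_score <= max_score:
        raise ValueError(f"score {pair.similarity_score} outside [0, {max_score}]")
    return pair.similarity_score / max_score


# Made-up German example pair, not taken from the dataset.
pair = SimilarityPair("Ein Mann spielt Gitarre.", "Ein Mann spielt ein Instrument.", 4.0)
print(normalized_label(pair))  # → 0.8
```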

No commits in the last 6 months.

Use this if you need diverse, scored sentence pairs in various languages to train or benchmark models that measure sentence similarity.
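When benchmarking, STS-style datasets are commonly scored by the Spearman rank correlation between a model's predicted similarities and the gold scores. The dependency-free sketch below shows the computation (tied values get averaged ranks); the sample scores are hypothetical, and in practice `scipy.stats.spearmanr` computes the same statistic.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the tie-averaged ranks."""
    if len(predicted) != len(gold) or len(gold) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(gold))


# Hypothetical gold scores (0-5) and model cosine similarities for four sentence pairs.
gold_scores = [4.0, 2.5, 0.5, 5.0]
model_scores = [0.81, 0.55, 0.10, 0.93]
print(spearman(model_scores, gold_scores))  # → 1.0 (identical ordering)
```

Only the ordering of the predictions matters, so a model's raw cosine scores can be compared against 0 to 5 gold scores without rescaling either side.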

Not ideal if you need data for tasks other than sentence similarity, or if you require grammatically flawless translations in every non-English subset.

natural-language-processing multilingual-AI semantic-similarity machine-learning-training-data language-AI-evaluation
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 17 / 25
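The four category scores add up to the headline figure (0 + 7 + 16 + 17 = 40 of a possible 100). The page does not state the scoring rule, so the sketch below simply assumes the overall score is the plain sum of the four categories, each capped at 25; `overall_score` is a hypothetical helper.

```python
# Category scores as listed on this page; each is out of 25.
CATEGORY_SCORES = {"Maintenance": 0, "Adoption": 7, "Maturity": 16, "Community": 17}
CATEGORY_MAX = 25


def overall_score(scores: dict[str, int]) -> int:
    """Assumed rule (not confirmed by the page): overall = sum of the category scores."""
    for name, value in scores.items():
        if not 0 <= value <= CATEGORY_MAX:
            raise ValueError(f"{name} score {value} outside [0, {CATEGORY_MAX}]")
    return sum(scores.values())


print(overall_score(CATEGORY_SCORES), "/", CATEGORY_MAX * len(CATEGORY_SCORES))  # → 40 / 100
```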


Stars: 33
Forks: 9
Language: Python
License:
Last pushed: Dec 21, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/PhilipMay/stsb-multi-mt"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
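The same request can be made from Python with only the standard library. This is a sketch: it assumes the endpoint returns JSON, the helper names `quality_url` and `fetch_quality` are hypothetical, and because the page does not say how an API key is passed, the sketch sends the request without one (the no-key tier).

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint prefix taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/"


def quality_url(repo: str) -> str:
    """Build the quality endpoint URL for an 'owner/name' repository string."""
    owner, name = repo.split("/")  # raises ValueError unless exactly one slash
    return BASE_URL + quote(owner) + "/" + quote(name)


def fetch_quality(repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    request = urllib.request.Request(quality_url(repo), headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network request, so it is left commented out):
# data = fetch_quality("PhilipMay/stsb-multi-mt")
```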