PhilipMay/stsb-multi-mt
Machine-translated multilingual STS (Semantic Textual Similarity) benchmark dataset.
This dataset provides sentence pairs in multiple languages (such as English, German, and Spanish). Each pair has a score indicating how semantically similar the two sentences are. It helps you train and evaluate systems that judge whether two sentences mean the same thing even when they are worded differently. Anyone developing or evaluating multilingual natural language understanding models, especially for tasks like semantic search or question answering, would use this.
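The following minimal Python sketch shows the shape of one scored pair. It assumes the field names `sentence1`, `sentence2`, and `similarity_score`, and the usual STS benchmark scale of 0 (unrelated) to 5 (equivalent). `STSPair` and `normalized` are hypothetical names used for illustration; they are not part of the dataset's tooling. Rescaling the score to [0, 1] is one common way to turn the gold label into a target for cosine-similarity training.

```python
from dataclasses import dataclass


# Hypothetical record type for one scored sentence pair; field names are assumptions.
@dataclass(frozen=True)
class STSPair:
    sentence1: str
    sentence2: str
    similarity_score: float  # assumed STS-B scale: 0.0 (unrelated) to 5.0 (equivalent)

    def normalized(self, max_score: float = 5.0) -> float:
        """Rescale the gold score to [0, 1], rejecting values outside the assumed scale."""
        if not 0.0 <= self.similarity_score <= max_score:
            raise ValueError(f"score {self.similarity_score} outside [0, {max_score}]")
        return self.similarity_score / max_score
```

For example, `STSPair("A man is playing a guitar.", "A person plays the guitar.", 4.0).normalized()` returns `0.8`. The sentences and score here are invented.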
No commits in the last 6 months.
Use this if you need diverse, scored sentence pairs in various languages to train or benchmark models that measure sentence similarity.
Not ideal if you need data for tasks other than sentence similarity. It is also a poor fit if you need guaranteed grammatical accuracy, because the non-English data is machine translated and may contain translation errors.
Stars: 33
Forks: 9
Language: Python
License: not specified
Category:
Last pushed: Dec 21, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/PhilipMay/stsb-multi-mt"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
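You can also make the curl call above from Python using only the standard library. This is a minimal sketch, and `quality_url` and `fetch_quality` are hypothetical helper names. The JSON response format is an assumption because the page does not document it. The keyed 1,000/day tier is not shown because the page does not say how the key is sent.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example; the response being JSON is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    owner, sep, name = repo.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo!r}")
    return f"{BASE_URL}/{urllib.parse.quote(owner)}/{urllib.parse.quote(name)}"


def fetch_quality(repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint anonymously (no key) and decode the body as JSON (assumed)."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("PhilipMay/stsb-multi-mt")` requests the same URL as the curl command. Each call counts against the anonymous limit of 100 requests/day.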
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English