Giuseppe-Della-Corte/IESTAC
A corpus that can be used to train English-to-Italian End-to-End Speech-to-Text Machine Translation models
IESTAC provides a large collection of English audio excerpts paired with their exact English transcripts and human-quality Italian translations. The corpus supports building systems that translate spoken English directly into Italian text, so it is aimed at researchers and developers working on English-to-Italian speech-to-text translation.
No commits in the last 6 months.
Use this if you need a pre-aligned, extensive dataset of English speech and text paired with Italian text for building or evaluating speech-to-text translation models.
Not ideal if you need a corpus for other language pairs, or if your primary focus is on text-to-text translation rather than speech-to-text.
Stars: 11
Forks: not listed
Language: not listed
License: not listed
Category: not listed
Last pushed: Jan 26, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Giuseppe-Della-Corte/IESTAC"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
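The same request can be made from a script. The sketch below is a minimal Python equivalent of the curl call, using only the standard library. It assumes the path segments after `/quality/` are a category slug, the repository owner, and the repository name (inferred from the example URL, not documented here), and that the response body is JSON; if it is not, the raw text is returned instead. The function names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base of the endpoint shown in the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, mirroring the pattern of the curl example.

    Assumption: the three path segments are category, owner, and repo.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the body is JSON. If decoding fails, the raw text is returned.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("voice-ai", "Giuseppe-Della-Corte", "IESTAC"))
```

Keeping URL construction separate from the network call makes the request easy to check offline, and the keyless quota of 100 requests/day is easier to respect when each call is explicit.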
Higher-rated alternatives
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
EgorLakomkin/KTSpeechCrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos