Giuseppe-Della-Corte/IESTAC

A corpus that can be used to train English-to-Italian End-to-End Speech-to-Text Machine Translation models

/ 100

Experimental

IESTAC provides a large collection of English audio excerpts, each paired with its exact English transcript and a human-quality Italian translation. It supports building systems that translate spoken English directly into Italian text, and it is aimed at researchers and developers working on English-to-Italian speech-to-text translation.

No commits in the last 6 months.

Use this if you need a pre-aligned, extensive dataset of English speech and text paired with Italian text for building or evaluating speech-to-text translation models.

Not ideal if you need a corpus for other language pairs, or if your primary focus is on text-to-text translation rather than speech-to-text.

speech-translation natural-language-processing language-model-training audiobook-processing machine-translation

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos
