czyzi0/the-mc-speech-dataset
Free speech dataset consisting of 24,018 short audio clips of a single speaker reading sentences in Polish.
This dataset provides over 22 hours of spoken Polish, with each of the 24,018 short audio clips accompanied by its exact text transcription. It's designed for researchers and developers working on speech technologies who need high-quality, single-speaker Polish voice data.
No commits in the last 6 months.
Use this if you are developing or evaluating text-to-speech synthesis, speech recognition models, or other voice applications specifically for the Polish language.
Not ideal if you need a dataset with multiple speakers, different speaking styles, or conversational Polish.
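As a rough sanity check on the figures in the summary, the average clip length can be estimated from the stated totals. This is only a sketch: it treats "over 22 hours" as exactly 22 hours, so the result is a lower bound, and the function name is illustrative rather than part of the dataset's tooling.

```python
# Figures quoted in the summary above; 22 hours is a stated lower bound.
TOTAL_HOURS = 22
NUM_CLIPS = 24_018


def average_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Return the mean clip duration in seconds for the given totals."""
    return total_hours * 3600 / num_clips


if __name__ == "__main__":
    avg = average_clip_seconds(TOTAL_HOURS, NUM_CLIPS)
    print(f"Average clip length: at least {avg:.1f} s")
```

Clips of roughly this length are consistent with the "short audio clips" description.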
Stars: 9
Forks: —
Language: —
License: CC0-1.0
Category: —
Last pushed: Dec 29, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/czyzi0/the-mc-speech-dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
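The same request can be made from Python using only the standard library. This is a minimal sketch under stated assumptions: the URL pattern is inferred from the single curl example above, the response is assumed to be JSON (the page does not document its shape), and the helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the entry for one repository.

    Assumption: the endpoint returns a JSON body; its fields are not
    documented on this page, so the caller should inspect the result.
    """
    request = urllib.request.Request(
        build_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call and counts against the daily request limit.
    data = fetch_quality("voice-ai", "czyzi0", "the-mc-speech-dataset")
    print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the request target without spending any of the daily quota.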
Higher-rated alternatives
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
EgorLakomkin/KTSpeechCrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos