czyzi0/the-mc-speech-dataset

Free speech dataset consisting of 24,018 short audio clips of a single speaker reading sentences in Polish.

Experimental

This dataset provides over 22 hours of spoken Polish, with each of the 24,018 short audio clips accompanied by its exact text transcription. It's designed for researchers and developers working on speech technologies who need high-quality, single-speaker Polish voice data.
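As a rough illustration of how clip and transcription pairs like these might be consumed, here is a minimal Python sketch. It assumes a tab-separated transcript listing with one `filename<TAB>text` row per clip. That layout, the `audio` directory, the `Clip` type, the `read_clips` helper, and the sample file names are all hypothetical and not taken from the dataset itself, so check the repository for the actual structure before relying on it.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class Clip(NamedTuple):
    audio_path: Path  # location of the audio file (assumed layout)
    text: str         # transcription of that clip


def read_clips(rows: Iterable[str], audio_dir: Path) -> Iterator[Clip]:
    """Yield one Clip per well-formed 'filename<TAB>transcription' row."""
    reader = csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        yield Clip(audio_dir / row[0].strip(), row[1].strip())


# Hypothetical sample rows; a real run would pass an open file object instead.
sample = [
    "clip_0001.wav\tDzień dobry.\n",
    "\n",
    "clip_0002.wav\tDo widzenia.\n",
]
clips = list(read_clips(sample, Path("audio")))
print(len(clips), clips[0].text)
```

Keeping the parser independent of any audio library means the pairs can be handed to whichever text-to-speech or speech recognition toolkit is in use.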

No commits in the last 6 months.

Use this if you are developing or evaluating text-to-speech synthesis, speech recognition models, or other voice applications specifically for the Polish language.

Not ideal if you need a dataset with multiple speakers, different speaking styles, or conversational Polish.

speech-synthesis speech-recognition voice-technology Polish-language AI-training-data

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 0 / 25

License

CC0-1.0

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos
