czyzi0/the-mc-speech-dataset

Free speech dataset consisting of 24,018 short audio clips of a single speaker reading sentences in Polish.

Overall score: 21 / 100 (Experimental)

This dataset provides over 22 hours of spoken Polish, with each of the 24,018 short audio clips accompanied by its exact text transcription. It's designed for researchers and developers working on speech technologies who need high-quality, single-speaker Polish voice data.

No commits in the last 6 months.

Use this if you are developing or evaluating text-to-speech synthesis, speech recognition models, or other voice applications specifically for the Polish language.

Not ideal if you need a dataset with multiple speakers, different speaking styles, or conversational Polish.

Tags: speech-synthesis, speech-recognition, voice-technology, Polish-language, AI-training-data
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 9
Forks: (not listed)
Language: (not listed)
License: CC0-1.0
Last pushed: Dec 29, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/czyzi0/the-mc-speech-dataset"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
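The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON body, which the listing does not confirm, and it reads the URL path as category/owner/repository, which is also an interpretation. The helper names `build_quality_url` and `fetch_quality` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, following the pattern of the curl example.

    Treating the three path segments as category/owner/repo is an
    assumption based on that single example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; the listing does not document
    the response format.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "czyzi0", "the-mc-speech-dataset")` issues the same GET request as the curl command. Keyless use is limited to 100 requests/day, so a script that polls many repositories may want to cache results.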