carlfm01/my-speech-datasets

My public domain speech index

13 / 100
Experimental

This project provides pre-processed collections of Spanish speech audio with corresponding text transcripts. It is aimed at researchers and developers who are building or improving speech recognition systems and need ready-to-use, public domain datasets. Each dataset pairs audio files with their transcripts, which makes it suitable for training machine learning models.

No commits in the last 6 months.

Use this if you need high-quality, aligned Spanish speech and text data to train or evaluate your automatic speech recognition (ASR) models.

Not ideal if you require speech datasets in languages other than Spanish, or if you need data for tasks like speaker identification rather than speech-to-text transcription.

speech-recognition, natural-language-processing, machine-learning-training-data, linguistics, audio-transcription
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25

Stars: 13
Forks: (not listed)
Language: (not listed)
License: (not listed)
Last pushed: Sep 19, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/carlfm01/my-speech-datasets"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
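
The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred only from the single URL shown above, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow a category/owner/repo pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "carlfm01", "my-speech-datasets")` issues the same GET request as the curl command. Since the keyless limit is 100 requests/day, caching the result locally is sensible when polling more than once.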