xinjli/ucla-phonetic-corpus

Dataset for the ICASSP 2021 paper "Multilingual Phonetic Dataset for Low Resource Speech Recognition"

Score: 33 / 100 (Emerging)

This is a cleaned collection of multilingual phonetic data intended to improve speech recognition systems, especially for low-resource languages. It provides segmented audio files, phonetic annotations, and normalized transcriptions for 97 languages. It is aimed at researchers and developers who build or improve speech recognition models.

No commits in the last 6 months.

Use this if you are a speech scientist or machine learning engineer looking for a well-structured and cleaned dataset of phonetic information and audio for low-resource languages.

Not ideal if you need a dataset for commercial purposes, as its license restricts usage to non-commercial applications.

speech-recognition phonetics linguistics-research language-technology machine-learning-datasets
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 9 / 25


Stars: 46
Forks: 4
Language: Python

License

Last pushed: May 12, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/xinjli/ucla-phonetic-corpus"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
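The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the endpoint returns a JSON body (the response schema is not documented here) and that the three trailing path segments are a category, a repository owner, and a repository name. The names `build_quality_url` and `fetch_quality` and their parameter names are hypothetical, introduced only for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the path is <category>/<owner>/<repo>; the segment names
    are guesses based on the single URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record and parse it.

    Assumption: the response body is JSON. No API key is sent, so the
    keyless daily limit applies.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "xinjli", "ucla-phonetic-corpus")` requests the same URL as the curl command. Because the response fields are not described on this page, inspect the returned object before relying on any particular key.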