apluka34/Bud500
Bud500: A Comprehensive Vietnamese ASR Dataset
Bud500 is a large collection of Vietnamese speech recordings, totaling approximately 500 hours, paired with their written transcriptions. It covers a wide range of topics and includes speakers with various Vietnamese regional accents. It is intended for developers and researchers who build or improve systems that automatically convert spoken Vietnamese into text (automatic speech recognition, ASR).
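Each example in a speech corpus like this pairs an audio clip with its transcription, and a figure such as "approximately 500 hours" is the sum of all clip durations. The sketch below illustrates that arithmetic under assumptions: the `SpeechRecord` layout, its field names, and the sampling rates are hypothetical and are not taken from Bud500's actual schema or distribution format, which this page does not describe.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SpeechRecord:
    """Hypothetical record layout: one audio clip and its transcription."""
    num_samples: int     # length of the mono waveform, in samples
    sampling_rate: int   # samples per second, e.g. 16000
    transcription: str   # written Vietnamese text for the clip

    @property
    def duration_seconds(self) -> float:
        # Clip duration = number of samples / samples per second.
        return self.num_samples / self.sampling_rate


def total_hours(records: Iterable[SpeechRecord]) -> float:
    """Sum clip durations (seconds) and convert the total to hours."""
    return sum(r.duration_seconds for r in records) / 3600.0


clips = [
    SpeechRecord(num_samples=16000 * 3, sampling_rate=16000, transcription="xin chào"),
    SpeechRecord(num_samples=8000 * 9, sampling_rate=8000, transcription="cảm ơn bạn"),
]
print(f"{total_hours(clips):.4f} hours")  # 3 s + 9 s = 12 s -> prints "0.0033 hours"
```

Computing duration per clip from its own sampling rate keeps the total correct even if recordings in a corpus were captured at different rates.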
No commits in the last 6 months.
Use this if you are a speech recognition researcher or developer creating or enhancing automatic speech recognition (ASR) models for the Vietnamese language.
Not ideal if you are looking for a tool or application for immediate use; this is a dataset for training other systems, not an end-user product.
Stars: 69
Forks: 9
Language: —
License: Apache-2.0
Category:
Last pushed: Oct 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/apluka34/Bud500"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
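A minimal Python equivalent of the curl call above, using only the standard library, might look like the sketch below. It assumes the endpoint returns a JSON body (the response schema is not documented on this page), and because the page does not say how a free key is passed, the sketch sends no key and therefore falls under the keyless limit of 100 requests/day. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path taken from the curl example; only the repository slug varies.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    owner, name = repo.split("/", 1)
    # Percent-encode each path segment so unusual characters stay URL-safe.
    return f"{BASE_URL}/{quote(owner)}/{quote(name)}"


def fetch_quality(repo: str, timeout: float = 10.0) -> Any:
    """GET the quality record for a repository; assumes a JSON response."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("apluka34/Bud500")` would issue the same GET request as the curl command and return the decoded JSON, raising `urllib.error.HTTPError` if the server responds with an error status such as a rate-limit rejection.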
Higher-rated alternatives
qianchang/zici
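Zici (Characters and Words): collects resources on Chinese classical studies and on the pinyin of Chinese characters and words.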
gheyret/UQSpeechDataset
Uyghur Single Speaker Speech Dataset.
speechio/BigCiDian
Pronunciation lexicon covering both English and Chinese for automatic speech recognition.
harisbinzia/PronouncUR
PronouncUR: An Urdu Pronunciation Lexicon Generator
jonsafari/buckeye_dict
Buckeye Pronunciation Dictionary