apluka34/Bud500

Bud500: A Comprehensive Vietnamese ASR Dataset


Emerging

Bud500 is a collection of approximately 500 hours of Vietnamese speech recordings, each paired with its written transcription. It covers a wide range of topics and a variety of Vietnamese regional accents. It is intended for developers and researchers building or improving systems that automatically transcribe spoken Vietnamese into text.
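The description above pairs each recording with a transcription. As a minimal sketch, assuming a hypothetical per-clip record (the `Utterance` class and its field names are illustrative only, not Bud500's actual schema or file format), this shows how such audio-transcript pairs could be represented and how per-clip durations add up to an hours figure like the approximate 500 hours quoted:

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical ASR training pair: one audio clip and its transcription."""
    audio_path: str   # hypothetical field: path to the recorded clip
    num_samples: int  # number of audio samples in the clip
    sample_rate: int  # samples per second, e.g. 16000
    transcript: str   # written Vietnamese transcription of the clip

    @property
    def duration_seconds(self) -> float:
        # Clip length in seconds = samples / (samples per second).
        return self.num_samples / self.sample_rate


def total_hours(utterances: list[Utterance]) -> float:
    """Sum the durations of all clips and convert seconds to hours."""
    return sum(u.duration_seconds for u in utterances) / 3600.0


if __name__ == "__main__":
    # Two made-up clips (5 s and 7 s at 16 kHz) for illustration only.
    clips = [
        Utterance("clip_0001.wav", 16000 * 5, 16000, "xin chào các bạn"),
        Utterance("clip_0002.wav", 16000 * 7, 16000, "hôm nay trời đẹp"),
    ]
    print(f"{total_hours(clips):.6f} hours")
```

Storing the sample count and sample rate rather than a precomputed duration keeps the record exact and lets the duration be derived consistently for clips recorded at different rates.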

No commits in the last 6 months.

Use this if you are a researcher or developer building or improving automatic speech recognition (ASR) models for Vietnamese.

Not ideal if you are looking for a ready-to-use tool or application; this is a training dataset for ASR models, not an end-user product.

Vietnamese-speech ASR-development language-technology speech-to-text natural-language-processing

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 14 / 25



Language: not specified

License: Apache-2.0

Higher-rated alternatives

qianchang/zici

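Characters and words: a collection of resources related to classical Chinese studies and to Chinese characters, words, and pinyin.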

gheyret/UQSpeechDataset

Uyghur single-speaker speech dataset.

speechio/BigCiDian

Pronunciation lexicon covering English and Chinese for automatic speech recognition.

harisbinzia/PronouncUR

PronouncUR: An Urdu Pronunciation Lexicon Generator

jonsafari/buckeye_dict

Buckeye Pronunciation Dictionary
