apluka34/Bud500

Bud500: A Comprehensive Vietnamese ASR Dataset

Score 40 / 100 (Emerging)

Bud500 is a large collection of Vietnamese spoken audio recordings, totaling approximately 500 hours, along with their corresponding written transcriptions. It covers a wide range of topics and includes various Vietnamese regional accents. This resource is for developers and researchers working on building or improving systems that automatically convert spoken Vietnamese into text.

No commits in the last 6 months.

Use this if you are a speech recognition researcher or developer creating or enhancing automatic speech recognition (ASR) models for the Vietnamese language.

Not ideal if you are looking for a tool or application for immediate use; this is a dataset for training ASR models, not an end-user product.

Vietnamese-speech ASR-development language-technology speech-to-text natural-language-processing
Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 14 / 25


Stars 69
Forks 9
Language (not listed)
License Apache-2.0
Last pushed Oct 10, 2025
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/apluka34/Bud500"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
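
The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which this page does not confirm, and it treats the three path segments as category, owner, and repository, which is inferred from the curl URL above. The names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    The meaning of the path segments (category/owner/repo) is an
    assumption inferred from the curl example.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository and decode it.

    Assumption: the response body is JSON. The response schema is not
    documented here, so the decoded value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, counts toward the daily limit):
# data = fetch_quality("voice-ai", "apluka34", "Bud500")
# print(json.dumps(data, indent=2))
```

Because unauthenticated use is limited to 100 requests/day, it may be worth caching the decoded result locally rather than calling `fetch_quality` repeatedly for the same repository.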