unza-speech-lab/zambezi-voice
Repository for multilingual speech data resources for native languages of Zambia.
This project provides organized collections of speech recordings for several native languages of Zambia, including Bemba, Nyanja, and Tonga. It offers both labeled datasets (speech with corresponding text) and unlabeled audio from sources such as radio broadcasts. Researchers and developers working on language technology for these under-resourced languages can use the resources to build and evaluate systems such as speech recognition and machine translation.
No commits in the last 6 months.
Use this if you are a researcher or developer focused on creating, improving, or benchmarking speech and language technologies for Zambian native languages.
Not ideal if you are looking for general-purpose speech data or resources for languages spoken outside Zambia.
Stars: 20
Forks: 9
Language: —
License: MIT
Category:
Last pushed: Oct 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/unza-speech-lab/zambezi-voice"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
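For scripted access, the same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a documented client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL above, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, as inferred
    from the one example URL on this page.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the body is JSON. If it is not, json.loads raises
    a ValueError instead of returning bad data.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make a real request,
    # which counts against the daily request limit.
    print(build_quality_url("voice-ai", "unza-speech-lab", "zambezi-voice"))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the daily quota.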
Higher-rated alternatives
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
EgorLakomkin/KTSpeechCrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos