Ijwi-ry-Ikirundi-AI/Kirundi_Dataset

🇧🇮 The first large-scale, open-source speech and text dataset for the Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes data for ASR, TTS, and MT.

40 / 100 (Emerging)

This project creates the first comprehensive, open-source collection of Kirundi speech and text data. It helps preserve and digitize the language for millions of speakers by providing transcribed audio and translated text. It is aimed at Kirundi speakers who want to help build AI tools, such as voice assistants or translation apps, for their language.

Use this if you are a Kirundi speaker or linguist wanting to contribute Kirundi sentences, translations, or audio recordings to build modern language AI.

Not ideal if you are looking for a pre-built Kirundi AI model or an application ready for end-user use, as this project focuses on data collection for model development.

Kirundi-language-preservation speech-recognition text-translation language-digitization low-resource-languages
No Package, No Dependents
Maintenance 10 / 25
Adoption 4 / 25
Maturity 13 / 25
Community 13 / 25

Stars 7
Forks 2
Language Jupyter Notebook
License (not shown)
Last pushed Feb 12, 2026
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Ijwi-ry-Ikirundi-AI/Kirundi_Dataset"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
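
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page, so inspect the raw response before relying on any particular field.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL from its three path segments (hypothetical helper)."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([API_BASE, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. The schema is not documented
    on this page, so the parsed object is returned unmodified.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "Ijwi-ry-Ikirundi-AI", "Kirundi_Dataset")` issues the same GET request as the curl command. Without a key, each call counts against the 100 requests/day limit, so caching the result locally is sensible when polling.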