turinaf/Sagalee

Automatic Speech Recognition Dataset for Oromo Language

Emerging

This project provides a dataset of Oromo speech paired with corresponding text transcripts, designed for training Automatic Speech Recognition (ASR) models. Researchers and developers can use it to train or fine-tune models that transcribe Oromo audio into text. It is aimed at natural language processing researchers, computational linguists, and AI developers working on under-resourced languages.

Use this if you are building, training, or fine-tuning speech-to-text models for the Oromo language.

Not ideal if you need a pre-trained, ready-to-use Oromo ASR model without custom development, or if your focus is on a different language.

Oromo language processing, speech recognition development, natural language processing, computational linguistics, AI model training

No License, No Package, No Dependents

Maintenance 10 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 7 / 25

Language: Python

License: none listed

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos
