Niger-Volta-LTI/yoruba-text

Yorùbá language training text for NLP, ASR and TTS tasks

/ 100

Emerging

This project provides a collection of Yorùbá text that has been carefully prepared for use in language technology applications. It takes raw text from various online sources and converts it into a standardized, fully diacritized format. This resource is for researchers and developers building tools for the Yorùbá language, such as speech recognition or machine translation systems.

No commits in the last 6 months.

Use this if you need clean, consistently formatted Yorùbá text to train or test natural language processing models, speech synthesis, or automatic speech recognition systems.

Not ideal if you are looking for a ready-to-use application or translation service, as this provides foundational data rather than a complete solution.

Yorùbá language NLP research speech technology language data digital humanities

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 20 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

GPL-3.0

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos

Explore Voice AI Tools

All categories Trending Voice AI directory Insights