antouanbg/Bulgarian_Linguistic
A collection of resources for Bulgarian: corpora, datasets, and models used in ASR, TTS, and NLP tasks, together with links to the corresponding tools and apps.
This project offers a collection of Bulgarian datasets and pre-trained models for tasks like speech recognition, text-to-speech, and natural language processing. It takes raw Bulgarian audio or text and provides structured data and linguistic models. This is ideal for linguists, researchers, or developers working on Bulgarian language technology applications.
No commits in the last 6 months.
Use this if you need readily available Bulgarian linguistic resources to build applications that understand or generate Bulgarian speech and text.
Not ideal if you are looking for a complete, out-of-the-box application rather than foundational data and models for Bulgarian language tasks.
Stars: 25
Forks: 2
Language: Java
License: -
Category:
Last pushed: Jun 06, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/antouanbg/Bulgarian_Linguistic"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
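For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path segments after `/quality/` are category, owner, and repository (inferred from the single curl URL above), and that the response body is JSON. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path layout is <category>/<owner>/<repo>, inferred
    from the one example URL shown on the page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON; the response schema is not
    documented here, so the decoded object is returned as-is.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "antouanbg", "Bulgarian_Linguistic")
# print(data)
```

No API key is passed because the page states that keyless access is allowed up to 100 requests/day; how a key would be supplied is not described here, so the sketch leaves it out.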
Higher-rated alternatives
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
EgorLakomkin/KTSpeechCrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos