dcavar/ELAN2split
Split ELAN annotation files and their corresponding speech files into a corpus format for common ASR systems and forced aligners
This tool helps researchers and linguists prepare audio and transcript data for speech processing. It takes ELAN annotation files and their corresponding audio recordings, then segments them into many smaller audio clips and text files. It's designed for anyone building speech corpora or training speech recognition and forced alignment systems.
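The splitting step described above can be illustrated with a minimal Python sketch. It is not ELAN2split's own code (the project is written in C++) and does not reproduce its command-line interface or output layout. It assumes a time-aligned tier in the standard ELAN `.eaf` XML structure (`TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION`, `ANNOTATION_VALUE`) and an uncompressed PCM WAV recording. The names `SAMPLE_EAF`, `read_segments`, `split_corpus`, the tier ID `transcript`, and the `utt_0001` file naming are hypothetical and chosen only for illustration.

```python
import wave
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, heavily reduced EAF document used only for this illustration.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcript">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_segments(eaf_xml, tier_id):
    """Return (start_ms, end_ms, text) tuples for one time-aligned tier."""
    root = ET.fromstring(eaf_xml)
    # Map time slot IDs to millisecond offsets; unaligned slots have no value.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            # Skip empty, unaligned, or zero-length annotations.
            if start is None or end is None or end <= start or not text:
                continue
            segments.append((start, end, text))
    return segments


def split_corpus(wav_path, segments, out_dir, prefix="utt"):
    """Write one WAV clip and one transcript file per segment.

    Returns the paths of the WAV clips that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    clips = []
    with wave.open(str(wav_path), "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        for index, (start_ms, end_ms, text) in enumerate(segments, 1):
            # Convert millisecond offsets to frame positions, clamped to the file.
            first = min(start_ms * rate // 1000, total)
            last = min(end_ms * rate // 1000, total)
            src.setpos(first)
            frames = src.readframes(last - first)
            clip = out_dir / f"{prefix}_{index:04d}.wav"
            with wave.open(str(clip), "wb") as dst:
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(rate)
                dst.writeframes(frames)
            clip.with_suffix(".txt").write_text(text + "\n", encoding="utf-8")
            clips.append(clip)
    return clips
```

In this sketch, `read_segments(SAMPLE_EAF, "transcript")` yields the two annotated spans with their millisecond boundaries, and `split_corpus("recording.wav", segments, "corpus")` would then write `utt_0001.wav` with `utt_0001.txt`, and so on, into a `corpus` directory. ASR toolkits and forced aligners differ in the layout they expect, so the naming scheme would need to be adapted to the target system.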
No commits in the last 6 months.
Use this if you need to convert your ELAN-annotated audio and transcriptions into a corpus format suitable for training Automatic Speech Recognition (ASR) or forced aligner models.
Not ideal if you need a graphical user interface or are not comfortable with command-line tools.
Stars: 11
Forks: 3
Language: C++
License: Apache-2.0
Category:
Last pushed: Oct 15, 2018
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/dcavar/ELAN2split"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
cmusphinx/pocketsphinx
A small speech recognizer
tensorflow/lingvo
Lingvo
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models,...
PyThaiNLP/pythaiasr
Python Thai Automatic Speech Recognition