dcavar/ELAN2split

Split ELAN annotation files and their corresponding speech files into a corpus format for common ASR systems and forced aligners

Emerging

This tool helps researchers and linguists prepare audio and transcript data for speech processing. It takes ELAN annotation files and their corresponding audio recordings, then segments them into many smaller audio clips and text files. It's designed for anyone building speech corpora or training speech recognition and forced alignment systems.
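The segmentation step described above can be sketched in a few lines. This is a minimal Python illustration, not the tool's own code (ELAN2split itself is written in C++ and its internals are not shown here). It assumes the standard ELAN `.eaf` XML layout (`TIME_SLOT` entries with millisecond values, referenced by `ALIGNABLE_ANNOTATION` elements) and uncompressed PCM WAV audio; the function names `read_segments` and `cut_clip` are hypothetical.

```python
import io
import wave
import xml.etree.ElementTree as ET


def read_segments(eaf_xml: str):
    """Return sorted (start_ms, end_ms, text) tuples for time-aligned annotations."""
    root = ET.fromstring(eaf_xml)
    # TIME_ORDER maps each slot id to an offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for ann in root.iter("ALIGNABLE_ANNOTATION"):
        start = slots.get(ann.get("TIME_SLOT_REF1"))
        end = slots.get(ann.get("TIME_SLOT_REF2"))
        text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
        # Skip annotations without usable times or without a transcript.
        if start is None or end is None or end <= start or not text:
            continue
        segments.append((start, end, text))
    return sorted(segments)


def cut_clip(wav_bytes: bytes, start_ms: int, end_ms: int) -> bytes:
    """Cut the span [start_ms, end_ms) out of a PCM WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start frame so setpos() never points past the end.
        src.setpos(min(start_ms * rate // 1000, src.getnframes()))
        frames = src.readframes((end_ms - start_ms) * rate // 1000)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)  # frame count in the header is patched on close
        dst.writeframes(frames)
    return out.getvalue()
```

In a corpus-building script, each tuple returned by `read_segments` would be passed to `cut_clip`, and the resulting clip and its transcript text would be written out as a paired audio file and text file. The naming scheme and directory layout expected by a given ASR toolkit or forced aligner are left open here.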

No commits in the last 6 months.

Use this if you need to convert your ELAN-annotated audio and transcriptions into a corpus format suitable for training Automatic Speech Recognition (ASR) or forced aligner models.

Not ideal if you need a graphical user interface or are not comfortable with command-line tools.

linguistics speech-research corpus-creation audio-transcription computational-linguistics

Stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 14 / 25

Language: C++

License: Apache-2.0

Higher-rated alternatives

Uberi/speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

cmusphinx/pocketsphinx

A small speech recognizer

tensorflow/lingvo

Lingvo

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models,...

PyThaiNLP/pythaiasr

Python Thai Automatic Speech Recognition
