dcavar/ELAN2split

Split ELAN annotation files and the corresponding speech files into a corpus format for common ASR systems and forced aligners

Score: 35 / 100 (Emerging)

This tool helps researchers and linguists prepare audio and transcript data for speech processing. It takes ELAN annotation files and their corresponding audio recordings, then segments them into many smaller audio clips and text files. It's designed for anyone building speech corpora or training speech recognition and forced alignment systems.
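The segmentation described above can be illustrated with a minimal sketch. It is written in Python rather than the project's own C++, and it is not ELAN2split's implementation or command-line interface. It assumes the standard EAF layout (TIME_SLOT entries with millisecond TIME_VALUE attributes, and ALIGNABLE_ANNOTATION elements inside a TIER), uncompressed WAV audio, and a hypothetical output naming scheme (clip_0001.wav / clip_0001.txt). The function names, the tier name "utterance", and SAMPLE_EAF are all invented for illustration.

```python
import os
import wave
import xml.etree.ElementTree as ET

# Tiny hand-written EAF fragment, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="3250"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def extract_segments(eaf_xml, tier_id):
    """Return sorted (start_ms, end_ms, text) tuples for one tier."""
    root = ET.fromstring(eaf_xml)
    # Map time slot IDs to millisecond offsets; unaligned slots are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            # Drop empty, unaligned, or zero-length annotations.
            if start is None or end is None or end <= start or not text:
                continue
            segments.append((start, end, text))
    return sorted(segments)


def write_clip(wav_path, start_ms, end_ms, out_path):
    """Copy the frames between start_ms and end_ms into a new WAV file."""
    with wave.open(wav_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        first = min(start_ms * rate // 1000, total)
        last = min(end_ms * rate // 1000, total)
        src.setpos(first)
        frames = src.readframes(max(0, last - first))
        params = src.getparams()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)      # frame count in the header is fixed on close
        dst.writeframes(frames)


def split_corpus(eaf_xml, tier_id, wav_path, out_dir):
    """Write one clip and one transcript per annotation (hypothetical layout)."""
    os.makedirs(out_dir, exist_ok=True)
    stems = []
    for i, (start, end, text) in enumerate(extract_segments(eaf_xml, tier_id), 1):
        stem = os.path.join(out_dir, f"clip_{i:04d}")
        write_clip(wav_path, start, end, stem + ".wav")
        with open(stem + ".txt", "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
        stems.append(stem)
    return stems
```

Pairing each clip with a same-named text file is a common input convention for ASR and forced-alignment toolkits, but the layout a given toolkit expects should be checked against its own documentation.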

No commits in the last 6 months.

Use this if you need to convert your ELAN-annotated audio and transcriptions into a corpus format suitable for training Automatic Speech Recognition (ASR) or forced aligner models.

Not ideal if you need a graphical user interface or are not comfortable with command-line tools.

linguistics speech-research corpus-creation audio-transcription computational-linguistics
Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 14 / 25

Stars: 11
Forks: 3
Language: C++
License: Apache-2.0
Last pushed: Oct 15, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/dcavar/ELAN2split"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
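The same request can be made from a script. The sketch below uses only the Python standard library. The path pattern (category, owner, repository) is inferred from the single URL shown above and is an assumption, as are the function names. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the three-part path layout is an assumption."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint without an API key and return the body as text."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("voice-ai", "dcavar", "ELAN2split")` issues the same GET request as the curl command and counts against the daily request limit.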