wq2012/VB_diarization
VB Diarization with Eigenvoice and HMM Priors, refactored
This tool helps researchers and analysts automatically identify 'who spoke when' in audio recordings. It takes raw audio as input and outputs a timeline of speech segments, each attributed to a speaker. It is useful to anyone who needs to segment conversations by speaker, such as speech scientists, linguists, or media analysts.
No commits in the last 6 months.
Use this if you need to automatically distinguish between multiple speakers in an audio recording, assigning each speech segment to the correct person.
Not ideal if you need to identify *who* the specific speakers are (e.g., 'John' vs 'Jane'), as it only differentiates between 'Speaker 1,' 'Speaker 2,' etc.
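To make the "who spoke when" output concrete, here is a minimal sketch of how per-frame speaker decisions can be collapsed into a timeline of anonymous speaker segments. It is not taken from this repository: the function name `frames_to_segments`, the `frame_shift` parameter, and the use of `None` for non-speech frames are all assumptions made for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_shift=0.01):
    """Collapse per-frame speaker indices into (start_s, end_s, label) tuples.

    Hypothetical helper, not part of VB_diarization. `frame_labels` holds one
    speaker index per frame, or None for non-speech; `frame_shift` is the
    assumed frame step in seconds.
    """
    segments = []
    start = 0
    for speaker, run in groupby(frame_labels):
        end = start + len(list(run))
        if speaker is not None:  # skip non-speech runs
            segments.append(
                (
                    round(start * frame_shift, 3),
                    round(end * frame_shift, 3),
                    f"Speaker {speaker + 1}",  # anonymous label, not an identity
                )
            )
        start = end
    return segments


# Toy input: 8 frames at an (unrealistically large) 0.5 s frame shift.
labels = [0, 0, 0, 1, 1, None, 0, 0]
for segment in frames_to_segments(labels, frame_shift=0.5):
    print(segment)
```

The labels are only consistent within one recording: 'Speaker 1' here says nothing about who that person is, which is the limitation noted above.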
Stars: 15
Forks: 3
Language: Python
License: —
Category:
Last pushed: Jul 27, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/wq2012/VB_diarization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
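The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `category/owner/repo` path pattern is inferred from the single URL above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record; assumes the endpoint returns a JSON body."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("voice-ai", "wq2012", "VB_diarization"))
```

Unauthenticated use counts against the 100 requests/day limit mentioned above, so cache results rather than re-fetching them.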
Higher-rated alternatives
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with...
k2-fsa/sherpa
Speech-to-text server framework with next-gen Kaldi
Picovoice/cheetah
On-device streaming speech-to-text engine powered by deep learning
yeyupiaoling/YeAudio
Audio tools for Python
zaigie/FunSpeech
Out-of-the-box speech service for local, private deployment; quickly set up FunASR and CosyVoice2/3 backends