shashikg/X-Vector-Based-Speaker-Diarization
Course project for EE698R (2020-21 Sem 2). An x-vector based speaker diarization system with an autoencoder-based clustering method. Also supports spectral and KMeans clustering methods.
This project helps you automatically identify who is speaking and when in audio and video recordings. It takes an audio or video file as input and outputs a timeline (or 'diarization') indicating which speaker is active at different points in time. Anyone who needs to analyze conversations, meetings, or interviews to understand speaker turns would find this useful.
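To make the "timeline" output concrete, here is a minimal sketch of how per-segment cluster labels could be merged into speaker turns. It is not taken from the project's code: the `Turn` class, the `merge_segments` helper, the `max_gap` parameter, and the sample segments are all hypothetical, and the labels stand in for whatever the clustering step (autoencoder-based, spectral, or KMeans) would assign.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One contiguous stretch of speech attributed to a single speaker."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def merge_segments(
    segments: List[Tuple[float, float, int]],
    max_gap: float = 0.0,
) -> List[Turn]:
    """Merge (start, end, cluster_label) segments into speaker turns.

    Consecutive segments with the same label are joined when the silence
    between them is at most `max_gap` seconds (hypothetical parameter).
    """
    turns: List[Turn] = []
    for start, end, label in sorted(segments):
        speaker = f"speaker_{label}"
        same_speaker = bool(turns) and turns[-1].speaker == speaker
        if same_speaker and start - turns[-1].end <= max_gap:
            # Extend the current turn instead of opening a new one.
            turns[-1].end = max(turns[-1].end, end)
        else:
            turns.append(Turn(speaker, start, end))
    return turns


# Illustrative input: four fixed-length segments with made-up cluster labels.
segments = [(0.0, 1.5, 0), (1.5, 3.0, 0), (3.0, 4.5, 1), (4.5, 6.0, 0)]
timeline = merge_segments(segments)
for turn in timeline:
    print(f"{turn.start:6.2f} {turn.end:6.2f} {turn.speaker}")
```

The first two segments share a label and touch, so they collapse into one turn. The last segment opens a new turn even though the same speaker appeared earlier, because another speaker talks in between.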
No commits in the last 6 months.
Use this if you need to accurately separate and label different speakers in an audio or video file, especially for improving transcription or analysis.
Not ideal if you already know the exact number of speakers in advance or if you only need to detect speech presence without identifying individual speakers.
Stars: 16
Forks: —
Language: Jupyter Notebook
License: GPL-3.0
Category:
Last pushed: Jun 02, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/shashikg/X-Vector-Based-Speaker-Diarization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
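The same request could be made from Python with only the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the URL follows a category/owner/repo pattern, which is inferred from the single curl example above. The `quality_url` and `fetch_quality` helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; everything after it is assumed
# to follow a "<category>/<owner>/<repo>" pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(repo: str, category: str = "ml-frameworks") -> str:
    """Build the request URL for a repository given as "owner/name"."""
    # quote() leaves "/" unescaped by default, so "owner/name" stays a path.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(repo: str, category: str = "ml-frameworks", timeout: float = 10.0):
    """Fetch and decode the response body, assuming it is JSON."""
    with urlopen(quality_url(repo, category), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = quality_url("shashikg/X-Vector-Based-Speaker-Diarization")
print(url)
```

Calling `fetch_quality("shashikg/X-Vector-Based-Speaker-Diarization")` would perform the network request. Keyless requests count against the 100 requests/day limit.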
Higher-rated alternatives
felixbur/nkululeko
Machine learning speaker characteristics
claritychallenge/clarity
Clarity Challenge toolkit - software for building Clarity Challenge systems
juanmc2005/diart
A python package to build AI-powered real-time audio applications
astorfi/3D-convolutional-speaker-recognition
Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
wq2012/awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.