shashikg/X-Vector-Based-Speaker-Diarization

Course project for EE698R (2020-21 Sem 2). An X-Vector based speaker diarization system with an AutoEncoder-based clustering method. Also supports spectral and KMeans clustering methods.

22 / 100 (Experimental)

This project helps you automatically identify who is speaking and when in audio and video recordings. It takes an audio or video file as input and outputs a timeline (or 'diarization') indicating which speaker is active at different points in time. Anyone who needs to analyze conversations, meetings, or interviews to understand speaker turns would find this useful.
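To make the idea of a diarization timeline concrete, here is a minimal sketch in Python. It is not code from this project: the `Segment` tuple layout, the `merge_segments` helper, and the `gap` parameter are all hypothetical names chosen for illustration. It assumes some earlier step has already assigned a speaker label to each short audio segment, and only shows how those labelled segments could be collapsed into speaker turns.

```python
from typing import List, Tuple

# Hypothetical layout: (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, int]


def merge_segments(segments: List[Segment], gap: float = 0.0) -> List[Segment]:
    """Collapse consecutive segments with the same speaker label into turns.

    Two segments are merged when they carry the same label and the second
    starts no more than `gap` seconds after the first one ends.
    """
    timeline: List[Segment] = []
    for start, end, label in sorted(segments):
        same_speaker = bool(timeline) and timeline[-1][2] == label
        if same_speaker and start - timeline[-1][1] <= gap:
            prev_start, prev_end, _ = timeline[-1]
            # Extend the current turn instead of opening a new one.
            timeline[-1] = (prev_start, max(prev_end, end), label)
        else:
            timeline.append((start, end, label))
    return timeline
```

Each entry in the returned list is one speaker turn, which is the "who spoke when" view described above.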

No commits in the last 6 months.

Use this if you need to accurately separate and label different speakers in an audio or video file, especially for improving transcription or analysis.

Not ideal if you already know the exact number of speakers in advance or if you only need to detect speech presence without identifying individual speakers.

audio-analysis meeting-transcription conversation-analysis speech-recognition-prep
Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 16

Forks

Language: Jupyter Notebook

License: GPL-3.0

Last pushed: Jun 02, 2021

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/shashikg/X-Vector-Based-Speaker-Diarization"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
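The same request can be made from a script. The sketch below uses only the Python standard library. It makes two assumptions that this page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `owner/repo` URL pattern. The names `build_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def build_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository (assumed URL pattern)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("shashikg", "X-Vector-Based-Speaker-Diarization")` performs the request and counts against the daily limit, so a script that polls should cache the result.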