wq2012/SpectralCluster

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.

Quality score: 55 / 100 (Established)

This tool groups audio segments by speaker, the clustering step of speaker diarization. It takes in numerical representations of sound (audio embeddings) and outputs labels indicating which speaker is talking at different times. An audio engineer, researcher, or anyone working with multi-speaker recordings would use it to identify and separate individual voices.

546 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to determine "who spoke when" in an audio recording, given pre-computed numerical embeddings of the audio.

Not ideal if you need a complete, production-ready speaker diarization system that handles audio input directly, as this tool focuses specifically on the clustering step.
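To make the clustering step concrete, here is a minimal NumPy sketch of the general spectral clustering technique applied to speaker embeddings. This is an illustrative sketch, not this library's implementation (the package applies its own affinity refinements and auto-tuning described in the papers): cosine affinity, normalized graph Laplacian, spectral embedding, then a simple k-means with deterministic seeding.

```python
import numpy as np

def spectral_cluster(embeddings, n_clusters):
    """Generic spectral clustering sketch for (N, D) speaker embeddings."""
    x = np.asarray(embeddings, dtype=float)
    # Cosine affinity between all pairs of embeddings.
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    affinity = np.clip(normed @ normed.T, 0.0, None)  # keep weights non-negative
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # Eigenvectors of the smallest eigenvalues form the spectral embedding.
    _, vecs = np.linalg.eigh(laplacian)
    features = vecs[:, :n_clusters]
    # Deterministic farthest-point seeding, then plain k-means.
    centers = [features[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(50):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = features[labels == k].mean(axis=0)
    return labels
```

The library itself wraps this kind of pipeline behind a `SpectralClusterer` class whose `predict` method takes the same (N, D) embedding array and can search for the number of speakers between configurable bounds; check the repository's README for the current constructor parameters.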

speaker-diarization audio-analysis speech-processing voice-biometrics
Activity: Stale (6 months)
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 25 / 25
Community: 20 / 25


Stars: 546
Forks: 73
Language: Python
License: Apache-2.0
Last pushed: Sep 25, 2024
Commits (30d): 0
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/wq2012/SpectralCluster"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
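The same request can be made from Python with only the standard library. The endpoint path pattern (`/api/v1/quality/<collection>/<owner>/<repo>`) is inferred from the curl example above, and the helper names `quality_url` and `fetch_quality` are illustrative, not part of any published client; the response is assumed to be JSON, and its schema is not documented on this page.

```python
import json
import urllib.request

API_ROOT = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(collection, owner, repo):
    # Path pattern inferred from the curl example above.
    return f"{API_ROOT}/{collection}/{owner}/{repo}"

def fetch_quality(collection, owner, repo, timeout=10):
    """Fetch and decode the quality payload for one repository."""
    with urllib.request.urlopen(quality_url(collection, owner, repo),
                                timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("ml-frameworks", "wq2012", "SpectralCluster")` issues the same request as the curl command above.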