CouncilDataProject/speakerbox

Speakerbox: Fine-tune Audio Transformers for speaker identification.

Emerging

This project helps anyone working with audio recordings that contain multiple speakers to automatically identify who is speaking and when. You provide raw audio files, and after a semi-automated annotation process, the system outputs segments of audio labeled with the speaker's identity. This is ideal for researchers, journalists, or anyone needing to analyze conversations in spoken media.
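As a rough illustration of what "segments of audio labeled with the speaker's identity" could look like downstream, here is a minimal sketch in plain Python. It does not use Speakerbox's API: the `LabeledSegment` type, the `merge_adjacent` helper, and the `max_gap` threshold are all hypothetical names introduced for this example. It assumes each segment carries a start time, an end time, and a speaker label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSegment:
    """Hypothetical container for one labeled stretch of audio."""
    start: float   # seconds from the beginning of the recording
    end: float     # seconds from the beginning of the recording
    speaker: str   # label assigned to this stretch


def merge_adjacent(segments, max_gap=0.5):
    """Merge consecutive segments from the same speaker.

    Two segments are merged when they share a speaker label and the
    silence between them is at most `max_gap` seconds (an assumed
    threshold, chosen only for illustration).
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.speaker == seg.speaker
            and seg.start - prev.end <= max_gap
        ):
            # Extend the previous segment instead of appending a new one.
            merged[-1] = LabeledSegment(prev.start, max(prev.end, seg.end), prev.speaker)
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    example = [
        LabeledSegment(0.0, 2.0, "alice"),
        LabeledSegment(2.0, 4.0, "alice"),
        LabeledSegment(4.0, 6.0, "bob"),
        LabeledSegment(10.0, 12.0, "bob"),
    ]
    for seg in merge_adjacent(example):
        print(f"{seg.start:6.1f} to {seg.end:6.1f}  {seg.speaker}")
```

Merging short same-speaker chunks into longer turns is one common way to turn per-chunk predictions into a "who spoke when" timeline; whether Speakerbox does this internally is not stated here.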

No commits in the last 6 months. Available on PyPI.

Use this if you have audio recordings with multiple known speakers and need a way to automatically label who said what and when.

Not ideal if you have recordings with a large number of unknown speakers, as it requires a dataset of known speakers for training.

audio-transcription speech-analysis media-analysis conversation-logging meeting-minutes

Maintenance 0 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 11 / 25

Language: Python

License: MIT

Related models

CVxTz/music_genre_classification

Music genre classification: LSTM vs Transformer

HHousen/speaker-change-detection

Speaker change detection using SincNet and an LSTM/Transformer

palonso/MAEST

Pre-training, fine-tuning, and inference code with the MAEST models for music analysis applications.

icon-lab/HST

Official implementation of Hierarchical Spectrogram Transformers (HST)

aaronstevenwhite/spectrans

Modular spectral transformer implementations in PyTorch with Fourier, wavelet, and other...
