CouncilDataProject/speakerbox
Speakerbox: Fine-tune Audio Transformers for speaker identification.
Speakerbox automatically identifies who is speaking, and when, in audio recordings that contain multiple speakers. You provide raw audio files; after a semi-automated annotation process, the system outputs audio segments labeled with each speaker's identity. It suits researchers, journalists, and anyone else who needs to analyze conversations in spoken media.
No commits in the last 6 months. Available on PyPI.
Use this if you have audio recordings with multiple known speakers and need a way to automatically label who said what and when.
Not ideal if you have recordings with a large number of unknown speakers, as it requires a dataset of known speakers for training.
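To make "who said what and when" concrete, here is a minimal sketch of what speaker-labeled segments could look like and how they might be summarized. The `LabeledSegment` class, the `talk_time` helper, and the sample values are hypothetical illustrations, not Speakerbox's actual API or output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSegment:
    """Hypothetical record: one stretch of audio attributed to one speaker."""

    start: float  # seconds from the start of the recording
    end: float  # seconds from the start of the recording
    speaker: str


def talk_time(segments):
    """Sum the duration of each speaker's segments, in seconds."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Illustrative values only, not real model output.
segments = [
    LabeledSegment(0.0, 4.0, "chair"),
    LabeledSegment(4.0, 10.0, "member_a"),
    LabeledSegment(10.0, 12.0, "chair"),
]
totals = talk_time(segments)
```

Once segments carry a start time, an end time, and a speaker label, questions such as total speaking time per person reduce to simple aggregation.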
Stars: 60
Forks: 6
Language: Python
License: MIT
Last pushed: Dec 01, 2024
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/CouncilDataProject/speakerbox"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
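The same request can be made from Python with the standard library. This is a sketch under assumptions: the source gives only the one URL, so the `category/owner/repo` path layout is inferred from it, the function names are hypothetical, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout is an assumption)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """GET the endpoint and parse the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = quality_url("transformers", "CouncilDataProject", "speakerbox")
```

Calling `fetch_quality("transformers", "CouncilDataProject", "speakerbox")` performs the network request. How an API key is supplied is not stated here, so the sketch covers only the keyless tier.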
Related models
CVxTz/music_genre_classification
Music genre classification: LSTM vs. Transformer
HHousen/speaker-change-detection
Speaker change detection using SincNet and an LSTM/Transformer
palonso/MAEST
Pre-training, fine-tuning, and inference code with the MAEST models for music analysis applications.
icon-lab/HST
Official implementation of Hierarchical Spectrogram Transformers (HST)
aaronstevenwhite/spectrans
Modular spectral transformer implementations in PyTorch with Fourier, wavelet, and other...