vectominist/spin
Official code for the Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
This project helps speech researchers and machine learning engineers improve how well speech models represent spoken content. It takes an existing pre-trained speech model and fine-tunes it in a self-supervised way using speaker-invariant clustering. The result is a model that better separates what is said from who is saying it, which improves performance in tasks such as speech recognition and discovering acoustic patterns. It is aimed at people working on speech representation research rather than end users.
No commits in the last 6 months.
Use this if you need to improve the content understanding capabilities of your pre-trained speech models by making them more robust to speaker variations.
Not ideal if you are looking for a ready-to-use speech recognition application, as this is a research tool for model improvement.
Stars: 64
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: May 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/vectominist/spin"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
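For scripted use, the same request can be made from Python. This is a minimal sketch built only on the standard library, and it rests on two assumptions that this page does not confirm: that the path always has the shape `category/owner/repo` (inferred from the single curl URL above), and that the response body is JSON. The names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl example.

    Assumption: the category/owner/repo layout generalizes beyond the
    one URL shown on this page.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the endpoint and decode the body.

    Assumption: the body is JSON. If it is not, json.loads raises
    a ValueError and the raw text should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "vectominist", "spin")` would issue the same request as the curl command. Each call counts against the daily request limit, so caching the result locally is sensible when polling several repositories.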
Higher-rated alternatives
TensorSpeech/TensorFlowASR
:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2....
dangvansam/viet-asr
VietASR - Vietnamese Automatic Speech Recognition
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
xinjli/allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
srvk/eesen
The official repository of the Eesen project