vectominist/spin

Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"


Emerging

This project helps speech researchers and machine learning engineers improve how pre-trained speech models represent spoken content. It fine-tunes an existing pre-trained speech model with a speaker-invariant clustering objective. The result is a more robust model that better separates the spoken words from the voice of the speaker, which improves performance in tasks like speech recognition and discovering acoustic patterns.

No commits in the last 6 months.

Use this if you need to improve the content understanding capabilities of your pre-trained speech models by making them more robust to speaker variations.

Not ideal if you are looking for a ready-to-use speech recognition application, as this is a research tool for model improvement.

speech-recognition acoustic-modeling AI-model-fine-tuning speech-processing speaker-diarization

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 11 / 25

Language: Python

License: MIT

Higher-rated alternatives

TensorSpeech/TensorFlowASR

TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2....

dangvansam/viet-asr

VietASR - Vietnamese Automatic Speech Recognition

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

xinjli/allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

srvk/eesen

The official repository of the Eesen project
