vectominist/spin

Official code for the Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"

35 / 100 (Emerging)

This project helps speech researchers and machine learning engineers improve how pre-trained speech models represent spoken content. It takes an existing pre-trained speech model and fine-tunes it in a self-supervised way using speaker-invariant clustering. The result is a model whose representations better separate what is said from who is saying it, which improves performance in tasks such as speech recognition and discovering acoustic patterns.

No commits in the last 6 months.

Use this if you need to improve the content understanding capabilities of your pre-trained speech models by making them more robust to speaker variations.

Not ideal if you are looking for a ready-to-use speech recognition application, as this is a research tool for model improvement.

speech-recognition acoustic-modeling AI-model-fine-tuning speech-processing speaker-diarization
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 11 / 25
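
The four sub-scores above add up to the overall score shown at the top of the page. The sketch below checks that arithmetic. It assumes the overall score is a plain sum of the four categories, which matches the numbers listed here but is not confirmed by the page; the variable names are illustrative only.

```python
# Sub-scores as listed on this page, each out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 8,
    "Maturity": 16,
    "Community": 11,
}

# Assumption: the overall score is the unweighted sum of the four categories.
total = sum(subscores.values())
max_total = 25 * len(subscores)

print(f"{total} / {max_total}")  # prints "35 / 100"
```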


Stars: 64
Forks: 6
Language: Python
License: MIT
Last pushed: May 19, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/vectominist/spin"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
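
For scripted access, the curl call above could be reproduced in Python roughly as follows. This is a minimal sketch using only the standard library. It assumes the endpoint returns JSON and that the URL follows a category/owner/repo pattern, which is inferred from the single example URL; `quality_url` and `fetch_quality` are hypothetical helper names, not part of any published client.

```python
import json
import urllib.request

# Base of the endpoint shown in the curl example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (assumed pattern: category/owner/repo)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (needs network access and counts toward the daily limit):
# data = fetch_quality("voice-ai", "vectominist", "spin")
# print(json.dumps(data, indent=2))
```

The response schema is not documented on this page, so the sketch returns the parsed object as-is rather than picking out specific fields.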